Picture this: It's a sunny Saturday afternoon in the park, the perfect setting for lounging with friends and capturing memories. Just as you're about to snap a photo with your iPhone, your friend suggests something better. From their bag, they pull out an intriguing camera – a fusion of retro Polaroid charm and sleek modern hardware.
With a flash, a unique photo is produced. Out of the film slot emerges a watercolor painting, where reality is softened into dreamy brushstrokes. The park and friend you know are still there, but different—serene and slightly whimsical, as if lifted from the pages of a storybook and rendered with a watercolor artist's gentle touch.
Welcome to the world of Instagen. This handheld, point-and-shoot camera doesn't just capture moments; it reinvents them with cutting-edge AI, creating beautiful art pieces printed on real photo paper – giving you tangible take-home memories.
I designed and built Instagen over the course of 2023, navigating through what turned out to be my most challenging and rewarding personal project to date. It pushed the limits of my knowledge and comfort zone, driving me to learn Raspberry Pi development, software architecture, complex programming in new languages, and even CAD design for a custom camera body, complete with 3D printing.
In this writeup I'll take you deeper into the journey, from the initial spark of the idea, through the principles that guided its design, to the final steps of its development.
But before diving in, why not take a look at the Generative Gallery to see some of the shots I've taken with the Instagen? If you'd rather not, the four images below are a sample of recent shots.
In April 2023, a seemingly insignificant Reddit post caught my eye and changed everything. It was a Polaroid commercial that aired in 1975, but not just any commercial - it starred a young Morgan Freeman demonstrating the magic of capturing and printing moments instantly. Sure, Morgan Freeman was cool, but it was the Polaroid camera that struck a chord. It reminded me of my first Polaroid experience as a kid – the anticipation and magic of watching a moment materialize right before my eyes. It was a slice of time instantly captured and placed in my hands.
Today's cameras, while technologically superior, miss that tactile, nostalgic essence. Millennials like me briefly knew the era of film before the digital wave took over, leaving us with a subtle longing for something more tangible. Instagram and the rise of 'hipsterism' partly filled this void and breathed new life into the art of photography.
Gen Z's relationship with technology is complex. They are digital natives, yet they often find themselves drawn to more tangible, 'retro' technologies. This is evidenced by the revival of film photography. Kodak's film sales, for instance, have seen an uptick of roughly 5% year over year recently, indicating a renewed interest in the medium. Retailers like Urban Outfitters are capitalizing on this trend, selling refurbished Polaroid cameras at premium prices. Disposable cameras become a hot commodity every summer, particularly among high schoolers.
Parallel to this retro resurgence is the advancement in AI image generation. Tools like Dall-E, Midjourney, Stable Diffusion, and others represent a frontier in creative technology, turning words into images and reimagining existing pictures in novel styles.
Caught between these two worlds, I saw an opportunity. What if we could merge the physical, nostalgic appeal of film with the creative potential of AI? A camera that not only captures moments but transforms them through AI, then prints them out – a blend of old and new, tangible and digital. Had anyone done that before?
How hard could it be?
In case you didn't notice, I'm more of a designer-designer. Outside of fiddling with web templates, I have no idea how to program or even write JavaScript. It's always been a bit of a mystery I was happy to leave to people way smarter than me. But I also hate it when I can't do something I want to do. Not in a "roll and scream on the floor because I can't get my way" kind of way, but in an "if I have an idea, I am going to find a way to make it work lest I go mad" kind of way.
When I designed Phoney I simply hired a developer to build the app and voice actors to record the voices. That idea was briefly on the table, but considering that this was purely a programming and hardware project, I would have been removing any need for myself in the project other than managing it - and I didn't start this project to LARP as a project manager.
So I had to either learn how to code (I can hear engineers' eyes rolling as they read that sentence) or figure something else out.
Fortunately the world just recently unlocked a new cheat code... ChatGPT!
I want to design and build a modded shell of a Polaroid camera with custom electronic components. The camera will capture a photo, upload the photo to the OpenAI DALL-E API endpoint, process variations of the captured image, download the processed image, and print it onto Polaroid film. This modified Polaroid camera will use a digital camera module connected to a Raspberry Pi and a Polaroid film printer.
I want you to prepare a development plan with all of the necessary steps to make the software and hardware for this camera, including which APIs and services I should use, and which scripts I should write.
As it turns out, ChatGPT is great at writing code. Sure, this may be common knowledge now, but at the time it was cutting-edge stuff! And I won't lie, it was really easy. So easy that I'm struggling with what to write here. There's no secret to it. It was very similar to working with an engineer at work, except our interaction happened entirely through text, with me serving as the implementer and tester of the code.
Not all of it was written by the bot. The printer, which is a critical piece of the puzzle, was implemented by the talented Mike Manh - a real developer. Mike joined me over the summer as a technical consultant and development partner, and played a role in helping get my prototype off the wall and into a handheld format (more on that later).
There is a stigma that ChatGPT will save you hours by writing a complex script in a few seconds, only for you to spend those hours debugging its code instead. That was not my experience. So, rather than bore you with the story of how "I coded" an entire camera's software using ChatGPT, I'll give you some of the high-level details of how I used it to successfully code for me with hardly any issues:
Thanks to manufacturing shortages and global supply chain issues, Raspberry Pis were nearly impossible to get at the time. I happened to have a Pi 4 inside of a smart mirror that I recently built. The mirror has a camera inside of it which I use for facial recognition, so it made for a perfect stand-in until I was able to get better hardware later.
I was able to get the first end-to-end capture-and-generation flow up and running within a few days using DALL-E's image variation API endpoint. At the time, OpenAI only allowed API access to the first-generation DALL-E model (it may be Gen 2 now) and didn't allow any text prompts to guide the image variation output, so the quality of the images left something to be desired... but the proof of concept worked, and I knew I could start taking the project more seriously.
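For the curious, the core of that first flow can be sketched in a few lines. This is a simplified, hypothetical reconstruction rather than the actual Instagen code: it assumes the photo has already been captured to disk, uses the `requests` library, and targets the image variations endpoint roughly as OpenAI documented it at the time (a multipart upload with `n` and `size` fields).

```python
import requests

# The image variations endpoint, per OpenAI's API docs at the time.
OPENAI_VARIATIONS_URL = "https://api.openai.com/v1/images/variations"

def request_variation(image_path: str, api_key: str, size: str = "1024x1024") -> str:
    """Upload a captured photo and return the URL of one generated variation.

    Sketch only: the real flow also handles retries, downloads the
    result, and hands it off to the printer.
    """
    with open(image_path, "rb") as f:
        resp = requests.post(
            OPENAI_VARIATIONS_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"image": f},       # the captured photo, multipart-encoded
            data={"n": 1, "size": size},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["data"][0]["url"]
```

In practice something like this runs right after the shutter fires; the returned URL is then downloaded and the image sent on to the printer.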
The software is only half of the story, as of course the camera needs a body. The seemingly easiest thing to do was to take a real Polaroid, gut it, and cram all of the new digital components inside... which for the prototype is exactly what I did.
I purchased a 1970s Polaroid OneStep off of Ebay, found some YouTube repair videos showing how to disassemble the camera without breaking it, and got to work taking it apart. Once I was left with an empty shell, I took a good look at what we had and then realized that it wasn't going to work at all.
The printer that we were using loads film cartridges from the bottom. The Polaroid, however, loads film from the front through a flap that folds down. We thought we might be able to just slide the printer in through the flap, but it was too tall. There was also the issue of mounting the Raspberry Pi and camera module, which wasn't as easy as simply fixing them to the insides. These cameras are small and offer little room to work with, so the components simply wouldn't fit. It was clear that we had to be more tactical with our approach and create some custom components. We'd have to 3D print them ourselves.
*Cue three-month montage of designing and printing mounting components*
I'm going to skip over a lot of the trial and error of learning to design and 3D print the mounts that fit our components into the original OneStep body and just give you the juicy details, but you can see the "V1" body we built in this video. In the end, we kept only the original face of the OneStep and created a custom base plate to hold all of our goodies. This let us mount the printer, Pi, and camera, as well as install a shutter button and LED status light. On the bottom of the base plate was a hinged door for loading film. This door was held closed by a screw, so you needed to carry a screwdriver with you if you were out shooting.
This was a sub-par solution that's a little embarrassing to look at - but it worked. While it doesn't look very good, it served as a great learning experience for designing and printing custom hardware, and set the stage for me to design the fully custom body for the final build.
The first thing the software does is look at the contents of the photo to understand what sort of style prompt it should use to reprocess the image. It identifies that there are people sitting outside looking at the camera, so it filters through prompts that best fit 'portraits' out of hundreds of prompts across dozens of different art styles. In this case, it chose the prompt "portrait in the style of Peter Paul Rubens, with rich colors, strong chiaroscuro, and a focus on capturing the power and vitality of the subject".
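That selection step can be sketched roughly like this. The prompt bank and subject detection here are simplified, hypothetical stand-ins: the real system draws from hundreds of prompts across dozens of styles and uses actual image analysis to classify the scene.

```python
import random

# Trimmed-down stand-in for the real prompt bank, which spans
# hundreds of prompts across dozens of art styles.
STYLE_PROMPTS = {
    "portrait": [
        "portrait in the style of Peter Paul Rubens, with rich colors, "
        "strong chiaroscuro, and a focus on capturing the power and "
        "vitality of the subject",
        "soft watercolor portrait with dreamy, diffused brushstrokes",
    ],
    "landscape": [
        "impressionist landscape with loose, visible brushwork",
    ],
}

def choose_prompt(detected_subject: str) -> str:
    """Map the detected photo contents to a prompt category, then
    pick one of that category's style prompts at random."""
    category = "portrait" if "people" in detected_subject else "landscape"
    return random.choice(STYLE_PROMPTS[category])
```

So a scene described as "people sitting outside looking at the camera" filters down to the portrait prompts, and one is picked to guide the regeneration.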
Once the image is generated, the camera starts to gently rumble as the on-board printer pushes a sheet of film out of the slot. Within minutes, the processed AI image fades into view. At the same time, both the original and the newly generated image are uploaded to your Instagen profile for all to see.
When you look at the original photo you took, you realize that your friends were slightly out of frame and a bit overexposed from the bright sunlight - but it doesn't matter! One of the amazing benefits of AI-generated images is that flaws like framing, exposure, and focus issues are negated by the recreation of the photo. They always come out looking better than when they went in.
Now, the real fun begins. After using the V1 camera out in the wild for a month and getting a good feel for what worked and what didn't, I was able to go back to the drawing board and start planning a purpose-built body from scratch. I wanted it to be simple, beautiful, and modular so that it was easy to print and construct. I also wanted it to look as professional as possible, which meant getting all of the extra details like a viewfinder and strap. Fortunately, I was able to recycle these components from the original OneStep and fit them into my build.
Designing the camera body from the ground up wasn't entirely simple. We all know the basic elements that make up a camera body, but once you sit down and start to design your own in 3D, the aesthetic and functional elements become a lot less clear. Add in the need to design components that support other hardware, and things start to get really tricky. But with enough time and compostable filament, I was able to design a working chassis that held the printer and Pi components in a small form factor.
With the internals figured out, I could move to refining the final envelope to enclose it all. I'm certainly simplifying the process a bit, as there are a lot of components to account for, but all you need to know is that it took about 500 hours in total (conservative estimate) and the whole process was a lot of fun.
The brains behind the operation is a Raspberry Pi Zero 2 W. This little computer, smaller than a credit card, packs 512MB of SDRAM along with Wi-Fi and Bluetooth. Since all image processing is handled off-device, it's the perfect balance of size and speed.
Powering the Pi is a nifty little power-management HAT made by PiSugar. This board lets the camera run on battery power and hosts the on/off switch along with a sleep/wake button, making the camera easily portable and super convenient to power on and off.
The printer is the pièce de résistance of the build and by far the most challenging part to design for. Nothing good comes easy! Not only did it require a special chassis to mount onto, one full of weird little details from the manufacturer, it also needed a hinged door to load the film cartridges through. This door contains spring-loaded arms that apply strong pressure to the film cartridge, as well as a locking mechanism that interfaces with a special 'open-close' sensor attached to the printer.
The software to incorporate this printer into the system was developed by a developer in the Netherlands and was implemented by Mike Manh, who I mentioned earlier. Mike developed a beautiful solution to manage the complex Bluetooth handshake that takes place between the Pi and printer on startup, and worked to improve the packet transfer speed between the two devices to reduce print time. Without these two, Instagen would be nothing more than what should have been a mobile app, made more complicated and shoved in a plastic box.
Magnets are cool as hell, so I was happy to find ways to usefully integrate them into Instagen's design. The push button I used for the shutter was a bit prone to sticking, so opposing magnets were embedded into the button and the component mount behind it to push the button back out after pressing.
The door latch uses a similar setup to ensure the latch doesn't get stuck in the open position, as well as applying additional force on the printer's door sensor to ensure constant closure. Additionally, the faceplate is held on by magnets in each corner, not only allowing easy access to the internal components (for me) but also enabling swappable faces with different designs.
The lens is composed of multiple parts which house the camera module, lens glass, and other components for style. Sandwiched and locked together, the entire lens piece locks into the faceplate, also allowing for easily swappable designs.
What kind of camera doesn't have a flash? Instagen has a built-in flash that actually (kind of surprisingly?) works, thanks to a carefully crafted step-layered chrome-plated housing which redirects light rays forward.
Since the generation process takes a moment, and there is no screen on the camera, this light doubles as a status indicator - cycling through a rainbow spectrum as the camera works its magic. Once the light goes off and the printed photo is out, the camera is ready for another capture.
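The rainbow cycle itself is simple to sketch. This is a hypothetical, minimal version using only the standard library's `colorsys`; actually driving the RGB LED hardware on the Pi is omitted.

```python
import colorsys

def rainbow_step(t: float) -> tuple:
    """Return an (R, G, B) triple for phase t in [0, 1), sweeping
    the hue wheel so the LED cycles through the rainbow while the
    camera waits on generation and printing."""
    r, g, b = colorsys.hsv_to_rgb(t % 1.0, 1.0, 1.0)
    return int(r * 255), int(g * 255), int(b * 255)
```

A loop on the Pi would call this with a slowly increasing `t` and write the result to the LED until the print job finishes, then switch the light off to signal the camera is ready.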
Instagen is a living project that will continue to evolve over time. There is so much that can be done in this space and I have a lot of ideas. The only limiting factor is my time.
My immediate plans are to start on V2, which will replace the viewfinder with an LCD screen, introduce physical buttons and dials to offer more control over the output, and possibly introduce a new camera module with swappable lenses. Video is another exciting avenue that I am keen on exploring soon, though I'm not sure if that functionality makes sense for the Instagen or another camera model. Keep your eye on this space!
Thank you for reading.