How We Made Our Animated AI Short and the Nuances of Using AI Video Generation


In 2012, we made Lenny’s Big Delivery, a simple video with a cute storyline and no dialogue, about a dog on a mission to deliver a VHS tape.

To test the limits of the latest AI video generation tools, we asked Billy Woodward, a super-talented motion artist and creative technologist, to bring Lenny’s story back to life. He’s a twenty-year industry veteran who has spent years pushing AI tools to achieve results that feel cinematic.

Using the 2012 video as our blueprint, we reimagined it as a brand-new AI-generated animated video. Across our social media channels, we presented it as a multimillion-dollar Pixar short made with AI in a week. Some of you loved it, but let’s be honest, a lot of you hated it.

In this post, we’ll guide you through our process for creating the video and touch on the nuances we’ve discovered in using AI for video generation. Let’s dive in!

AI for animation

Can AI really be the game changer we think it is, putting creative horsepower into everyone’s hands? The latest Sora and Veo models are getting close to crossing the uncanny valley, but they haven’t cracked humans yet. People just know when something isn’t quite right. But animation doesn’t really have an uncanny valley. We already know it’s not real, so our brains stop scanning for flaws and start focusing on the story.

For our project, we aimed to test whether we could make a Pixar-quality animated short quickly, affordably, and with emotional impact.

We broke the work into four buckets:

  • Character development
  • World-building
  • Life in motion
  • Human touches

Character development

Billy’s first step was developing Lenny and the other characters in Midjourney.

When prompting, he didn’t simply type in “Pixar red dog” and call it a day. He refined, experimented, and tweaked until the image in his head finally came to life. He used reference images and animation style tokens to dial in the aesthetics we liked best.

“It can take dozens of generations, paired with plenty of pivots and prompt pushes, to get your character designs correct. This is especially true when basing it on real-world references. The more tokens you use, the more direction you provide, and the more you envision the final image, the better your results. Quick tip: If you find a generation that you like, but it’s still not quite right, toss it into your style reference within Midjourney. It will guide the generations further toward that style.”
Billy Woodward
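
If you’re prompting Midjourney from Discord, that style-reference trick maps to the --sref parameter, with --sw controlling how strongly the reference is applied. The prompt and image URL below are purely illustrative, not the actual prompts behind Lenny:

```
/imagine scruffy red cartoon dog trotting down a cobblestone street, 3D animated family-film look, soft lighting --sref https://example.com/lenny-style-frame.png --sw 200 --ar 16:9
```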

Character consistency

One great frame doesn’t make a film. Character consistency has been one of the most significant challenges in AI video. How do you maintain the same amazing Lenny across dozens of shots with different angles, lighting, and environments?

Billy figured out a trick with some of the latest AI generation tools. Here’s where the real magic happens.

With our Lenny locked in, Billy hopped into Nano Banana and Seedream to generate a full 360-degree character sheet: essentially, what he looks like from every conceivable angle. For the first time, we can keep a character’s appearance consistent across multiple generations in different settings with different camera moves.

This is what unlocks real storytelling that actually makes sense from shot to shot.

World-building

Once we finalized our main character, it was time to drop him into our animated world.

AI tools have been good at generating places for a while now — it’s the AI equivalent of real-life location scouting. Based on the rough idea we had in our heads, we knew we needed Boston streets, a sausage shop, a park, a fountain, and our final office setting.

Using the same prompt sculpting approach, Billy assembled a bunch of location options. When he needed something super specific, he used Google Street View screenshots as reference images inside Midjourney.

Just like with the character sheet, the same reference-based workflow gave us consistency across every shot. So we didn’t just have a storefront in Boston; we had the wide shot, the medium, and the close-up all matching in tone, structure, and lighting. That’s what made it feel more cinematic and real.

Life in motion

Now, let’s break down how we put our characters and world into motion. AI video models run on start and end frames: you give the model point A and point B and let it figure out what happens in between.

During this process, we put on our cinematographer hats and thought about the angles, lenses, movement, and the emotion we wanted to convey in the shot.

With tools like Nano Banana and Seedream, we dropped Lenny into any scene, picked a camera position, and directed the shot as if we were on set.

If we needed even more precision, we stitched together multiple image generations in Photoshop to nail framing and continuity.

To direct, we uploaded the start frame and described the motion — what was happening, how it felt, and how the camera should react. The more precise the prompt was, the less the model hallucinated.

For example, in one scene, Lenny drops the VHS tape to chase a tennis ball. When we didn’t give the model enough direction, the tape fell unnaturally and morphed into a vinyl record. To address this, we tweaked the prompt to add clear transitions, some real-world physics, and a touch of emotion.
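
To make that concrete, here’s a rough sketch of how the direction for a single shot might be organized before it goes off to a video model. The ShotDirection structure, field names, and prompt text are hypothetical illustrations of the kind of detail that kept the tape behaving like a tape; they don’t describe any specific tool’s API or the exact prompt we used:

```python
# Hypothetical sketch: the inputs a start-frame-conditioned video model typically
# needs for one shot. These names don't refer to any specific product's API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotDirection:
    start_frame: str          # composed start frame (point A), e.g. exported from Photoshop
    end_frame: Optional[str]  # optional end frame (point B) when the landing spot matters
    prompt: str               # motion, physics, emotion, and camera behavior
    duration_seconds: float   # shorter clips tend to drift less


vhs_drop = ShotDirection(
    start_frame="frames/lenny_park_start.png",
    end_frame=None,
    prompt=(
        "Lenny gently sets the VHS tape on the grass and bolts after the tennis ball. "
        "The tape stays a rectangular VHS cassette, tips over, and lands flat with a "
        "small, realistic bounce. The camera holds a low wide angle and pans to follow "
        "Lenny, playful and energetic."
    ),
    duration_seconds=5.0,
)
```

The habit matters more than the structure: spell out the physics, the transitions, and the camera move, and the model has far less room to hallucinate.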

Pacing, rhythm, and story flow

The last step was to add those final human touches. We brought all the video generations into Adobe Premiere to start cutting and thinking about pacing, rhythm, and story flow — practically the same way we would with any live-action project.

We used Suno, an AI music generation tool, to find the perfect background track. The prompt we used was: “clarinet, French, upbeat, quirky jazz, cute, family film about a dog.”

We needed some additional sound effects, so we turned to ElevenLabs, a voice and sound generator. For example, we prompted ElevenLabs to give us the sound of a VHS hitting the ground and added it to our edit.
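
We generated these in the ElevenLabs web app, but the same capability is exposed through their API. Here’s a minimal sketch using the ElevenLabs Python SDK’s sound-effects endpoint; the prompt, parameter values, and file name are illustrative, and the method names reflect the SDK at the time of writing, so double-check the current docs:

```python
# Minimal sketch: generating a one-off sound effect with the ElevenLabs API.
# Prompt text and parameter values are illustrative; verify method names against
# the current elevenlabs SDK docs before relying on this.
from elevenlabs import save
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")  # placeholder key

audio = client.text_to_sound_effects.convert(
    text="A VHS tape dropping onto pavement: one hard plastic clatter, slight rattle",
    duration_seconds=2.0,   # keep it short so it slots cleanly into the edit
    prompt_influence=0.6,   # higher values follow the text description more literally
)

save(audio, "vhs_drop.mp3")  # import the file into the Premiere timeline
```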

Lastly, a few passes in After Effects helped sell the effects and fix some of the artifacts from the generations.

The result? A multimillion-dollar production made with AI in a week

We couldn’t believe how fast everything came together. Having hired animation agencies to create videos for us in the past, we’re familiar with the traditional cost of a project like this. 3D animation, dozens of scenes with multiple characters, lighting, compositing, world-building — you’re easily looking at a multimillion-dollar production. But we made ours in eight days, practically for free.

The public response

When we shared that we’d made a multimillion-dollar, Pixar-quality short film in a week with AI, the internet had opinions.

We garnered millions of views, thousands of reactions, and loads of comments. As we mentioned, some of you loved it, while many of you hated it. And plenty of people probably didn’t know how to feel. But that’s what’s so strange about the moment we’re in right now.

Our thoughts on AI video generation

Sharing an animated dog chasing a VHS tape should be simple — you either enjoy it or you don’t. But when AI enters the picture, the reactions get louder and more complicated. That’s part of what makes this moment in video creation so interesting. The tools are new, expectations are shifting, and everyone is trying to make sense of what creativity looks like when software becomes part of the storytelling process.

For us, the real takeaway isn’t that AI makes things better or worse — it’s that it changes where the creative work happens. Taste, judgment, timing, story, humor, and emotion are still human creative decisions. What AI changes is how ideas move from imagination to the screen and how many people might be able to participate in that process. Some projects will be best served by a camera, a crew, and a set. Others might be sparked or supported by new tools.

At Wistia, our goal is not to prove AI is the future or to convince anyone that it replaces anything. Our goal is to understand what these tools can do, where they fall short, and how they might help teams make videos they are proud of. We are learning in public, just like we did with DSLRs, iPhones, and every new format that shaped modern video. And as always, we aim to share what we learn so creators and marketers can decide what works best for their stories.
