June 03, 2026

Image-to-Video vs Text-to-Video: Why a Start Frame Wins Control

26 minutes

There are two front doors into every video model, and most beginners pick the wrong one. This episode covers why handing the model a still you already approved beats rolling the dice on pure text, how the prompt changes when the image carries the scene, and when text-to-video is still the right call.



Episode three of the single-shot ladder. You can already type a prompt and get a clip; today we change which door you walk through to get it.

Tutorial. Text-to-video (T2V) invents the picture and the motion from words at once; image-to-video (I2V) animates a still you hand it, so the model only solves motion. We make the case that for finished, consistent, on-deadline work, I2V usually wins, on three stacking fronts: control (you lock composition, character, framing, lighting, and brand before spending video credits), consistency (a start frame anchors identity and kills the mid-clip identity drift that plagues T2V), and economics (you draft in cheap image generations and save video credits for the final motion pass).

The big behavior change: in I2V the image carries subject, composition, and light, so your prompt should describe motion and camera only, not re-describe the scene. We cover the over-prompting mistake that makes models fight their own image, per Runway's I2V prompting guide, plus why negative phrasing ("no shake") fails and what to write instead.
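To make the motion-and-camera-only rule concrete, here is a minimal sketch of a prompt helper. Everything in it is an illustrative assumption rather than any tool's API: the function names, the word list used to flag negative phrasing, and the positive rewording in the example ("locked-off framing" in place of "no shake") are hypothetical choices, not wording taken from Runway's guide.

```python
import re

# Hypothetical word list: terms that usually signal negative phrasing ("no shake").
NEGATIVE_PATTERN = re.compile(r"\b(no|not|without|don't|avoid|never)\b", re.IGNORECASE)


def find_negative_phrasing(prompt: str) -> list[str]:
    """Return the negative words found in the prompt, lowercased, in order."""
    return [m.group(0).lower() for m in NEGATIVE_PATTERN.finditer(prompt)]


def build_i2v_prompt(subject_motion: str, camera_motion: str) -> str:
    """Join a subject-motion phrase and a camera phrase into one I2V prompt.

    The start frame already carries subject, composition, and light, so the
    helper deliberately has no parameters for describing the scene.
    """
    parts = [subject_motion.strip().rstrip("."), camera_motion.strip().rstrip(".")]
    prompt = ". ".join(p for p in parts if p) + "."
    negatives = find_negative_phrasing(prompt)
    if negatives:
        # Assumption: rewording as a positive description is the safer habit.
        raise ValueError(f"Rephrase positively; found negative words: {negatives}")
    return prompt


print(build_i2v_prompt("She turns her head toward the window",
                       "Slow push-in, locked-off framing"))
```

The helper rejects a prompt such as "no shake" outright instead of silently passing it through, which forces the rewrite to happen before any video credits are spent.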

Also inside: the four start-frame sources (mint it in an image model, shoot a photo, grab a frame, screenshot a mock); first/last-frame conditioning for loops, reveals, and transitions; a copyable mint-load-motion-generate-chain-assemble workflow; what the real knobs look like across current tools (durations, resolution, motion brushes, multi-reference); the five failure modes you'll actually hit (identity drift, over-prompting, first-frame drift, warping on big motion, style drift); and when T2V is still the right tool.
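The chain step of the mint-load-motion-generate-chain-assemble workflow can be sketched as a loop, assuming that "chain" means feeding each clip's last frame in as the next clip's start frame. The `Clip` type, the `Generator` signature, and `fake_generate` are hypothetical stand-ins; a real tool's API will look different, so treat this as a shape for the loop and not an integration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Clip:
    start_frame: str    # identifier of the still the clip was animated from
    motion_prompt: str  # motion-and-camera-only prompt used for this clip
    last_frame: str     # identifier of the clip's final frame


# Hypothetical signature: (start_frame, motion_prompt) -> Clip
Generator = Callable[[str, str], Clip]


def run_chain(first_frame: str, motion_prompts: list[str], generate: Generator) -> list[Clip]:
    """Generate one clip per prompt, chaining each clip's last frame forward."""
    clips: list[Clip] = []
    frame = first_frame  # the minted and loaded start frame
    for prompt in motion_prompts:
        clip = generate(frame, prompt)  # the generate step
        clips.append(clip)
        frame = clip.last_frame         # the chain step
    return clips                        # ready for assembly in an editor


def fake_generate(start_frame: str, motion_prompt: str) -> Clip:
    """Stand-in generator so the sketch runs without any video service."""
    return Clip(start_frame, motion_prompt, last_frame=f"{start_frame}>end")


clips = run_chain("hero.png", ["Slow push-in", "Pan left to the door"], fake_generate)
for c in clips:
    print(c.start_frame, "|", c.motion_prompt)
```

Passing the generator in as a callable keeps the loop independent of any one tool, which matters when models and limits change as often as the notes say they do.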

Callbacks to episode 1 (read the Artificial Analysis Video Arena, and note the T2V and I2V boards are different rankings) and episode 2 (prompt anatomy). Forward to character consistency, keyframe chaining, and minting start frames.

AI-generated podcast by OCDevel. Models, limits, and prices move monthly; check the live leaderboard and test on your own shot before trusting any ranking.
