Image to Video Prompting Guide


Introduction

Image to Video models transform images into videos with a text prompt. When using this generative mode, your image defines composition, subject matter, lighting, and style that guide the video. 

Your prompt's role is to describe what should happen: the motion, camera work, and temporal progression you want to see. Use clear, direct language.

Example video prompt: The camera executes an aggressive, sweeping horizontal arc around the subject, followed by an extremely rapid, aggressive crash zoom that concludes with a sharp focus on the subject's eyes.

This guide builds on knowledge outlined in our Introduction to Prompting guide by introducing concepts specific to Image to Video, and is currently optimized for the newest Gen-4.5 model. 

After completing this guide, you will understand how to create Image to Video prompts that produce videos matching your creative intent.

 


 


Core prompt elements

Text prompt

Effective Image to Video prompts focus almost exclusively on motion. Rather than describing elements already present in the image, use your prompt to describe how the scene should move.

 

Motion components

  • Subject action
  • Environmental motion
  • Camera motion
  • Motion style & timing
  • Direction & speed

 

To control individual elements from your image, refer to characters and objects in general terms (for example, "the man" or "the car") to isolate them, then define their motion.

 

Do I need to include every component in my prompt?

No. Omitting components gives the model creative freedom in those areas. We recommend starting with a simple prompt that focuses on the most critical motion components, then adding detail as needed to refine the result.

This approach to iterating helps you understand how additions and changes may affect your results.

Are there situations where I should describe visual components?

Yes, there are cases where visual descriptions can be helpful:

  • Introducing an element not present in the image
  • Dramatic changes from the starting image
  • Specifying transformation details
  • Specifying interactions between two (or more) elements

 

Image prompt

Your input image acts as the first frame and provides the model with the composition, subject matter, lighting, and style information for the video. 

For best results, ensure that the input image is high quality and free of visual artifacts. Visual artifacts, such as blurry hands or faces, may be intensified once your image is transformed into a video.

 


Prompt structure & organization

You don’t need to follow a strict formula to generate great results. Structure and order are far less important than clearly conveying an idea and reducing ambiguity.

However, a consistent organization method can help you convey ideas effectively and make future iteration easier. We recommend trying this structure if you’re new to generative media:

The camera [motion description] as the subject [action]. [Additional descriptions]

 

Examples of prompts following a similar structure:

  • Text prompt: The camera slowly pushes in as the person scales the giant soda.
    Image prompt: 30fbe3c4-a55d-4b09-b5ce-3fee4cbc9a48.png
    Result: Gen-4.5 video
  • Text prompt: Handheld camera: The man stands still as the crowd moves around him. He starts yelling as the camera slowly zooms out. Natural camera shake.
    Image prompt: man.jpg
    Result: Gen-4.5 video
  • Text prompt: Whip pan to painting of a fox. Whip pan back to the woman with a curious expression. Whip pan back to the fox painting, the fox is moving.
    Image prompt: d3624c8c-fa5a-4c0f-88e9-d8f1061fc20c.png
    Result: Gen-4.5 video

For more prompt examples and their outputs, please see our Camera Terms, Prompts, & Examples.
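
If you assemble prompts programmatically, the recommended structure can be filled in with a small helper. The following is a minimal sketch, not an official tool: the function name `build_prompt` and its parameters are hypothetical and exist only for illustration.

```python
from typing import Optional, Sequence


def build_prompt(
    camera_motion: str,
    subject_action: str,
    extras: Optional[Sequence[str]] = None,
) -> str:
    """Fill the template:
    The camera [motion description] as the subject [action]. [Additional descriptions]

    Hypothetical helper for illustration only.
    """
    parts = [f"The camera {camera_motion.strip()} as the subject {subject_action.strip()}."]
    for extra in extras or []:
        text = extra.strip()
        if not text:
            continue  # skip empty additions
        if text[-1] not in ".!?":
            text += "."  # end each addition as its own sentence
        parts.append(text)
    return " ".join(parts)


print(build_prompt("slowly pushes in", "scales the giant soda", ["Natural camera shake"]))
```

Keeping the camera motion, subject action, and additional descriptions as separate inputs makes it easy to change one component at a time while iterating.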

 


Advanced techniques

Sequential prompting

Sequential prompting provides an order of events for temporal control. This can be done through natural language, or by providing rough timestamps for an action to occur:

  • Natural language: X occurs, then Y occurs. Finally, Z occurs.
  • Timestamps: [00:01] X occurs. [00:03] Y occurs. [00:04] Z occurs.

For best results, consider whether the requested sequence fits within the selected duration. You may want a longer duration for more complex sequences.
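
The timestamp style above can be generated from a list of events. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the 5-second duration in the usage line is an arbitrary example value, and rejecting timestamps at or beyond the clip duration is a simple sanity check chosen for this sketch, not a documented rule.

```python
from typing import List, Tuple


def format_timestamp(seconds: int) -> str:
    """Render whole seconds as [MM:SS], matching the style shown above."""
    minutes, secs = divmod(seconds, 60)
    return f"[{minutes:02d}:{secs:02d}]"


def build_sequential_prompt(events: List[Tuple[int, str]], duration_seconds: int) -> str:
    """Build a timestamped prompt from (start_second, description) pairs.

    Hypothetical helper: raises ValueError if an event starts outside the
    selected duration (an assumed sanity check, not a product rule).
    """
    ordered = sorted(events, key=lambda event: event[0])
    for start, description in ordered:
        if start < 0 or start >= duration_seconds:
            raise ValueError(
                f"'{description}' starts at {start}s, outside a {duration_seconds}s clip"
            )
    return " ".join(
        f"{format_timestamp(start)} {description.strip().rstrip('.')}."
        for start, description in ordered
    )


print(build_sequential_prompt([(1, "X occurs"), (3, "Y occurs"), (4, "Z occurs")], 5))
```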

 

Creating longer sequences

Create longer sequences by extracting the last frame of a completed generation and using that as the image input for a new video.

To extract the last frame:

  • Move the playback scrubber to the very end of the completed video
  • Select Use from beneath the video
  • Select Use current frame

This loads the selected frame into the current model. Once the generation completes, you can combine both clips in a video editor to adjust timing and remove the shared frame.
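
Because the second clip starts on the frame that ends the first, the combined sequence is one frame shorter than the two clips added together. The arithmetic can be sketched as follows; the frame counts and the 24 fps frame rate in the usage lines are hypothetical values chosen for illustration, not settings stated in this guide.

```python
def combined_frame_count(first_clip_frames: int, second_clip_frames: int) -> int:
    """Total frames after joining two clips and dropping the duplicated shared frame."""
    if first_clip_frames < 1 or second_clip_frames < 1:
        raise ValueError("each clip needs at least one frame")
    # The last frame of clip one equals the first frame of clip two,
    # so only one copy is kept.
    return first_clip_frames + second_clip_frames - 1


def combined_duration_seconds(first_clip_frames: int, second_clip_frames: int, fps: float) -> float:
    """Duration of the joined sequence at a given frame rate."""
    return combined_frame_count(first_clip_frames, second_clip_frames) / fps


# Hypothetical example: two 120-frame clips at an assumed 24 fps.
print(combined_frame_count(120, 120))
print(round(combined_duration_seconds(120, 120, 24), 2))
```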

 


FAQ

Why am I having trouble getting the desired motion with a certain image?

Input images can contain implied motion through elements like motion blur, mid-action elements and poses, or directional lines. Prompting for motion that contradicts these visual cues may require more iteration to achieve your desired result.

If you're not getting the motion you want after several iterations, check your input image for implied motion cues and consider using Text/Image to Image to remove or minimize cues before generating.

  • Input image (prevalent motion cues: motion blur, dust clouds): add_motion_blur_and_dust_clouds_behind_the_back_wheels__keep_the_composition_the_same_2.png
    Prompt: The car is parked and completely motionless. The camera performs an aggressive, sweeping horizontal arc around the parked car.
    Result: Gen-4.5 video
  • Input image (minimized motion cues): orange_truck_on_white_sand_dune__professional_photography__stylized__staged__car_photography_3.png
    Prompt: same as above
    Result: Gen-4.5 video

In the above example, prompting for a motionless, parked car contradicted the prominent dust clouds and motion blur, which act as motion cues. Removing the dust clouds and motion blur from the image produced the desired result with the same prompt.

Why did I receive an unwanted cut in my video?

Unwanted cuts in your video may indicate that your image and prompt combination would benefit from a longer duration.

First, try increasing the duration and iterating toward a seamless shot. If cuts continue to occur, check your prompt for phrasing that might imply a cut, and consider adding a prompt component like "Continuous, seamless shot" to your input.
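
The prompt check described above can be roughly automated. This is a minimal sketch: the `CUT_HINTS` phrase list is an assumption made for illustration (this guide does not list which phrases imply a cut), and both function names are hypothetical.

```python
import re
from typing import List

# Assumed examples of phrasing that might imply a cut; adjust to your own prompts.
CUT_HINTS = ("cut to", "cuts to", "jump cut", "smash cut", "next scene", "new scene")

SEAMLESS_COMPONENT = "Continuous, seamless shot."


def find_cut_hints(prompt: str) -> List[str]:
    """Return the assumed cut-implying phrases that appear in the prompt."""
    lowered = prompt.lower()
    return [
        hint
        for hint in CUT_HINTS
        if re.search(rf"\b{re.escape(hint)}\b", lowered)
    ]


def reinforce_seamless(prompt: str) -> str:
    """Append the seamless-shot component unless the prompt already contains it."""
    if "continuous, seamless shot" in prompt.lower():
        return prompt
    return f"{prompt.rstrip()} {SEAMLESS_COMPONENT}"


prompt = "The camera cuts to the door as the subject walks away."
print(find_cut_hints(prompt))
print(reinforce_seamless("The camera slowly pushes in as the subject walks away."))
```

A match is only a prompt to reread the wording; whether a phrase actually causes a cut still has to be confirmed by generating and iterating.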

How do I minimize camera motion for my shot?

Video models are designed to produce motion, so describing the motion that should occur within the frame is important for getting shots with less camera motion.

However, this alone may not result in a perfectly still shot. You can try adding prompt elements like the examples below to further reinforce minimal motion:

  • The locked-off camera remains perfectly still.
  • The camera must start and end on the exact same frame to create a perfect loop.
  • Minimal subject motion only.

Using these methods to reduce camera motion and then stabilizing the shot in a video editor can help achieve the desired effect. Alternatively, consider using the Animate Frames app with the same image for both inputs for even more control.