Introduction

Image to Video models transform images into videos with a text prompt. When using this generative mode, your image defines composition, subject matter, lighting, and style that guide the video.

Your prompt's role is to describe what should happen: the motion, camera work, and temporal progression you want to see, written in clear, direct language.


This guide builds on knowledge outlined in our Introduction to Prompting guide by introducing concepts specific to Image to Video, and is currently optimized for the newest Gen-4.5 model.

After completing this guide, you will understand how to create Image to Video prompts that produce videos matching your creative intent.

Core prompt elements

Text prompt

Effective image to video prompts focus almost exclusively on motion. Rather than describing elements present in the image, use your prompt to describe the motion of the scene.

Motion components

Subject action
Environmental motion
Camera motion
Motion style & timing
Direction & speed

To control individual elements from your image, refer to characters and objects with general language (for example, "the man" or "the car") to isolate them and define their motion.

Do I need to include every component in my prompt?

No. Omitting certain components gives the model creative freedom in producing your video. We recommend starting with a simple prompt that focuses on the most critical motion components, then adding detail to refine as needed.

This approach to iterating helps you understand how additions and changes may affect your results.
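As a minimal sketch of this iterative approach, the Python snippet below assembles a prompt from whichever motion components are supplied and omits the rest. The MotionPrompt class and its field names are hypothetical, chosen to loosely mirror the motion components listed earlier; they are not part of any Runway API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MotionPrompt:
    """Hypothetical container for motion components (not a Runway API)."""
    subject_action: Optional[str] = None
    environmental_motion: Optional[str] = None
    camera_motion: Optional[str] = None
    style_and_timing: Optional[str] = None
    # Direction and speed are assumed to be written into each phrase,
    # e.g. "slowly zooms out", rather than stored separately.

    def render(self) -> str:
        # Keep only the components that were provided; anything omitted
        # is left to the model.
        parts = [
            self.camera_motion,
            self.subject_action,
            self.environmental_motion,
            self.style_and_timing,
        ]
        return " ".join(p.strip() for p in parts if p and p.strip())


# Start simple, then add one component per iteration.
v1 = MotionPrompt(subject_action="The man stands still.")
v2 = replace(v1, environmental_motion="The crowd moves around him.")
v3 = replace(v2, camera_motion="The camera slowly zooms out.")

for version in (v1, v2, v3):
    print(version.render())
```

Changing one component per version makes it easier to attribute a difference in the output to a specific addition.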

Are there situations where I should describe visual components?

Yes, there are cases where visual descriptions can be helpful:

Introducing an element not present in the image
Dramatic changes from the starting image
Specifying transformation details
Specifying interactions between two (or more) elements

Image prompt

Your input image acts as the first frame and provides the model with the composition, subject matter, lighting, and style information for the video.

For best results, ensure that the input image is high quality and free of visual artifacts. Visual artifacts, such as blurry hands or faces, may be intensified once your image is transformed into a video.

Prompt structure & organization

You don’t need to follow a strict formula to generate great results. Structure and order are far less important than clearly conveying an idea and reducing ambiguity.

However, establishing an organization method can assist with effectively conveying ideas and make future iteration easier. We recommend trying this structure if you’re new to generative media:

The camera [motion description] as the subject [action]. [Additional descriptions]
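The structure above can be treated as a fill-in template. The following Python sketch shows one way to do that; the build_prompt helper and its parameter names are hypothetical and exist only for illustration.

```python
def build_prompt(camera_motion: str, subject: str, action: str, *extras: str) -> str:
    """Fill: The camera [motion description] as the subject [action]. [Additional descriptions]"""
    core = (
        f"The camera {camera_motion.strip()} as "
        f"{subject.strip()} {action.strip().rstrip('.')}."
    )
    # Additional descriptions are appended as their own sentences.
    tail = []
    for extra in extras:
        text = extra.strip()
        if text:
            tail.append(text if text.endswith(".") else text + ".")
    return " ".join([core] + tail)


print(build_prompt("slowly pushes in", "the person", "scales the giant soda"))
print(build_prompt("slowly zooms out", "the man", "starts yelling", "Natural camera shake"))
```

Because structure and order matter less than clarity, a helper like this is mainly useful for keeping prompts consistent across iterations, not for enforcing a required format.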

Examples of prompts following a similar structure are listed at the end of this guide.


For more prompt examples and their outputs, please see our Camera Terms, Prompts, & Examples.

Advanced techniques

Sequential prompting

Sequential prompting provides an order of events for temporal control. This can be done through natural language, or by providing rough timestamps for an action to occur:

Natural language: X occurs, then Y occurs. Finally, Z occurs.
Timestamps: [00:01] X occurs. [00:03] Y occurs. [00:04] Z occurs.

For best results, consider whether the requested sequence fits within the selected duration. You may want to choose a longer duration for more complex sequences.
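The timestamp format can be generated from a list of events, with a simple check that every event falls inside the clip. This is a sketch only: the timestamped_prompt helper is hypothetical, and the 5-second duration in the example is an assumed value, not a statement about which durations are available.

```python
from typing import List, Tuple


def timestamped_prompt(events: List[Tuple[int, str]], duration_s: int) -> str:
    """Format (second, action) pairs as "[MM:SS] action." in time order."""
    ordered = sorted(events, key=lambda event: event[0])
    for second, _ in ordered:
        # An event at or past the end of the clip cannot play out within it.
        if not 0 <= second < duration_s:
            raise ValueError(f"event at {second}s does not fit a {duration_s}s clip")
    parts = []
    for second, action in ordered:
        minutes, seconds = divmod(second, 60)
        text = action.strip().rstrip(".")
        parts.append(f"[{minutes:02d}:{seconds:02d}] {text}.")
    return " ".join(parts)


events = [(1, "X occurs"), (3, "Y occurs"), (4, "Z occurs")]
print(timestamped_prompt(events, duration_s=5))
```

The check only confirms that timestamps fall within the clip; whether each action has enough time to play out still needs to be judged by eye.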

Creating longer sequences

Create longer sequences by extracting the last frame of a completed generation and using that as the image input for a new video.

To extract the last frame:

Move the playback scrubber to the very end of the completed video
Select Use from beneath the video
Select Use current frame

This loads the selected frame into the current model. Once the generation completes, you can combine both clips in a video editor to adjust timing and remove the shared frame.
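To illustrate why the shared frame is removed, the sketch below joins two clips represented as plain lists of frames. It assumes the second clip opens on the same frame the first clip ends on; the join_clips helper and the string frame labels are hypothetical stand-ins for real video data.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def join_clips(first: Sequence[Frame], second: Sequence[Frame]) -> List[Frame]:
    """Concatenate two clips, dropping the second clip's opening frame.

    Assumes the second clip was generated from the last frame of the first,
    so its opening frame duplicates the first clip's final frame.
    """
    if len(first) == 0 or len(second) == 0:
        raise ValueError("both clips need at least one frame")
    return list(first) + list(second[1:])


clip_a = ["a0", "a1", "a2"]   # "a2" is the extracted last frame
clip_b = ["a2", "b1", "b2"]   # generated with "a2" as the image input
print(join_clips(clip_a, clip_b))
```

Keeping both copies of the shared frame would hold that image on screen for two frames, which can read as a slight stutter at the join.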

FAQ

Why am I having trouble getting the desired motion with a certain image?

Input images can contain implied motion through elements like motion blur, mid-action elements and poses, or directional lines. Prompting for motion that contradicts these visual cues may require more iteration to achieve your desired result.

If you're not getting the motion you want after several iterations, check your input image for implied motion cues and consider using Text/Image to Image to remove or minimize cues before generating.

Example: an input image edited to add motion blur and dust clouds behind the car's back wheels, and the Gen-4.5 result for the prompt "The car is parked and completely motionless. The camera performs an aggressive, sweeping horizontal arc around the parked car."

In the above example, prompting for a motionless, parked car contradicted the prominent dust clouds and motion blur, which act as motion cues. Removing the dust clouds and motion blur from the image produced the desired result with the same prompt.

Why did I receive an unwanted cut in my video?

Unwanted cuts in your video may indicate that your image and prompt combination would benefit from a longer duration.

First, try increasing the duration and iterating toward a seamless shot. If cuts continue to occur, check your prompt for phrasing that might indicate a cut, and consider adding a prompt component like "Continuous, seamless shot" to your input.
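The prompt check described here can be sketched in Python. The list of cut-like phrases is an assumption made for illustration and is not drawn from Runway documentation; find_cut_hints and reinforce_seamless are hypothetical helpers.

```python
import re

# Hypothetical phrases that could read as an instruction to cut.
# This list is an assumption for illustration only.
CUT_HINTS = ("cut to", "cuts to", "jump cut", "smash cut", "next scene")
SEAMLESS = "Continuous, seamless shot."


def find_cut_hints(prompt: str) -> list:
    """Return the cut-like phrases found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [
        hint for hint in CUT_HINTS
        if re.search(r"\b" + re.escape(hint) + r"\b", lowered)
    ]


def reinforce_seamless(prompt: str) -> str:
    """Append the seamless-shot component unless it is already present."""
    if SEAMLESS.rstrip(".").lower() in prompt.lower():
        return prompt
    return prompt.rstrip() + " " + SEAMLESS


draft = "The camera orbits the car, then cuts to a close-up of the wheel."
print(find_cut_hints(draft))
print(reinforce_seamless("The camera orbits the car."))
```

A phrase match is only a prompt to reread the wording; whether a given phrase actually causes a cut has to be confirmed by generating.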

How do I minimize camera motion for my shot?

Video models are designed to produce motion, so describing the motion that should occur within the frame is an important part of getting shots with less camera motion.

However, this alone may not result in a perfectly still shot. You can try adding prompt elements like the examples below to further reinforce minimal motion:

The locked-off camera remains perfectly still.
The camera must start and end on the exact same frame to create a perfect loop.
Minimal subject motion only.

Using these methods to reduce camera motion and then stabilizing the shot in a video editor can help achieve the desired effect. Alternatively, consider using the Animate Frames app with the same image for both inputs for even more control.

Example text prompts following the recommended structure (image prompts and results not shown):

The camera slowly pushes in as the person scales the giant soda.
Handheld camera: The man stands still as the crowd moves around him. He starts yelling as the camera slowly zooms out. Natural camera shake.
Whip pan to painting of a fox. Whip pan back to the woman with a curious expression. Whip pan back to the fox painting, the fox is moving.

Motion cue comparison (input images and results not shown):

Prevalent motion cues (motion blur, dust clouds). Prompt: The car is parked and completely motionless. The camera performs an aggressive, sweeping horizontal arc around the parked car.
Minimized motion cues. Prompt: same as above.

Image to Video Prompting Guide