Introduction

Gen-4 enables fast, controllable, and flexible video generation that can seamlessly sit beside live action, animated and VFX content. Gen-4 creates videos in 5 or 10 second durations based on an input image and text prompt you provide.

This article covers different example structures, keywords, and prompting tips to help you get started with Gen-4. For information about Gen-4 pricing, output details, and using the UI, please see the Creating with Gen-4 documentation.

Article highlights

Don't underestimate the power of simplicity in your text prompt
Use a high-quality input image, free of visual artifacts, for best results
Use the text prompt to focus on describing motion
Use positive phrasing and avoid negative prompts
Refer to subjects in general terms, like "the subject"

Prompting Basics

This section covers our recommended approach to prompting, but experimenting with prompt variations and patterns will allow you to discover what works best for your inputs and desired outcome.

Prompting for Iteration

The Gen-4 model thrives on prompt simplicity. Rather than starting with an overly complex prompt, we recommend beginning your session with a simple prompt, and iterating by adding more details as needed.

Begin with a foundational prompt that captures only the most essential motion to the scene. Once your basic motion works well, try adding different prompt elements to further refine the output:

Subject motion
Camera motion
Scene motion
Style descriptors

Adding one new element at a time will help you identify which additions improve your video, understand how different elements interact, and more effectively troubleshoot unexpected results.

Below is an example prompt that conveys all ingredients:

Prompt	Input image	Output
a handheld camera tracks the mechanical bull as it runs across the desert. the movement disturbs dust that trails behind the mechanical creature. cinematic live-action.

See the Prompt Elements section for more example prompts and their respective outputs.

Best Practices

While there's no right or wrong way to write a prompt, following these best practices will help you achieve the results you envision. Click each recommendation for more context and examples:

Use positive phrasing only

Gen-4 is designed to interpret prompts that describe what should happen in your video, not what should be avoided. Negative phrasing is not supported and may produce unpredictable or even opposite results.

❌ No camera movement. The camera doesn't move. NO MOVEMENT

✅ Locked camera. The camera remains still.

Use direct, simple, and easily understood prompts

Avoid using overly conceptual language and phrasing when a simplistic description would efficiently convey the scene. Using prompts that describe the idea or feeling behind a motion, rather than the specific physical movements, may lead to unexpected results.

Abstract concepts force the model to interpret your intention, often resulting in random or unexpected movements. Always translate conceptual ideas into clear, specific physical actions the model can understand. This direct approach eliminates ambiguity and provides the model with concrete instructions it can reliably execute.

❌ The subject embodies the essence of joyful greeting, manifesting an acknowledgment of presence in a welcoming manner that conveys inner happiness.

✅ The woman smiles and waves.

Focus on describing the motion, rather than the input image

Both text and image inputs are considered part of your prompt. Reiterating elements that exist within the image in high detail can lead to reduced motion or unexpected results in the output.

❌ The tall man with black hair wearing a blue business suit and red tie reaches out his hand for a handshake

✅ The man extends his arm to shake hands, then nods politely.

Avoid conversational or command-based prompts

While external LLMs thrive on natural conversation, Runway's models are designed to thrive on visual detail. Conversational elements like greetings or explanations waste valuable prompt space.

Similarly, command-based prompts that request changes often lack the descriptions needed to convey how an element should behave in the output. For example, rather than directly asking to add or remove elements, instead describe how the elements should appear or disappear from the scene.

❌ can you please add my dog to the image?

✅ A dog excitedly runs into the scene from off-camera

Avoid overly complex prompts

Gen-4 generates videos in 5 and 10 second clips, so it can be helpful to consider each generation as a single scene.

Attempting to dictate each second of the video with multiple scene changes, subject actions, or style shifts may provide unintended results as the model attempts to reconcile too many disparate elements or contradictory instructions.

In most cases, a simple description of the desired motion for a single scene will work well and allow the model to shine:

❌ a cat transforms into a dragon while jumping through a forest that changes
  seasons with each leap. The camera spins 360 degrees and zooms underwater where
  the dragon becomes a submarine in a neon cityscape.

✅ a cat transforms into a dragon while running through a forest.

Image prompts

In Generative Video models, the text prompt plays a crucial role in guiding the generative process in tandem with your image prompt, or input image. The qualities of your input will play a key role in the final output.

The input image establishes the visual starting point of the entire generative process by conveying key visual information about subjects, composition, colors, lighting, and style— allowing you to focus on describing the desired motion.

Prompt Elements

Subject Motion

Subject motion describes how characters or objects should behave or move. Subject motion may include physical movement, expressions, gestures, and more.

When describing subject motion, refer to characters or objects with general terms like "the subject" or simple pronouns. For example: "The subject turns slowly" or "She raises her hand." This helps the model focus on creating smooth motion rather than reinterpreting subject details already present in your image.

For Multiple Subjects

When your image contains multiple subjects needing different movements:

Use clear positional language: "The subject on the left walks forward. The subject on the right remains still."
Or simple descriptive identifiers: "The woman nods. The man waves."

This approach allows you to direct specific motion for each subject without confusing the model about which element should perform which action.

Scene Motion

Scene motion describes how the environment of a video should behave or react to motion. Scene motion may be based on subject motion or occur independently.

There are two different approaches for prompting for scene motion:

Insinuated motion: "The subject runs across the dusty desert"
Described motion: "The subject runs across the desert. Dust trails behind them as they move"

Insinuating motion with adjectives can lead to more natural results, while directly describing the motion can lead to emphasis of the element. If insinuated scene motion doesn't provide the desired results, try insinuating motion multiple times or adding simple description to further emphasize the movement.

Camera Motion

Camera motion describes how the camera should move through the scene in your input image. Camera motion can be prompted for movement style (locked, handheld, dolly, pan, and more), tracking subjects or moving independently through environments, shifts in focus, and more.

For examples of filmic motion terms, our Creating with Camera Control documentation is a good starting point. This Gen-3 article lists different terminology you can explore adding to your creations.

Style Descriptors

Style descriptors indicate broad or general motion elements. In example, you might use a style descriptor to convey motion speed, general movement style (live action, smooth animation, stop motion), or aesthetic style.

Style descriptors can be appended to prompts while refining results or included within the main body of the prompt.

Examples

Prompt	Input image	Output
the woman inspects her reflection in the mirror. the surface of the mirror bubbles with large, organically-shaped translucent bubbles in varying sizes. locked camera
the pile of rocks transforms into a humanoid made out of rugged volcanic rocks. the rock humanoid walks around the scene.
the handheld camera tracks the mouse as it scurries away.
The Brooklyn bridge gets on fire and collapses

Gen-4 Video Prompting Guide