Creating Multi-Character Dialogues with Act-Two


Introduction

This tutorial guides you through creating realistic multi-character dialogue scenes using Runway's generative tools, specifically Act-Two (generative motion capture), Gen-4 Image, and Gen-4 Video. 

While Act-Two currently supports single-character inputs, this workflow enables you to create conversations with two or more characters in a single scene.

Prerequisites

  • Access to Runway's Gen-4 Image, Gen-4 Video, and Act-Two tools
  • A local video editor (minimal editing experience required)
  • Recording device for performance and dialogue capture
  • Familiarity with Act-Two best practices

 

Step 1 — Recording the Dialogues

Record your dialogues before generating videos. Keep dialogues under 30 seconds total, which is Act-Two's maximum duration. Choose one recording approach:

Solo recording: Record the first character's dialogue, then play it back while recording the second character. This helps with pacing and creates a natural conversation flow. Background audio picked up from the playback can be cleaned up later.

This approach also reduces editing later if you plan to change a character's voice.

Collaborative recording: Record both characters in a single take, then separate the audio into individual character files. This feels more natural but requires coordination. 

You should end up with two separate performance videos—one for each character. Note the total duration of the longest performance as this will inform your video generation strategy.
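Before uploading, it can help to sanity-check that each recording fits within Act-Two's limit and to note the longest one. A minimal sketch of that check (the 30-second cap is from this tutorial; the example durations are hypothetical):

```python
ACT_TWO_MAX_SECONDS = 30  # Act-Two's maximum performance duration

def check_performances(durations_s):
    """Return (fits_limit, longest) for a list of per-character clip durations."""
    longest = max(durations_s)
    return longest <= ACT_TWO_MAX_SECONDS, longest

# e.g. character 1's take is 18s, character 2's is 22s
ok, longest = check_performances([18.0, 22.0])
print(ok, longest)  # True 22.0 -> the 22s clip drives your base-video length
```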

 

Step 2 — Generating a Multi-Character Image (Optional)

If you don't already have an image with multiple characters, create one using Gen-4 Image Reference. Upload reference images of your characters to Runway, then reference them in your prompt using the @ symbol.

For example, we might prompt the following for a car scene:

@bryan driving a car. @jess sitting in the passenger seat. night. cinematic. film grain. art directed

References maintain consistent character appearances across generations. For more control over composition, use the sketch feature first, then add your prompt to guide the final generation.

The resulting image serves as the foundation for your base video.

 

Step 3 — Creating Ambient Motion with Gen-4 Video

Next, generate the base video that serves as your scene foundation. Since both characters share the same scene, using a single character video ensures the ambient scene motion matches for both.

Load your multi-character image into Gen-4 Video and set duration to 10 seconds. Starting with 10 seconds reduces total generations needed, even if your dialogue is longer.

You won't need a detailed prompt here in most cases. Gen-4 recognizes people and scenes automatically and generates appropriate ambient motion. 

Review the generated video. Look for natural background movement and believable character motion—subtle head movements, hair motion, or background scenery. This creates the foundation that Act-Two will enhance with lip-sync and facial expressions.

 

Step 4 — Extending the Base Video

For dialogues longer than 10 seconds, create a seamless extension of the video you previously generated. Keeping the base character video close in duration to the performance dialogue prevents Act-Two from adding a boomerang (reverse) effect to pad your video to the required length.

Extract the final frame from your first video as a still image by selecting the Use frame > Input for video quick action, then generate another 10-second clip without a prompt.
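If you'd rather extract that frame locally instead of using the quick action, ffmpeg can grab the final frame of a clip. A sketch that builds the command (assumes ffmpeg is installed; filenames are placeholders):

```python
def last_frame_cmd(video_in, image_out):
    # -sseof -0.1 seeks ~0.1s before the end of the file; -update 1 keeps
    # overwriting the output image, so only the final decoded frame survives.
    return [
        "ffmpeg", "-sseof", "-0.1", "-i", video_in,
        "-update", "1", "-frames:v", "1", image_out,
    ]

cmd = last_frame_cmd("video1.mp4", "last_frame.png")
# run with: subprocess.run(cmd, check=True)
```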

This works because both videos share a common frame: the last frame of Video 1 becomes the first frame of Video 2. 

The image below illustrates how using the last frame of a video will help you create a seamless shot:

If your dialogue is over 20 seconds, you'd repeat this step once more using the last frame of Video 2.
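The number of 10-second base clips you need is just the dialogue length divided by the clip duration, rounded up. A quick sketch of that arithmetic:

```python
import math

def base_clips_needed(dialogue_s, clip_s=10):
    """How many 10s Gen-4 clips (first clip plus extensions) cover the dialogue."""
    return math.ceil(dialogue_s / clip_s)

print(base_clips_needed(8))   # 1 -> no extension needed
print(base_clips_needed(17))  # 2 -> one extension
print(base_clips_needed(25))  # 3 -> two extensions (dialogue over 20 seconds)
```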

In your editor, position the clips so the shared frames align perfectly, similar to the illustration above. You may choose to trim the duplicate frame, or add a subtle blending transition between the two clips for a smoother join.

Export the combined video once you're happy with the edits. You now have roughly 20 seconds of consistent background motion, which becomes your character video input for Act-Two. 

 

Step 5 — Generate Character Performances with Act-Two

Switch to Act-Two and upload your combined character video. Both characters are visible, but we need to isolate them individually to ensure that the performance is applied to the correct character.

Change the aspect ratio to crop your video. Any aspect ratio should work as long as only one character is visible.
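Changing the aspect ratio in Runway is the in-app route; if you prefer to prepare the crop locally, ffmpeg's crop filter does the same thing. A sketch of the filter string (frame sizes and filenames are placeholder assumptions):

```python
def crop_filter(width, height, x, y):
    # ffmpeg crop filter: output w:h, with the top-left corner at (x, y)
    return f"crop={width}:{height}:{x}:{y}"

# e.g. isolate a character occupying the left half of a 1920x1080 frame
print(crop_filter(960, 1080, 0, 0))  # crop=960:1080:0:0
# full command (placeholder filenames):
#   ffmpeg -i combined.mp4 -vf "crop=960:1080:0:0" character1.mp4
```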

Upload the driving performance recording for the visible character and generate. Act-Two will apply the lip sync and facial expressions from your performance to the character video, allowing you to maintain consistent scene motion.

Set up your second generation by cropping to the other character and uploading their corresponding dialogue recording. Generate this performance as well.

Both characters will share the same background motion and lighting since we used a character video, but each will have a unique facial performance that matches their dialogue.

Tip: Use Runway's Stylize Speech tool to transform character voices. This is available under Generate Audio tools.

At this point, you will have three videos:

  • Combined base video (without character performances)
  • Act-Two output for Character 1
  • Act-Two output for Character 2

In the next step, you'll use your local video editor to composite the clips together.

Step 6 — Combining the Videos in Your Editor

Import your combined background video as your base layer—this provides the scene foundation with both characters visible. Add your Act-Two outputs as overlay tracks positioned above the background layer.

Temporarily reduce the opacity of your Act-Two layers while positioning them. This lets you see through to the background layer, making alignment easier. 

Use your video editor's controls to adjust the positioning. Position each Act-Two output over the corresponding background character. Use feathering to soften edges if needed. Alignment should be straightforward since everything was generated from the same source.
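If you'd rather composite on the command line than in a GUI editor, ffmpeg's overlay filter can stack the Act-Two outputs on the base video. A sketch that builds the filter graph (filenames and overlay positions are placeholder assumptions):

```python
def overlay_graph(x1, y1, x2, y2):
    # Input 0: base video; inputs 1 and 2: the cropped Act-Two outputs.
    # Each overlay places one output at its (x, y) position on the base frame.
    return (
        f"[0:v][1:v]overlay={x1}:{y1}[tmp];"
        f"[tmp][2:v]overlay={x2}:{y2}[out]"
    )

graph = overlay_graph(0, 0, 960, 0)
# ffmpeg -i base.mp4 -i char1.mp4 -i char2.mp4 \
#   -filter_complex "<graph>" -map "[out]" composite.mp4
```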

Add background audio, ambient sound, or music to enhance the scene. Additionally, you can use the videos generated earlier with Aleph to create B-roll for your scene.

 

Alternative Approaches for Advanced Users

For gesture control, use character images instead of video inputs.

Start with separate character images against flat green backgrounds, then use Act-Two with these images for full gesture control. A flat green background makes it easier to key out (remove) the background later.

After generating, use your video editor to key out the green backgrounds and composite the characters over your desired background scene.
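The keying step can also be done locally with ffmpeg's chromakey filter. A hedged sketch of the filter string and a compositing command (filenames and tuning values are placeholder assumptions to adjust per shot):

```python
def chromakey_filter(color="0x00FF00", similarity=0.15, blend=0.05):
    # chromakey keys out pixels near `color`; similarity widens the keyed
    # color range, and blend feathers the edges of the resulting matte.
    return f"chromakey={color}:{similarity}:{blend}"

print(chromakey_filter())  # chromakey=0x00FF00:0.15:0.05
# composite the keyed character over a background (placeholder filenames):
#   ffmpeg -i background.mp4 -i character_greenscreen.mp4 \
#     -filter_complex "[1:v]chromakey=0x00FF00:0.15:0.05[fg];[0:v][fg]overlay[out]" \
#     -map "[out]" keyed.mp4
```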

Load the result into Runway's Aleph tool for additional environmental motion and scene dynamics. Note that Aleph isn't a lip-sync model, so you might need to run through Act-Two again once you're satisfied with the environmental changes.

Tip: Try adding "keep everything else exactly the same" to your prompt if you encounter unwanted changes in your Aleph generation. This may help maintain consistency.

This approach provides more control but requires additional compositing skills and takes longer than the character video-based method.