Notice: Act-Two significantly expands on Lip Sync's capabilities. Click here to learn more about the latest model. Act-Two requires a driving performance video, so Lip Sync may still be useful for text-to-speech or audio-driven generations.

Introduction

Lip Sync allows you to synchronize your Text to Speech scripts or uploaded audio to animate a photo or video of your choice. This tool supports multiple faces, enabling you to seamlessly build engaging character interactions through dialogues.

In Lip Sync, a dialogue is a single line of speech delivered by one speaker. Dialogues are the building blocks of a conversation in Lip Sync: each one represents a distinct line spoken by a character, and two dialogues cannot overlap.

Speakers refer to the faces selected to be Lip Synced to the dialogue.

This article covers how to create Lip Sync videos, recommended best practices, using Multi-Face Lip Sync, and more.

Spec Information

Cost	5 credits per second of video output
Supported Durations	Up to 40 seconds per dialogue; up to 10 dialogues
Explore Mode on Unlimited Plans	No
Platform Availability	Web
Supported Visual Inputs	Image, video
Supported Dialogue Inputs	Audio, text scripts
Text Character Limits	600 characters per dialogue
Maximum Speakers	Up to four animated faces per video
Maximum Resolution	2k
Supported Aspect Ratios	Same as uploaded image or video
Frame Rate (FPS)	Image input: 24fps; Video input: same as uploaded video
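
As a rough planning aid, the cost row above can be turned into a quick estimate. This is a minimal sketch, not an official calculator: the function name is hypothetical, and it assumes the output length equals the sum of your dialogue durations and that partial seconds are billed proportionally. Neither assumption is confirmed by this article.

```python
CREDITS_PER_SECOND = 5  # "5 credits per second of video output" from the spec table


def estimate_credits(dialogue_seconds):
    """Estimate the credit cost of a Lip Sync video.

    dialogue_seconds: list of per-dialogue durations in seconds.
    Assumption: output length = sum of dialogue durations, and partial
    seconds are billed proportionally (actual billing may round).
    """
    total_seconds = sum(dialogue_seconds)
    return total_seconds * CREDITS_PER_SECOND


if __name__ == "__main__":
    # Two dialogues of 10 and 20 seconds.
    print(estimate_credits([10, 20]))
    # The largest video the limits allow: 10 dialogues of 40 seconds each.
    print(estimate_credits([40] * 10))
```

Under these assumptions, a video that uses every available dialogue at full length would be the most expensive case the limits allow.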

Best Practices for Lip Sync Input

Character Videos

Human faces (animal and cartoon faces are not currently supported via Lip Sync)
- As an alternative, please consider using Act-Two to generate non-human faces
Faces are forward-facing in the direction of the camera
Faces are framed from around shoulders and up, and not extremely close or far from the camera
Photorealistic in style, not animated or overly stylized
No significant mouth, camera, body, or head movement
No significant lighting changes
No cuts that interrupt the shot
A supported video file type

Character Images

Human faces (animal and cartoon faces are not currently supported)
Forward-facing in the direction of the camera
Faces are framed from around shoulders and up, and not extremely close or far from the camera
Faces are not extremely close together (for Multi-Face scenes)
Photorealistic in style, not animated or overly stylized
A supported image file type

Uploaded Speech Audio

Total of uploaded speech is less than 40 seconds
Clear recording with voice only
A supported audio file type

Step 1 – Preparing to Generate

Begin by navigating to your Dashboard. Under Create, select Generate Audio. This will take you to the Generative Audio prompting window. From here, select the Lip Sync video icon on the left-hand side:

This canvas is where you’ll configure your character and speech inputs to generate your video.

Step 2 – Selecting your Image or Video Input

Lip Sync supports image and video input, each with its own set of recommended best practices. Ensure that your inputs follow the best practices listed at the beginning of this article to achieve the desired results.

Select an image or video and wait a moment for Lip Sync to detect the face(s).

If your image or video contains more than four faces, you'll need to choose up to four of them to use.

If a face is not detected, you may need to adjust your input image or video so that it complies with the best practices.

You’re now ready to configure your script or audio input.

Step 3 – Configuring your Text to Speech or Audio Input

You can add audio to your Lip Sync video in two ways:

Text to Speech

Begin by entering a script in the text box. Lip Sync supports scripts up to 600 characters per speaker dialogue.
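
If your script is longer than the limit, one option is to split it into several dialogues before pasting it in. The following is a minimal sketch of that idea, run locally and not part of Lip Sync itself. The function name and the sentence-boundary rule are assumptions for illustration.

```python
import re

MAX_CHARS = 600  # per-dialogue character limit from the spec table


def split_script(script, max_chars=MAX_CHARS):
    """Split a script into chunks of at most max_chars characters.

    Prefers breaking at sentence boundaries (., ! or ? followed by
    whitespace); a single sentence longer than max_chars is hard-wrapped.
    """
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a sentence that cannot fit in one dialogue on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_script("Hello there. How are you today? I am fine.", max_chars=20)
    for number, part in enumerate(parts, start=1):
        print(number, len(part), part)
```

Keep in mind that a video supports up to 10 dialogues, so a script that splits into more chunks than that would need to be spread across several videos.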

Next, select the microphone button next to the dialogue to open the voice selection menu.

You can choose between Runway’s preset voices or any custom AI voice models (available on Pro plan and higher) that you’ve made in the Generative Audio tool.

Click the Play button to preview each voice before making a selection.

Uploaded Audio

Alternatively, you can use uploaded audio files for your Lip Sync. Click Upload Audio to select an existing audio file or upload a new one.

Each dialogue must be under 40 seconds. Audio over 40 seconds will be automatically trimmed.
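
Because longer audio is trimmed automatically, you may prefer to check the length yourself and decide where the cut falls before uploading. The sketch below uses Python's standard `wave` module and therefore assumes an uncompressed WAV file; this article does not list the supported audio formats, so treat WAV as an assumption. The function names are hypothetical.

```python
import wave

MAX_DIALOGUE_SECONDS = 40  # per-dialogue limit from the spec table


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def trim_wav(src_path, dst_path, max_seconds=MAX_DIALOGUE_SECONDS):
    """Copy src_path to dst_path, keeping at most the first max_seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        max_frames = int(max_seconds * src.getframerate())
        frames = src.readframes(min(max_frames, src.getnframes()))
    with wave.open(dst_path, "wb") as dst:
        # The frame count in params is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
```

For example, calling `wav_duration_seconds("line.wav")` on a hypothetical file tells you whether it is over the limit, and `trim_wav("line.wav", "line_trimmed.wav")` writes a copy cut to the first 40 seconds.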

You can preview your audio file by clicking the Play button after making a selection.

Multi-Face Scenes

You’ll have the option to choose a speaker for each dialogue when using an input image or video with more than one character.

Click the character selection button to choose the first speaker and configure their dialogue with Text to Speech or Upload Audio.

Select the Add speaker button to add more dialogue. You’ll once again be able to configure both a speaker and your choice of text or audio input. You can add up to 10 dialogues per Lip Sync video.
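
When planning a longer multi-face scene, it can help to write the dialogue list down and check it against the limits before building it in the interface. This is only a local planning sketch: the `Dialogue` class, its field names, and the face numbering are hypothetical and do not correspond to any Runway API.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DIALOGUES = 10  # dialogues per Lip Sync video
MAX_SPEAKERS = 4    # animated faces per video
MAX_CHARS = 600     # characters per Text to Speech dialogue


@dataclass
class Dialogue:
    speaker: int                      # hypothetical index of the selected face
    text: Optional[str] = None        # Text to Speech script, if used
    audio_path: Optional[str] = None  # uploaded audio file, if used


def validate_plan(dialogues: List[Dialogue]) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if len(dialogues) > MAX_DIALOGUES:
        problems.append(f"{len(dialogues)} dialogues; the limit is {MAX_DIALOGUES}")
    speakers = {d.speaker for d in dialogues}
    if len(speakers) > MAX_SPEAKERS:
        problems.append(f"{len(speakers)} speakers; the limit is {MAX_SPEAKERS}")
    for number, d in enumerate(dialogues, start=1):
        # Each dialogue takes exactly one input: a script or an audio file.
        if (d.text is None) == (d.audio_path is None):
            problems.append(f"dialogue {number}: set either text or audio, not both or neither")
        if d.text is not None and len(d.text) > MAX_CHARS:
            problems.append(f"dialogue {number}: {len(d.text)} characters; the limit is {MAX_CHARS}")
    return problems


if __name__ == "__main__":
    plan = [
        Dialogue(speaker=0, text="Did you bring the map?"),
        Dialogue(speaker=1, audio_path="reply.wav"),
    ]
    print(validate_plan(plan))
```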

Step 4 – Generating your Lip Sync Video

After confirming that your dialogue and character selections are properly configured, you’re now ready to generate your video.

Note: If your input video is shorter than the output video, a boomerang effect is applied to it. With a boomerang effect, the footage plays forward and then immediately plays in reverse back to the start, creating a loop.
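
To picture the boomerang effect, the sketch below lists which input frame would be shown at each output frame. It illustrates the general forward-then-reverse pattern only; the exact frame handling Lip Sync uses (for example, whether the end frames are repeated at each turn) is not documented here, so this version, which does not repeat them, is an assumption.

```python
def boomerang_indices(input_frames, output_frames):
    """Return the input frame index shown at each output frame.

    Plays frames 0..input_frames-1 forward, then backward, and repeats
    until output_frames indices have been produced. End frames are not
    duplicated at the turning points (an assumption).
    """
    if input_frames < 1:
        raise ValueError("input_frames must be at least 1")
    if input_frames == 1:
        return [0] * output_frames
    period = 2 * (input_frames - 1)  # length of one forward-and-back cycle
    indices = []
    for i in range(output_frames):
        k = i % period
        indices.append(k if k < input_frames else period - k)
    return indices


if __name__ == "__main__":
    # A 4-frame input stretched to 10 output frames.
    print(boomerang_indices(4, 10))  # [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```

In practice this means a short, nearly still input clip tends to loop more cleanly than one with a lot of motion, which is consistent with the best practices for character videos.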

Click the purple Generate button and wait for your video to process.

Next, you can add multiple Lip Sync clips to a Video Editor Project to combine them into a single longer scene.

Generating a Lip Sync in Explore Mode

While Lip Sync doesn't support unlimited generations through the tool, you can create Lip Sync videos using Explore Mode on the Unlimited Plan through a Generative Session.

Click Generative Session from your dashboard
Either enter a prompt to create a character with minimal movement, or upload your own image
Generate the video
Once the generation is complete, you'll see a Use button underneath the output video. Click this and then select Lip Sync to generate a Lip Sync video in Explore Mode with the output

This method works with Text to Speech, recorded audio, and uploaded audio. Please note that you cannot create unlimited Lip Syncs from uploaded videos, as those have to go through the Generative Audio section of your dashboard.

Creating with Lip Sync