Introduction
Lip Sync allows you to synchronize your Text to Speech scripts or uploaded audio to animate a photo or video of your choice. This tool supports multiple faces, enabling you to seamlessly build engaging character interactions through dialogues.
In Lip Sync, a dialogue is a single line of speech delivered by one speaker. Dialogues are the building blocks of a conversation: each one represents a distinct line spoken by a character, and two dialogues cannot overlap.
Speakers refer to the faces selected to be Lip Synced to the dialogue.
This article covers how to create Lip Sync videos, recommended best practices, using Multi-Face Lip Sync, and more.
Spec Information
Cost | 5 credits per second of video output |
Supported Durations | 40 seconds per dialogue; up to 10 dialogues |
Explore Mode on Unlimited Plans | No |
Platform Availability | Web |
Supported Visual Inputs | Image, video |
Supported Dialogue Inputs | Audio, text scripts |
Text Character Limits | 600 characters per dialogue |
Maximum Speakers | Up to four animated faces per video |
Maximum Resolution | 2k |
Supported Aspect Ratios | Same as uploaded image or video |
Frame Rate (FPS) | Image input: 24fps; Video input: same as uploaded video |
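To make the cost row concrete, here is a minimal sketch of the arithmetic. The function name is hypothetical, and rounding fractional credits up is an assumption; this article does not state how partial seconds are billed.

```python
import math

CREDITS_PER_SECOND = 5  # from the Cost row in the spec table


def estimate_credits(output_seconds: float) -> int:
    """Estimate the credits for a Lip Sync output of the given length.

    Assumption: fractional credits are rounded up. Actual billing may differ.
    """
    if output_seconds < 0:
        raise ValueError("output_seconds must be non-negative")
    return math.ceil(output_seconds * CREDITS_PER_SECOND)


print(estimate_credits(12))  # 60
print(estimate_credits(40))  # 200
```

Under these assumptions, a single dialogue at the 40-second limit would come to 200 credits.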
Best Practices for Lip Sync Input
Character Videos
- Human faces (animal and cartoon faces are not currently supported)
- Faces are forward-facing in the direction of the camera
- Faces are framed from around shoulders and up, and not extremely close or far from the camera
- Photorealistic in style, not animated or overly stylized
- No significant mouth, camera, body, or head movement
- No significant lighting changes
- No cuts that interrupt the shot
- A supported video file type
Character Images
- Human faces (animal and cartoon faces are not currently supported)
- Forward-facing in the direction of the camera
- Faces are framed from around shoulders and up, and not extremely close or far from the camera
- Faces are not extremely close together (for Multi-Face scenes)
- Photorealistic in style, not animated or overly stylized
- A supported image file type
Uploaded Speech Audio
- Uploaded speech is less than 40 seconds per dialogue
- Clear recording with voice only
- A supported audio file type
Step 1 – Preparing to Generate
Begin by navigating to Lip Sync Video in your Dashboard. This will take you to the prompting window, where you’ll configure your character and speech inputs to generate your video.
Step 2 – Selecting your Image or Video Input
Lip Sync supports image and video input, each with its own set of recommended best practices. Make sure your inputs follow the best practices listed at the beginning of this article to achieve the desired results.
Select an image or video and wait a moment for Lip Sync to detect the face(s).
If your image or video contains more than four faces, you'll need to choose up to four of them to use.
If a face is not detected, you may need to adjust your input image or video so that it complies with the best practices.
You’re now ready to configure your script or audio input.
Step 3 – Configuring your Text to Speech or Audio Input
You can add audio to your Lip Sync video in two ways:
Text to Speech
Begin by entering a script in the text box. Lip Sync supports scripts up to 600 characters per speaker dialogue.
Next, select the microphone button next to the dialogue to open the voice selection menu.
You can choose from Runway’s preset voices or any custom AI voice models (available on Pro plan and higher) that you’ve made in the Generative Audio tool.
Click the Play button to preview each voice before making a selection.
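If your script is longer than the 600-character limit, one option is to split it into several dialogues before pasting it in. The sketch below is a local helper, not part of Lip Sync. It assumes the limit counts every character, including spaces, which this article does not confirm, and the function name is hypothetical.

```python
import textwrap

MAX_SCRIPT_CHARS = 600  # per-dialogue limit stated in this article


def split_script(script: str, limit: int = MAX_SCRIPT_CHARS) -> list[str]:
    """Split a script into chunks of at most `limit` characters.

    Breaks happen at whitespace where possible, so words stay intact.
    Assumption: the tool counts characters the same way len() does.
    """
    return textwrap.wrap(script, width=limit)


long_script = "word " * 300  # roughly 1,500 characters
for number, chunk in enumerate(split_script(long_script), start=1):
    print(number, len(chunk))
```

Keep in mind that each chunk becomes its own dialogue, so a script that splits into more than 10 chunks would not fit in a single Lip Sync video.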
Uploaded Audio
Alternatively, you can use uploaded audio files for your Lip Sync. Click Upload Audio to select an existing audio file or upload a new one.
Each dialogue must be under 40 seconds. Audio over 40 seconds will be automatically trimmed.
You can preview your audio file by clicking the Play button after making a selection.
Multi-Face Scenes
You’ll have the option to choose a speaker for each dialogue when using an input image or video with more than one character.
Click the character selection button to choose the first speaker and configure their dialogue with Text to Speech or Upload Audio.
Select the Add speaker button to add more dialogue. You’ll once again be able to configure both a speaker and your choice of text or audio input. You can add up to 10 dialogues per Lip Sync video.
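When planning a longer multi-face scene, you may find it useful to check your outline against the limits in this article before building it in the tool. The sketch below is a planning aid only. The `Dialogue` class and `check_plan` function are hypothetical and do not correspond to any Runway API.

```python
from dataclasses import dataclass
from typing import Optional

# Limits taken from the spec table in this article.
MAX_DIALOGUES = 10
MAX_SPEAKERS = 4
MAX_SCRIPT_CHARS = 600
MAX_DIALOGUE_SECONDS = 40


@dataclass
class Dialogue:
    speaker: int                           # which face speaks (hypothetical index)
    text: Optional[str] = None             # Text to Speech script, if used
    audio_seconds: Optional[float] = None  # uploaded audio length, if used


def check_plan(dialogues: list[Dialogue]) -> list[str]:
    """Return a list of problems found in the plan (empty if none)."""
    problems = []
    if len(dialogues) > MAX_DIALOGUES:
        problems.append(f"too many dialogues: {len(dialogues)} > {MAX_DIALOGUES}")
    speakers = {d.speaker for d in dialogues}
    if len(speakers) > MAX_SPEAKERS:
        problems.append(f"too many speakers: {len(speakers)} > {MAX_SPEAKERS}")
    for number, d in enumerate(dialogues, start=1):
        # Each dialogue uses exactly one input type: text or audio.
        if (d.text is None) == (d.audio_seconds is None):
            problems.append(f"dialogue {number}: provide either text or audio")
        if d.text is not None and len(d.text) > MAX_SCRIPT_CHARS:
            problems.append(f"dialogue {number}: script exceeds {MAX_SCRIPT_CHARS} characters")
        if d.audio_seconds is not None and d.audio_seconds > MAX_DIALOGUE_SECONDS:
            problems.append(f"dialogue {number}: audio exceeds {MAX_DIALOGUE_SECONDS} seconds")
    return problems


plan = [
    Dialogue(speaker=0, text="Did you bring the map?"),
    Dialogue(speaker=1, audio_seconds=6.5),
]
print(check_plan(plan))  # []
```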
Step 4 – Generating your Lip Sync Video
After confirming that your dialogue and character selections are properly configured, you’re now ready to generate your video.
Note: If your input video is shorter than the output video, Lip Sync applies a boomerang effect to it. With a boomerang effect, the footage plays forward and then immediately plays in reverse back to the start, creating a loop.
Click the purple Generate button and wait for your video to process.
Next, you can add multiple Lip Sync clips to a Video Editor Project to combine them into a single longer scene.
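To illustrate the boomerang effect mentioned in the note above, here is a minimal sketch of how output frames could map onto a shorter input clip. It only illustrates the forward-then-reverse pattern. How Lip Sync actually handles the turnaround frames is not documented here, and the function name is hypothetical.

```python
def boomerang_frame(output_index: int, input_frames: int) -> int:
    """Map an output frame index to an input frame index.

    The clip plays forward to its last frame, then backward to its first,
    and repeats. Assumption: the end frames are not shown twice in a row.
    """
    if input_frames < 1:
        raise ValueError("input_frames must be at least 1")
    if input_frames == 1:
        return 0
    period = 2 * (input_frames - 1)  # length of one forward-and-back cycle
    position = output_index % period
    return position if position < input_frames else period - position


# A 4-frame input stretched over 10 output frames:
print([boomerang_frame(i, 4) for i in range(10)])
# [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```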
Generating a Lip Sync in Explore Mode
While the Lip Sync tool itself doesn't support unlimited generations, you can create Lip Sync videos using Explore Mode on the Unlimited Plan through Text/Image to Video.
- Click Text/Image to Video from your dashboard
- Either enter a prompt to create a character with Free Previews, or upload your own image
- Add some light movement with the settings, and generate a video
- Once the generation is complete, you'll see a button on the output video to generate a Lip Sync in Explore Mode with the video
This method works with Text to Speech, recorded audio, and uploaded audio. Note that you cannot create unlimited Lip Syncs from uploaded videos, because those have to go through the Generative Audio section of your dashboard.