Introduction
Act-Two allows you to animate characters using driving performance videos. By providing a driving performance of someone acting out a scene and a character reference (image or video), Act-Two transfers the movement to your character, bringing them to life with realistic motion, speech, and expression.
When using a character image, Act-Two lets you control hand and body movements through the performance video with the gesture control setting. Act-Two automatically adds environmental motion to input images to create more natural-looking shots in a single generation.
This article provides an overview of Act-Two creation, covering best practices, input considerations, and detailed guidance for optimal results.
Article highlights
- Act-Two automatically adds environmental motion to character image inputs
- Act-Two works well with a range of angles, non-human characters, and styles
- Gesture control through the driving performance is available only when using a character image
Act-Two spec details
| Spec | Details |
| --- | --- |
| Cost | 5 credits per second, 3 second minimum |
| Supported durations | Up to 30 seconds |
| Infinite generations in Explore Mode | Yes |
| Platform availability | Web |
| Base prompt inputs | Driving performance: video; Character: image or video |
| Output resolutions | 16:9 — 1280x720 px; 9:16 — 720x1280 px; 1:1 — 960x960 px; 4:3 — 1104x832 px; 3:4 — 832x1104 px; 21:9 — 1584x672 px |
| Frame rate (FPS) | 24 fps |
| Gesture control | Supported with character images |
Step 1 — Selecting the inputs
To access Act-Two, begin by opening a session in your dashboard. Ensure the Gen-4 Video model is selected, then select Act-Two mode.
You can return to Gen-4 Video at any time by switching back to Prompt mode.
To create with Act-Two, you'll need two inputs:
- Driving performance: A video of a person acting out the scene you want to animate
- Character input: An image or video of the character you want to bring to life
The driving performance captures the movement, expressions, audio, and gestures that will be transferred to your character input. You can record the performance video directly through the web app, or drag and drop a video from your computer to the prompt canvas.
Choosing between a character image and a character video
Act-Two offers a more seamless experience when working with character images.
Generations with character images automatically add environmental motion and let you control gestures and body movement through the performance video.
When using character videos, generations retain the subject, environment, and camera motion from the original video. Act-Two controls the facial movement and expressions, but gesture control is not available.
The examples below show the differences in motion control when using a character image or video with the same performance:
| Performance video | Character input | Output |
| --- | --- | --- |
| (video example) | Image | (video example) |
| (video example) | Video | (video example) |
The character image result added a subtle handheld camera shake and supported gesture control, while the character video result animated the face but retained the original camera and scene movement.
If you're undecided between a character image and a character video, the following considerations can help:
Duration of performance and character video
When your performance is longer than your character video, the character video will loop with a reversed "boomerang" effect to match the performance length.
If you don't have a character video that matches the length of your performance, a character image may produce more natural results.
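To make the looping behavior concrete, here is a minimal Python sketch of how a reversed "boomerang" loop can map output frames onto a shorter clip. This illustrates the general technique only, not Runway's actual implementation, and the function name is ours:

```python
# Illustrative sketch (not Runway's implementation): extending a shorter
# character video with a reversed "boomerang" loop to match a longer
# performance. Frames play forward, then backward, then forward again.
def boomerang_frame(i: int, n: int) -> int:
    """Map output frame i onto a source clip of n frames."""
    if n == 1:
        return 0
    cycle = 2 * (n - 1)            # forward + backward; endpoints not repeated
    pos = i % cycle
    return pos if pos < n else cycle - pos

# Example: a 4-frame clip (frames 0..3) extended to 10 output frames
print([boomerang_frame(i, 4) for i in range(10)])
# -> [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```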
Importance of camera and environmental motion control
A character video carries its own camera, character, and environmental motion into the completed generation, while a character image has camera and environmental motion added automatically.
If your shot requires specific camera or environmental motion, a character video may be the better choice.
In addition to the above considerations, using inputs that follow our best practices will yield the highest-quality results.
Best practices
Review each input type below to learn about best practices for selecting your performance and character inputs:
Performance video
- Feature a single subject in the video
- Ensure the subject's face remains visible throughout the video
- Frame the subject from the waist up or closer
- Keep the subject well-lit, with defined facial features and expressions
- Certain expressions, such as sticking out a tongue, are not supported
- No cuts that interrupt the shot
- Ensure the performance follows our Trust & Safety standards
- [Gestures] Ensure that the subject's hands are in-frame at the start of the video
- [Gestures] Start in a similar pose to your character input for the best results
- [Gestures] Opt for natural movement rather than excessive or abrupt movement
Character image
- Feature a single subject
- Frame the subject from the waist up or closer
- Ensure the subject has defined facial features (such as a mouth and eyes)
- Ensure the image follows our Trust & Safety standards
- [Gestures] Ensure that the subject's hands are in-frame in the image
Character video
- Feature a single character in the video
- Frame the subject from the waist up or closer
- Ensure the subject has defined facial features (such as a mouth and eyes)
- No cuts that interrupt the shot
- Use a video close to the duration of the performance for the most natural results
Once you've selected inputs that follow these best practices, you're ready to review the settings.
Step 2 — Configuring the settings
There are two settings you can configure before starting your generation.
Gestures
The gesture setting controls whether poses, gestures, and bodily motion from the performance video are transferred to character image inputs. When this setting is disabled, Act-Two will focus on adding facial and environmental motion.
This setting is unavailable when using character videos, since the motion from this video determines your character's body movements.
The examples below demonstrate the differences observed when using different gesture settings for the same inputs:
| Performance video | Character image | Gesture setting | Output |
| --- | --- | --- | --- |
| (video example) | (image example) | Off | (video example) |
| (video example) | (image example) | On | (video example) |
As shown above, enabling this setting makes the output closely follow the performance video, which can change the character's pose. To maintain a similar pose or orientation, try to match the pose and positioning of your character image in your driving performance.
Facial expressiveness
The facial expressiveness setting controls the amount of facial motion transferred from the driving performance to your character input. The default value is 3, but you can use a lower or higher value:
- Lower values result in less expressiveness but may improve character consistency in certain cases
- Higher values result in more expressiveness but may lead to visual artifacts in certain cases
We recommend testing your inputs with the default value first, then making adjustments as needed based on the results.
Step 3 — Generating the Act-Two video
Once you've confirmed the settings, you're ready to generate. You can hover over the duration modal to see the calculated credit cost before generating.
Click the Generate button once you've confirmed the selected inputs, settings, and credit cost.
Your video will begin processing in your current session, where it will be available for review once complete.
Understanding Act-Two Pricing
Act-Two charges 5 credits per second with a minimum of 3 seconds. This means that driving performance videos under 3s will result in a charge of 15 credits.
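As a quick illustration, here's a minimal Python sketch of the pricing rule described above. The function name is ours, and how fractional seconds are billed is an assumption; treat it as a rough estimator rather than the product's exact billing logic:

```python
import math

def estimate_act_two_credits(duration_seconds: float) -> int:
    """Rough credit estimate: 5 credits per second, 3-second minimum."""
    if duration_seconds > 30:
        raise ValueError("Act-Two supports durations up to 30 seconds")
    billed = max(duration_seconds, 3.0)   # 3-second minimum applies
    return math.ceil(5 * billed)          # rounding up is an assumption

print(estimate_act_two_credits(2))    # 15 credits (minimum charge applies)
print(estimate_act_two_credits(12))   # 60 credits
```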
Iterating and troubleshooting tips
- Make sure any unique features you want conveyed are visible in the character image or video. For example, if your character should have fangs, include an image or video where their teeth are clearly shown.
- Try starting your performance video with the subject's palms facing the camera for improved gesture consistency.
Step 4 — Changing the Voice
To change the voice in a completed Act-Two video, click Actions (...) below the video and select Change Voice. In the voice selection panel, click Play to preview available voices, then click a voice name to select it. Click Generate to start a new generation to apply the changes.
Please note that the generated voice may match the accent of the original audio.
Audio Quality Considerations
Clear audio with consistent quality produces optimal results. Ensure the original recording has minimal background noise and consistent volume and pitch levels.