Introduction
Act-Two allows you to animate characters using driving performance videos. By providing a driving performance of someone acting out a scene and a character reference (image or video), Act-Two transfers the movement to your character, bringing them to life with realistic motion, speech, and expression.
When using a character image, Act-Two lets you control hand and body movements through the performance video with the gesture control setting. Act-Two automatically adds environmental motion to input images to create more natural-looking shots in a single generation.
This article provides an overview of Act-Two creation, covering best practices, input considerations, and detailed guidance for optimal results.
Article highlights
- Act-Two automatically adds environmental motion to character image inputs
- Act-Two works well with a range of angles, non-human characters, and styles
- Gesture control through the driving performance is available only when using a character image
Act-Two spec details
| Spec | Details |
| --- | --- |
| Cost | 5 credits per second, 3 second minimum |
| Supported durations | Up to 30 seconds |
| Infinite generations in Explore Mode | Yes |
| Platform availability | Web |
| Base prompt inputs | Driving performance: video; Character: image or video |
| Output resolutions | 16:9 — 1280x720 px; 9:16 — 720x1280 px; 1:1 — 960x960 px; 4:3 — 1104x832 px; 3:4 — 832x1104 px; 21:9 — 1584x672 px |
| Frame rate (FPS) | 24 fps |
| Gesture control | Supported with character images |
Step 1 — Selecting the inputs
To access Act-Two, begin by opening a session in your dashboard. Ensure the Gen-4 Video model is selected, then select Act-Two mode.
You can return to Gen-4 Video at any time by switching back to Prompt mode.
To create with Act-Two, you'll need two inputs:
- Driving performance: A video of a person acting out the scene you want to animate
- Character input: An image or video of the character you want to bring to life
The driving performance captures the movement, expressions, audio, and gestures that will be transferred to your character input. You can record the performance video directly through the web app, or drag and drop a video from your computer to the prompt canvas.
Choosing between a character image and a character video
Act-Two offers a more seamless experience when working with character images.
Generations with character images automatically add environmental motion and let you control gestures and body movement through the performance video.
When using character videos, generations retain the subject, environment, and camera motion from the original video. Act-Two controls the facial movement and expressions, but gesture control is not available.
The examples below show the differences in motion control when using a character image or video with the same performance:
| Performance video | Character input | Output |
| --- | --- | --- |
| (video example) | Image | (video example) |
| (video example) | Video | (video example) |
The character image result added a subtle handheld camera shake and supported gesture control, while the character video result animated the face but retained the original camera and scene movement.
If you're undecided between a character image and a character video, the following considerations can help:
Duration of performance and character video
When your performance is longer than your character video, the character video will loop with a reversed "boomerang" effect to match the performance length.
If you don't have a character video that matches the length of your performance, a character image may produce more natural results.
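To make the looping behavior concrete, here is a minimal Python sketch of how a reversed "boomerang" loop can map output frames onto a shorter clip. This illustrates the general technique only, not Runway's actual implementation, and the function name is ours:

```python
# Illustrative sketch (not Runway's implementation): extending a shorter
# character video with a reversed "boomerang" loop to match a longer
# performance. Frames play forward, then backward, then forward again.
def boomerang_frame(i: int, n: int) -> int:
    """Map output frame i onto a source clip of n frames."""
    if n == 1:
        return 0
    cycle = 2 * (n - 1)            # forward + backward; endpoints not repeated
    pos = i % cycle
    return pos if pos < n else cycle - pos

# Example: a 4-frame clip (frames 0..3) extended to 10 output frames
print([boomerang_frame(i, 4) for i in range(10)])
# -> [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```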
Importance of camera and environmental motion control
A character video carries its own camera, character, and environmental motion into the completed generation, while a character image has camera and environmental motion added automatically.
If your shot requires specific camera or environmental motion, a character video may be the better choice.
In addition to the above considerations, using inputs that follow our best practices will yield the highest-quality results.
Best practices
Review each input type below to learn about best practices for selecting your performance and character inputs:
Performance video
- Feature a single subject in the video
- Ensure the subject's face remains visible throughout the video
- Frame the subject from the waist up or closer
- Keep the subject well-lit, with defined facial features and expressions
- Certain expressions, such as sticking out a tongue, are not supported
- No cuts that interrupt the shot
- Ensure the performance follows our Trust & Safety standards
- [Gestures] Ensure that the subject's hands are in-frame at the start of the video
- [Gestures] Start in a similar pose to your character input for the best results
- [Gestures] Opt for natural movement rather than excessive or abrupt movement
Character image
- Feature a single subject
- Frame the subject from the waist up or closer
- Ensure the subject has defined facial features (such as a mouth and eyes)
- Ensure the image follows our Trust & Safety standards
- [Gestures] Ensure that the subject's hands are in-frame in the image
Character video
- Feature a single character in the video
- Frame the subject from the waist up or closer
- Ensure the subject has defined facial features (such as a mouth and eyes)
- No cuts that interrupt the shot
- Use a video close to the duration of the performance for the most natural results
Once you've selected inputs that follow these best practices, you're ready to review the settings.
Step 2 — Configuring the settings
There are two settings you can configure before starting your generation.
Gestures
The gesture setting controls whether poses, gestures, and bodily motion from the performance video are transferred to character image inputs. When this setting is disabled, Act-Two will focus on adding facial and environmental motion.
This setting is unavailable when using character videos, since the motion from this video determines your character's body movements.
The examples below demonstrate the differences observed when using different gesture settings for the same inputs:
| Performance video | Character image | Gesture setting | Output |
| --- | --- | --- | --- |
| (video example) | (image example) | Off | (video example) |
| (video example) | (image example) | On | (video example) |
As shown above, enabling this setting makes the output closely follow the performance video, which can change the character's pose. To maintain a similar pose or orientation, try to match the pose and positioning of your character image in your driving performance.
Facial expressiveness
The facial expressiveness setting controls the amount of facial motion transferred from the driving performance to your character input. The default value is 3, but you can use a lower or higher value:
- Lower values result in less expressiveness but may improve character consistency in certain cases
- Higher values result in more expressiveness but may lead to visual artifacts in certain cases
We recommend testing your inputs with the default value first, then making adjustments as needed based on the results.
Step 3 — Generating the Act-Two video
Once you've confirmed the settings, you're ready to generate. You can hover over the duration modal to see the calculated credit cost before generating.
Click the Generate button once you've confirmed the selected inputs, settings, and credit cost.
Your video will begin processing in your current session, where it will be available for review once complete.
Understanding Act-Two Pricing
Act-Two charges 5 credits per second with a minimum of 3 seconds. This means that driving performance videos under 3s will result in a charge of 15 credits.
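As a quick illustration, here's a minimal Python sketch of the pricing rule described above. The function name is ours, and how fractional seconds are billed is an assumption; treat it as a rough estimator rather than the product's exact billing logic:

```python
import math

def estimate_act_two_credits(duration_seconds: float) -> int:
    """Rough credit estimate: 5 credits per second, 3-second minimum."""
    if duration_seconds > 30:
        raise ValueError("Act-Two supports durations up to 30 seconds")
    billed = max(duration_seconds, 3.0)   # 3-second minimum applies
    return math.ceil(5 * billed)          # rounding up is an assumption

print(estimate_act_two_credits(2))    # 15 credits (minimum charge applies)
print(estimate_act_two_credits(12))   # 60 credits
```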
Iterating and troubleshooting tips
- Make sure any unique features you want conveyed are visible in the character image or video. For example, if your character should have fangs, include an image or video where their teeth are clearly shown.
- Try starting your performance video with the subject's palms facing the camera for improved gesture consistency.
Step 4 — Changing the Voice
To change the voice in a completed Act-Two video, click Actions (...) below the video and select Change Voice. In the voice selection panel, click Play to preview available voices, then click a voice name to select it. Click Generate to start a new generation to apply the changes.
Please note that the generated voice may match the accent of the original audio.
Audio Quality Considerations
Clear audio with consistent quality produces optimal results. Ensure the original recording has minimal background noise and consistent volume and pitch levels.