Avatar Training

Learn how to create the best personal avatar with a high-quality training video.

Training Overview

To clone your personal avatar, we first need you to submit a training video. A high-quality training video helps our AI model properly map your face and voice, resulting in a more realistic avatar overall. Your training video will be one continuous video containing the following, in order:

5 seconds of silence (listening mode)

25 seconds of talking

Upload a video clip in either 9:16 or 16:9 aspect ratio, whichever suits your needs. The clip must meet the training requirements described on this page.
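As a rough illustration of the layout described above, the sketch below checks that a clip's pixel dimensions reduce to 9:16 or 16:9 and lists the expected segments with their start and end times. The function names and the exact-ratio check are assumptions made for this example; they are not part of the platform.

```python
from math import gcd

# Segment layout taken from the overview: (label, length in seconds).
SEGMENTS = [("silence", 5), ("talking", 25)]
ALLOWED_RATIOS = {(9, 16), (16, 9)}


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce pixel dimensions to their simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def is_allowed_ratio(width: int, height: int) -> bool:
    """True when the frame is exactly 9:16 (portrait) or 16:9 (landscape)."""
    return aspect_ratio(width, height) in ALLOWED_RATIOS


def segment_timeline() -> list[tuple[str, int, int]]:
    """Return (label, start, end) in seconds for each segment, in order."""
    timeline, start = [], 0
    for label, length in SEGMENTS:
        timeline.append((label, start, start + length))
        start += length
    return timeline
```

For example, a 1920×1080 or 1080×1920 clip passes the ratio check, while a 1440×1080 (4:3) clip does not.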

Training typically takes around 4-5 hours to complete. To submit a training video and check on its progress:

Click [Create Avatar] to create and submit a training job, then open [Personal avatars] to view the training results. Once training is complete, enable your cloned avatar from [Personal avatars] when creating a conversation.

How do I record 5 seconds of silence?

Your training footage should begin with 5 seconds of silence. Our model uses your silent footage to create a natural resting position for your avatar’s head and improve its listening behavior. During this period:

  • Pretend that you are “actively listening” to someone
  • Incorporate small (but non-repetitive) head movements
  • Ensure that your lips are closed the entire time

How do I record 25 seconds of talking?

We do not require a predefined script. You are welcome to discuss anything that showcases your natural speaking style and expertise.

How do I create a high-quality training video?

To ensure your avatar is the best possible quality, follow the guidelines below before recording your training footage.

✔️Video Material Requirements:

  • Framing: The subject must be centered in the video. Both green screen and live-action shooting are acceptable.
  • Hand Movements: Avoid hand gestures during normal speech segments.
  • Silent Segment: The first 5 seconds must be completely silent (no speaking), with natural listening expressions:
    • Subtle smiling
    • Occasional blinking
    • Slight nodding
  • Technical Specs: MP4 format, 30 seconds to 2 minutes duration
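The format and duration limits above can be checked before uploading. The following is a minimal sketch that assumes you already know the clip's duration (for example, from your camera app or a media tool). The function name is hypothetical, and the file-extension test is only a heuristic for the MP4 requirement.

```python
from pathlib import Path

MIN_SECONDS = 30   # lower bound from the technical specs
MAX_SECONDS = 120  # upper bound from the technical specs (2 minutes)


def check_file_specs(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Heuristic: only the extension is inspected, not the actual container.
    if Path(filename).suffix.lower() != ".mp4":
        problems.append("file is not an MP4")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 30 seconds and 2 minutes")
    return problems
```

Calling `check_file_specs("training.mp4", 45)` returns an empty list, while a 10-second `.mov` file would return both problems.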

✔️Shooting Guidelines:

Set up environment:

Ensure that you are in a quiet, well-lit area without background movement.

  • Check that your face is evenly lit without any shadows. A large diffuse light works best for neutral lighting.
  • Avoid environments with background noise or reverb (e.g. air conditioning, construction).
  • Keep your background clear by removing moving objects and people.

Green Screen Setup:

  • Fully cover frame with wrinkle-free green screen
  • Eliminate shadows for optimal keying
  • Confirm perfect setup before recording

Hair & Accessories:

  • Avoid dangling earrings (they interfere with lip-sync AI)
  • Use hairspray to control flyaway hairs (critical for clean keying)
  • Avoid colors close to green/yellow (e.g., yellow-green) when using a green screen.
  • If possible, avoid beards, glasses, high-collar shirts (e.g. turtlenecks) and accessories (e.g. hats).
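To get a feel for which clothing or accessory colors sit "close to green/yellow", one option is to look at a color's hue. The sketch below uses Python's standard `colorsys` module. The hue window (45° to 165°) and the saturation cutoff are assumptions chosen for illustration, not thresholds published by the platform.

```python
import colorsys


def is_risky_for_green_screen(
    r: int,
    g: int,
    b: int,
    low_deg: float = 45.0,         # assumed lower hue bound (yellow side)
    high_deg: float = 165.0,       # assumed upper hue bound (green side)
    min_saturation: float = 0.25,  # assumed cutoff: ignore near-grey colors
) -> bool:
    """True when an 8-bit RGB color falls in the assumed yellow-to-green hue band."""
    hue, saturation, _value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = hue * 360
    return saturation >= min_saturation and low_deg <= hue_deg <= high_deg
```

With these assumed thresholds, pure green `(0, 255, 0)`, yellow `(255, 255, 0)`, and yellow-green `(154, 205, 50)` are flagged, while blue `(0, 0, 255)` and grey `(128, 128, 128)` are not.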

Lighting & Positioning:

  • Use even “three-point lighting.” For shadow-free chin areas, add reflectors or foam boards (“mibo”).
  • Stand 1-2 meters from the green screen to prevent edge spill or shadows.
  • Resolution: 4K (3840×2160) preferred; minimum 1080P (1920×1080).
  • Frame Rate: 25 fps.
  • Lens: ~50mm (full-frame equivalent); use 85mm for a slimmer face effect.
  • Aperture: smaller aperture for green screen shoots. Adjust for background blur in real-scene shoots (ensure subject clarity).
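The resolution and frame-rate guidance above can be expressed as a quick check. This is a sketch under stated assumptions: the function name, the small frame-rate tolerance, and the treatment of portrait clips (comparing long and short sides rather than width and height) are illustrative choices, not platform rules.

```python
MIN_LONG_SIDE, MIN_SHORT_SIDE = 1920, 1080    # 1080P minimum
PREF_LONG_SIDE, PREF_SHORT_SIDE = 3840, 2160  # 4K preferred
TARGET_FPS = 25.0
FPS_TOLERANCE = 0.01  # assumed tolerance for rounding in reported frame rates


def check_capture_settings(width: int, height: int, fps: float) -> dict:
    """Classify resolution and frame rate for a landscape or portrait clip."""
    long_side, short_side = max(width, height), min(width, height)
    if long_side >= PREF_LONG_SIDE and short_side >= PREF_SHORT_SIDE:
        resolution = "preferred"
    elif long_side >= MIN_LONG_SIDE and short_side >= MIN_SHORT_SIDE:
        resolution = "acceptable"
    else:
        resolution = "too low"
    fps_ok = abs(fps - TARGET_FPS) <= FPS_TOLERANCE
    return {"resolution": resolution, "fps_ok": fps_ok}
```

For instance, a 3840×2160 clip at 25 fps is classified as "preferred", a 1080×1920 portrait clip as "acceptable", and a 1280×720 clip at 30 fps fails both checks.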

Smartphones:

  • Shoot in highest quality (1080P+). iPhone users: Select “Most Compatible” format.

Record training video

The first 5 seconds of the training video must show the subject in complete silence (no speaking). Natural listening cues like subtle smiles, slow blinking, or slight nodding are allowed to simulate an engaged listening state. Next, read the consent script, followed by 25 seconds of talking.

  • Aim for an engaging tone and a relaxed pace, while maintaining continuous eye contact with the camera.
  • Minimize body movement, such as hand gestures, head movement, jolts, etc.
  • Close your lips during pauses and at the end of sentences.
  • If you stumble, continue speaking. Perfection is not necessary!
  • Your training video will be one continuous video.
  • You can record your training video in any language you prefer.

Submit your training video

After ensuring that your training video fits our quality requirements, submit your video.

Next Steps after Training

Upon submission, your avatar will immediately begin training in the background. After around 4-6 hours, you can open [Personal avatars] to check whether your personal avatar is ready for use. If you’re not happy with the results, be sure to contact us.

Congrats on finishing the training process — now explore generating videos or starting