
Key Features
Face-to-face interactions
The first interface that speaks our language. CVI is multimodal: it understands and uses facial expressions and body language, and it has natural conversational awareness, including interrupts and turn-taking.
World’s lowest latency
The world’s fastest interface of its kind, with SLAs down to under 1s of utterance-to-utterance latency.
End-to-end solution
CVI is a turnkey solution: it delivers all the components you need to deploy AI video agents, so you don't have to worry about WebRTC, ASR, or anything else.
Focused on naturalness
Easily create high-quality AI avatars of you or your customers, powered by our state-of-the-art avatar models.
What does a conversation with CVI look like?
Here’s a sample:
Try it out!
You can try chatting with Elara on our website to get a taste of what a conversation with CVI looks like.
**Try Out CVI Now!**
Note that Elara can see and hear you.
What components does CVI provide, and what can I customize?
CVI provides a full pipeline allowing you to easily create video conversations. You can immediately jump into a real-time conversation with the generated Session link URL. CVI provides the following layers:
- WebRTC/Session link (using Daily)
- Speech recognition (ASR) with interrupts and semantic/lexical turn-taking, using our model
- Optimized, conversational LLM
- Text-to-speech (TTS)
You can customize the pipeline in the following ways:
- Use the OpenAI Realtime API or other voice-to-voice models
- Bring your own LLM/conversation logic or enable function calling for DUIX-optimized LLMs.
- Customize the TTS or ASR engine and the turn-taking settings
- Use text parrot mode to directly drive an avatar video.
- Directly access the video streams and create a custom UI.
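
To make the relationship between the default layers and the customization options easier to picture, here is a minimal Python sketch. It is not the DUIX API: `PipelineConfig`, `resolve_layers`, and every field name are hypothetical, and the way the modes bypass layers (text parrot mode skipping ASR and the LLM, a voice-to-voice model standing in for ASR, LLM, and TTS) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PipelineConfig:
    """Hypothetical description of a CVI session; not a real DUIX schema."""
    custom_llm: Optional[str] = None       # bring-your-own LLM identifier
    custom_tts: Optional[str] = None       # alternative TTS engine
    custom_asr: Optional[str] = None       # alternative ASR engine
    voice_to_voice: Optional[str] = None   # a speech-to-speech model name
    text_parrot: bool = False              # drive the avatar from supplied text


def resolve_layers(cfg: PipelineConfig) -> List[str]:
    """Return the layers a session would use under this sketch's assumptions."""
    if cfg.text_parrot and cfg.voice_to_voice:
        raise ValueError("text_parrot and voice_to_voice cannot be combined in this sketch")
    if cfg.text_parrot:
        # Assumption: supplied text goes straight to TTS, skipping ASR and the LLM.
        return ["webrtc", f"tts:{cfg.custom_tts or 'default'}"]
    if cfg.voice_to_voice:
        # Assumption: one speech-to-speech model stands in for ASR + LLM + TTS.
        return ["webrtc", f"voice_to_voice:{cfg.voice_to_voice}"]
    # Default path: each layer falls back to the built-in engine unless overridden.
    return [
        "webrtc",
        f"asr:{cfg.custom_asr or 'default'}",
        f"llm:{cfg.custom_llm or 'default'}",
        f"tts:{cfg.custom_tts or 'default'}",
    ]
```

For example, `resolve_layers(PipelineConfig(custom_llm="my-llm"))` keeps the default ASR and TTS layers and swaps only the LLM, which mirrors the "bring your own LLM" option above. The actual request format and field names should be taken from the API reference.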