Real-Time Voice AI Models

Krisp builds ultra-low-latency AI models for voice AI agents, contact centers, and real-time communication, powering 80B+ monthly audio minutes across 200M+ devices. Trusted by Discord, Twilio, RingCentral, Vonage, and more.

VIVA SDK — Voice Intelligence for Voice AI Agents

Server-side models that sit in front of your VAD or STT pipeline, optimized for on-server CPU deployment.

Model	Description
Voice Isolation	Removes background noise, secondary voices, and cross-talk from agent audio streams. Language and accent independent.
Turn Prediction	Predicts when a speaker is likely to finish their turn, enabling natural conversation flow without awkward pauses or premature interruptions. Audio-only, no transcription required.
Voice Activity Detection	Accurate real-time speech and silence detection for clean voice pipelines. Reduces false triggers and improves system responsiveness.
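
The placement described above, with voice isolation running ahead of speech detection and turn handling, can be sketched as a chain of frame processors. This is only an illustration and not the VIVA SDK API: `isolate_voice`, `is_speech`, and `TurnTracker` are hypothetical stand-ins (a pass-through filter, a mean-square energy threshold, and a naive trailing-silence counter used in place of a learned turn predictor), and the 16 kHz sample rate and 20 ms frames are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16_000                          # assumed sample rate in Hz
FRAME_MS = 20                                 # assumed frame duration
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # samples per frame

Frame = List[float]


def isolate_voice(frame: Frame) -> Frame:
    """Hypothetical stand-in for a voice isolation model (pass-through)."""
    return frame


def is_speech(frame: Frame, threshold: float = 0.01) -> bool:
    """Hypothetical stand-in VAD: mean-square energy above a threshold."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold


@dataclass
class TurnTracker:
    """Naive end-of-turn rule: N silent frames in a row after some speech."""
    silence_frames_needed: int = 10
    _heard_speech: bool = field(default=False, init=False)
    _silent_run: int = field(default=0, init=False)

    def update(self, speech: bool) -> bool:
        if speech:
            self._heard_speech = True
            self._silent_run = 0
            return False
        if not self._heard_speech:
            return False          # leading silence, no turn in progress
        self._silent_run += 1
        if self._silent_run >= self.silence_frames_needed:
            self._heard_speech = False
            self._silent_run = 0
            return True           # turn is considered finished
        return False


def run_pipeline(frames: Iterable[Frame], tracker: TurnTracker) -> Iterator[List[Frame]]:
    """Yield the speech frames of each detected turn, ready to pass to STT."""
    turn: List[Frame] = []
    for raw in frames:
        clean = isolate_voice(raw)      # stage 1: clean the audio first
        speech = is_speech(clean)       # stage 2: detect speech on clean audio
        if speech:
            turn.append(clean)
        if tracker.update(speech) and turn:   # stage 3: decide the turn ended
            yield turn
            turn = []
```

With a silence rule like this one, `silence_frames_needed` trades responsiveness against premature cut-offs: 10 frames at 20 ms means the agent waits 200 ms after the last speech frame before responding.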

RTC SDK — Speech Enhancement for Human-to-Human Calls

Client-side and server-side models for noise cancellation, accent conversion, and voice translation.

Model	Description
Noise Cancellation	Inbound and outbound noise removal, background voice cancellation, and de-reverberation.
Accent Conversion	Real-time accent transformation for contact center agents; improves customer satisfaction (CSAT) and average handle time (AHT).
Voice Translation	Bidirectional real-time voice-to-voice translation across multiple languages.

Open Datasets & Benchmarks

turn-taking-test-v1 — 4 hours of annotated conversational audio for turn prediction benchmarking (976 shift + 1,754 hold cases, 30 speakers)
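
Because the benchmark is imbalanced (976 shift versus 1,754 hold cases, 2,730 in total), plain accuracy can flatter a predictor that rarely signals a shift. The sketch below shows one way to score predictions with balanced accuracy and computes the trivial "always hold" baseline from the published counts. The label encoding (1 = shift, 0 = hold) and the choice of metric are assumptions made for illustration; the dataset's own file format and official scoring method are not described here.

```python
from typing import Sequence

SHIFT_CASES = 976     # from the dataset description
HOLD_CASES = 1_754    # from the dataset description


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of cases where the prediction matches the label."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of per-class recall, with 1 = shift and 0 = hold (assumed encoding)."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    recalls = []
    for cls in (0, 1):
        total = sum(1 for y in labels if y == cls)
        if total == 0:
            raise ValueError(f"no examples of class {cls}")
        hits = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Baseline: a predictor that never signals the end of a turn.
labels = [1] * SHIFT_CASES + [0] * HOLD_CASES
always_hold = [0] * len(labels)

print(f"accuracy:          {accuracy(labels, always_hold):.3f}")           # 0.642
print(f"balanced accuracy: {balanced_accuracy(labels, always_hold):.3f}")  # 0.500
```

The baseline reaches about 64% accuracy without detecting a single shift, so a class-aware metric is the more informative figure when comparing turn predictors on this data.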

Integrations

Works with Pipecat · LiveKit · Vapi · Daily — available as compiled C, JS/WASM, Python, Node.js, Go, and Rust bindings.
