
Speech-to-Speech

Technology that enables an AI agent to understand and respond in speech directly, without intermediate transcription.

Speech-to-Speech (S2S) is the next generation of Voice AI. The traditional approach chains three separate models (STT → LLM → TTS): speech-to-text transcribes the caller, a large language model writes a text reply, and text-to-speech reads that reply aloud. S2S models such as Gemini Native Audio instead take speech in and produce speech out. Because the audio is never reduced to a transcript, they preserve intonation, emotion, and natural pace.

Compared with the cascaded pipeline, the key advantages are 40-60% lower latency and higher accuracy for complex languages such as Hebrew. Yappr uses Speech-to-Speech with Gemini Native Audio, giving its voice agent the highest level of Hebrew understanding and speech capability. According to Google Research, S2S models show 15-20% higher accuracy than traditional STT-based approaches (Source: Google Research, 2025).
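The difference between the two architectures can be sketched as a toy model. This is an illustration under stated assumptions, not Gemini's API or Yappr's implementation. Every name here (`Audio`, `stt`, `llm`, `tts`, `s2s`, `run`) is hypothetical, and each "model" is a trivial stand-in. The sketch shows two things. First, in a cascade the stages exchange plain text, so prosody cues such as emotion are dropped at the STT boundary. Second, the cascade has three sequential model hops where S2S has one.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Any, Callable

@dataclass
class Audio:
    """Hypothetical stand-in for an audio clip: spoken words plus prosody cues."""
    words: str
    prosody: dict = field(default_factory=dict)  # e.g. {"emotion": "frustrated"}

# --- Cascaded pipeline (STT -> LLM -> TTS): stages exchange plain text ---
def stt(audio: Audio) -> str:
    return audio.words  # prosody is dropped at this text boundary

def llm(text: str) -> str:
    return f"reply to '{text}'"

def tts(text: str) -> Audio:
    return Audio(words=text)  # synthesized without knowing the caller's tone

# --- Speech-to-speech: one (toy) model maps audio directly to audio ---
def s2s(audio: Audio) -> Audio:
    # A real S2S model can condition its reply on the caller's prosody;
    # this toy simply carries the cues through to the output.
    return Audio(words=f"reply to '{audio.words}'", prosody=dict(audio.prosody))

CASCADE: list[Callable[[Any], Any]] = [stt, llm, tts]
SPEECH_TO_SPEECH: list[Callable[[Any], Any]] = [s2s]

def run(pipeline: list[Callable[[Any], Any]], audio: Audio) -> Audio:
    """Apply stages in order; each stage waits for the previous one's output."""
    return reduce(lambda x, stage: stage(x), pipeline, audio)

caller = Audio(words="my order is late", prosody={"emotion": "frustrated"})
print(run(CASCADE, caller).prosody)           # {}
print(run(SPEECH_TO_SPEECH, caller).prosody)  # {'emotion': 'frustrated'}
print(len(CASCADE), len(SPEECH_TO_SPEECH))    # 3 1
```

The toy deliberately ignores streaming, which lets real cascades overlap their stages. It isolates the text bottleneck and the extra sequential hops, which are the structural reasons the entry gives for S2S's advantages.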