Yes, please do listen to the voices in your head.
Researchers at the University of Washington have developed an AI headphone system that can translate multiple speakers simultaneously in real time, potentially eliminating language barriers in multilingual group conversations. The Spatial Speech Translation system not only converts foreign-language speech into English but also preserves each speaker’s unique vocal characteristics and emotional tone, creating a more natural translation experience than existing technologies. The innovation could transform international communication by letting people express themselves confidently across language divides.
How it works: The University of Washington’s Spatial Speech Translation system uses AI to track and translate multiple speakers simultaneously in group settings.
- The technology works with standard noise-canceling headphones connected to a laptop with Apple’s M2 chip, which can run the necessary neural networks.
- The system employs two AI models: the first identifies speakers and their locations, while the second translates their speech from French, German, or Spanish into English text (a rough sketch of this two-stage pipeline appears below).
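To make the division of labor concrete, here is a minimal structural sketch of such a two-stage pipeline. Everything in it is hypothetical: the article does not describe the UW models’ actual interfaces, so the function names, data types, and placeholder logic below are assumptions for illustration only.

```python
# Hypothetical sketch of a two-stage spatial speech translation pipeline.
# None of these names or interfaces come from the UW system; they only
# mirror the two-model structure described in the article.
from dataclasses import dataclass


@dataclass
class SpeakerSegment:
    speaker_id: int
    direction_deg: float  # estimated angle of arrival of this speaker
    audio: list           # raw samples attributed to this speaker


def locate_speakers(mixed_audio: list) -> list[SpeakerSegment]:
    """Stage 1 (hypothetical): separate the mixture and estimate who is
    talking and from which direction."""
    # Placeholder: a real system would run a source-separation /
    # localization network over multi-microphone input.
    return [SpeakerSegment(speaker_id=0, direction_deg=-30.0, audio=mixed_audio)]


def translate_to_english(segment: SpeakerSegment) -> str:
    """Stage 2 (hypothetical): translate one speaker's stream from
    French, German, or Spanish into English text."""
    # Placeholder for a speech-translation model.
    return f"<English translation of speaker {segment.speaker_id}>"


def render_in_speakers_voice(text: str, segment: SpeakerSegment) -> list:
    """Hypothetical voice-cloning step: synthesize the English text so it
    keeps the original speaker's vocal characteristics and appears to
    come from the speaker's estimated direction."""
    return []  # placeholder audio buffer


def translate_group_conversation(mixed_audio: list) -> None:
    for seg in locate_speakers(mixed_audio):
        english = translate_to_english(seg)
        _cloned = render_in_speakers_voice(english, seg)
        print(f"Speaker {seg.speaker_id} ({seg.direction_deg:+.0f}°): {english}")


translate_group_conversation(mixed_audio=[0.0] * 16000)
```

The design point the sketch reflects is that localization and translation are decoupled: once each speaker’s stream is separated, it can be translated and re-voiced independently, which is what lets the system follow several people at once.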
The big picture: Unlike existing translation tools that focus on single speakers, this system addresses the challenge of following conversations where multiple people speak different languages simultaneously.
- The technology preserves speakers’ unique vocal characteristics, essentially creating a “cloned” voice that maintains the emotional tone of the original speaker.
- Researchers presented their work at the ACM CHI Conference on Human Factors in Computing Systems in Japan this month.
Why this matters: The technology could break down significant communication barriers for non-native speakers in various professional and social contexts.
- “There are so many smart people across the world, and the language barrier prevents them from having the confidence to communicate,” explains Shyam Gollakota, a University of Washington professor who worked on the project.
- Gollakota says his mother has “incredible ideas when she’s speaking in Telugu,” but struggles to communicate with people when she visits the US from India.
What’s next: Researchers are now working to reduce the system’s latency to under one second to enable more natural conversational flow.
- The team aims to maintain the “conversational vibe” by minimizing the delay between when someone speaks and when the listener hears the translation (the sketch after this list illustrates the basic trade-off).
- The current prototype requires the headphones to be connected to a laptop, but the same M2 chip that powers the system is also present in Apple’s Vision Pro headset, suggesting potential for more portable implementations.
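For a sense of why the sub-second target is hard, here is a hedged, purely illustrative sketch of the streaming trade-off: audio has to be processed in chunks, and the listener hears nothing until the first chunk has been captured and run through a model. The chunk sizes, inference cost, and function names below are all invented for illustration and do not describe the UW system.

```python
# Illustrative only: smaller audio chunks reach the listener sooner but
# give a (hypothetical) translation model less context per call.
import time


def translate_chunk(chunk: list) -> str:
    """Stand-in for a streaming speech-translation call."""
    time.sleep(0.05)  # pretend per-chunk inference cost
    return "<partial translation>"


def stream_translate(audio: list, sample_rate: int, chunk_seconds: float) -> float:
    """Feed audio in fixed-size chunks; return the delay until first output."""
    chunk_len = int(sample_rate * chunk_seconds)
    start = time.monotonic()
    first_output_delay = 0.0
    for i in range(0, len(audio), chunk_len):
        translate_chunk(audio[i:i + chunk_len])
        if i == 0:
            # The listener hears nothing until the first chunk has been
            # captured (chunk_seconds) and translated (elapsed time).
            first_output_delay = chunk_seconds + (time.monotonic() - start)
    return first_output_delay


audio = [0.0] * (16000 * 4)  # 4 seconds of silence at 16 kHz
for secs in (2.0, 1.0, 0.5):
    delay = stream_translate(audio, 16000, secs)
    print(f"chunk={secs:.1f}s -> first output after ~{delay:.2f}s")
```

Under these toy numbers, hitting a sub-second delay requires chunks well under a second long, which is the kind of constraint the latency work described above has to navigate.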