Google DeepMind recently showcased Apptronik's humanoid robot Apollo performing household tasks like folding clothes and sorting items in response to natural language commands, powered by its new AI models Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. While the demonstrations appear impressive, experts caution that we’re still far from achieving truly autonomous household robots, as current systems rely on structured scenarios and extensive training data rather than genuine thinking.
What you should know: The demonstration featured Apptronik’s Apollo robot completing multi-step tasks using vision-language-action (VLA) models that convert visual information and instructions into motor commands.
- Gemini Robotics 1.5 works by “turning visual information and instructions into motor commands,” while Gemini Robotics-ER 1.5 “specializes in understanding physical spaces, planning, and making logistical decisions within its surroundings” (a rough sketch of this two-model split follows the list below).
- The robots responded to natural language commands to fold laundry, sort recycling, and pack items into bags.
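To make the division of labor between the two models concrete, here is a minimal, purely illustrative Python sketch of such a split: a high-level planner decomposes a verbal instruction into subtasks, and a low-level vision-language-action policy maps camera observations plus each subtask to motor commands. All class names, methods, and behaviors here are invented placeholders for illustration, not DeepMind's actual models or API.

```python
# Illustrative toy pipeline mirroring the split the article describes:
# a high-level "embodied reasoning" planner plus a low-level VLA policy.
# Every name here is a hypothetical stand-in, not the Gemini Robotics API.

from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Observation:
    """Camera frame plus proprioceptive state, stubbed as plain lists."""
    image: List[List[int]]     # toy stand-in for an RGB frame
    joint_angles: List[float]  # toy stand-in for robot joint readings


class EmbodiedReasoningPlanner:
    """Stand-in for the planner role the article attributes to
    Gemini Robotics-ER 1.5: turn a household instruction into subtasks."""

    def plan(self, instruction: str, obs: Observation) -> List[str]:
        # A real planner would reason over the scene; this stub hard-codes
        # a plausible decomposition for the laundry example.
        if "laundry" in instruction.lower():
            return ["pick up shirt", "flatten shirt", "fold shirt", "stack shirt"]
        return [instruction]  # fall back to treating the command as one step


class VisionLanguageActionPolicy:
    """Stand-in for the executor role the article attributes to
    Gemini Robotics 1.5: map an image + subtask text to motor commands."""

    def act(self, obs: Observation, subtask: str) -> Sequence[float]:
        # A real VLA model outputs learned motor commands; this stub returns
        # a zero delta per joint so the example runs end to end.
        return [0.0 for _ in obs.joint_angles]


def run_episode(instruction: str) -> None:
    obs = Observation(image=[[0]], joint_angles=[0.0] * 7)
    planner = EmbodiedReasoningPlanner()
    policy = VisionLanguageActionPolicy()

    for subtask in planner.plan(instruction, obs):
        command = policy.act(obs, subtask)
        print(f"{subtask!r} -> motor command {command}")


if __name__ == "__main__":
    run_episode("Please fold the laundry")
```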
Why this matters: The integration of large language models with robotic systems represents a significant step toward more intuitive human-robot interaction, but fundamental limitations remain for real-world deployment.
- Current systems work well in controlled environments with abundant training data, but struggle with the unpredictability of actual household settings.
- The technology addresses a long-standing goal in robotics: creating general-purpose robots that can perform routine tasks through simple verbal instructions.
The reality check: Ravinder Dahiya, a Northeastern University professor of electrical and computer engineering, emphasizes that despite impressive demonstrations, these robots aren’t actually “thinking” independently.
- “It becomes easy to iterate visual and language models in this case because there is a good amount of data,” Dahiya explains, noting that vision AI has existed for years.
- The robots operate on “a very defined set of rules” backed by “heaps of high-quality training data and structured scenario planning and algorithms.”
Missing capabilities: Current humanoid robots lack crucial sensing abilities that humans take for granted, limiting their effectiveness in complex environments.
- Unlike vision data, there’s insufficient training data for tactile feedback, which is essential for manipulating both soft and hard objects.
- Robots still cannot register pain, smell, or other sensory inputs that would be necessary for uncertain environments.
- “For uncertain environments, you need to rely on all sensor modalities, not just vision,” Dahiya notes.
What’s next: Researchers like Dahiya are developing advanced sensing technologies, including electronic robot skins, to give robots more human-like capabilities.
- These developments aim to provide robots with touch and tactile feedback, though progress remains slow due to limited training data.
- The path to truly autonomous household robots will require breakthroughs across multiple sensing modalities beyond just vision and language processing.