Google DeepMind recently showcased a chatbot-powered robot in an open-plan office in Mountain View, California. Equipped with the Gemini large language model, the robot navigates the office space and assists employees with tasks.
By combining video and text processing, the robot can interpret commands and maneuver around the office accordingly. Asked to find a place to write, for example, it can locate a whiteboard and lead the person to it. Gemini lets the robot make sense of its surroundings and respond to requests that call for common-sense reasoning.
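To illustrate the general idea rather than DeepMind's actual pipeline, a minimal sketch might pass a camera frame and a user request to a multimodal model and ask it to name a navigation target. The sketch below assumes Google's public google-generativeai SDK; the model name, file name, and prompt are placeholders for illustration only.

```python
import google.generativeai as genai
from PIL import Image

# Placeholder credentials and inputs, assumed for this sketch.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

frame = Image.open("camera_frame.jpg")  # current view from the robot's camera
instruction = "Find me somewhere to write."

prompt = (
    "You are helping a mobile robot in an office. "
    "Given the camera view and the user's request, name the landmark "
    "the robot should drive to, in one short phrase.\n"
    f"Request: {instruction}"
)

# Send the text prompt and the image together; the model replies with a
# short phrase (e.g. "the whiteboard by the kitchen") that a separate
# navigation stack would then drive toward.
response = model.generate_content([prompt, frame])
print(response.text)
```

In a real system the model's answer would feed into a mapping and path-planning layer; this snippet only shows the vision-language step of turning an image plus a request into a navigation goal.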
The demonstration shows how large language models can extend into physical tasks and interactions. The robot navigated the office and responded to complex commands with up to 90 percent reliability, a notable step forward in human-robot interaction and usability.
Researchers continue to explore how language models can be integrated with robotics to expand what AI-powered systems can do. Startups such as Physical Intelligence and Skild AI have emerged with substantial funding to build robots with broader problem-solving abilities. Vision language models also show promise for tasks that depend on perception, such as answering questions about what a robot sees or following visual instructions.
As AI and robotics progress, pairing language models with physical tasks is expected to change how robots operate in real-world environments. Google DeepMind's results with Gemini pave the way for further advances in robotic technology and AI integration.