Google DeepMind Unveils Gemini Robotics 1.5 for Advanced Physical AI Agents

Google DeepMind has announced Gemini Robotics 1.5, a suite of AI models designed to bridge the gap between artificial intelligence and the physical world by enabling robots to perform complex, multi-step tasks. The release introduces agentic capabilities that let robots perceive, plan, think, use tools, and act with greater autonomy. One social media user called it "A major step in linking thought and action!"

The new initiative features a dual-model approach: Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. Gemini Robotics 1.5 is described as a vision-language-action (VLA) model that translates visual information and instructions into motor commands for robots. Its counterpart, Gemini Robotics-ER 1.5, is a vision-language model (VLM) focused on embodied reasoning, excelling at planning, making logical decisions within physical environments, and natively calling digital tools like Google Search.

A key advancement is the models' ability to "think before acting": the robot generates an internal reasoning sequence before it moves, which helps it tackle semantically complex tasks and break longer tasks into simpler segments. The models also generalize learning across different robot embodiments, accelerating skill acquisition. For instance, tasks learned on one robot can be transferred to robots with a different physical form.
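
The plan-then-act pattern described above can be illustrated with a minimal sketch. It does not use Google DeepMind's models or any real API: `plan_steps`, `run_task`, and `fake_executor` are hypothetical stand-ins, with a trivial string split playing the role of the reasoning model and a stub playing the role of the action model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str
    succeeded: bool


def plan_steps(instruction: str) -> List[str]:
    """Hypothetical stand-in for a reasoning/planning model.

    A real planner would reason over camera input and context; this stub
    only splits the instruction on the word "then".
    """
    return [part.strip() for part in instruction.split(" then ") if part.strip()]


def run_task(instruction: str, execute_step: Callable[[str], bool]) -> List[StepResult]:
    """Plan first, then hand each step to the executor, stopping on the first failure."""
    results: List[StepResult] = []
    for step in plan_steps(instruction):
        ok = execute_step(step)
        results.append(StepResult(step, ok))
        if not ok:
            break
    return results


def fake_executor(step: str) -> bool:
    """Hypothetical stand-in for an action model; fails only if the step mentions 'blocked'."""
    return "blocked" not in step


results = run_task("pick up the cup then place it on the shelf", fake_executor)
for r in results:
    print(r.step, "ok" if r.succeeded else "failed")
```

Separating planning from execution in this way means the same plan could, in principle, be handed to different executors, which is one reading of how a shared planner might serve robots with different bodies.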

Google DeepMind said that Gemini Robotics-ER 1.5 is now available to developers via the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is available to select partners and through a trusted tester program. The models are expected to make robots easier to use by accepting natural language commands and operating more autonomously in open-ended environments. The company also stressed its commitment to safety, describing a holistic approach overseen by its Responsibility & Safety Council to ensure responsible deployment in human-centric settings.