Gemini Robotics 1.5 brings AI agents into the physical world
We’re powering an era of physical agents — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.
Earlier this year, we made incredible progress bringing Gemini's multimodal understanding into the physical world, starting with the Gemini Robotics family of models.
Today, we’re taking another step towards advancing intelligent, truly general-purpose robots. We're introducing two models that unlock agentic experiences with advanced thinking:
- Gemini Robotics 1.5 – Our most capable vision-language-action (VLA) model turns visual information and instructions into motor commands for a robot to perform a task. This model thinks before taking action and shows its process, helping robots assess and complete complex tasks more transparently. It also learns across embodiments, accelerating skill learning.
- Gemini Robotics-ER 1.5 – Our most capable vision-language model (VLM) reasons about the physical world, natively calls digital tools and creates detailed, multi-step plans to complete a mission. This model now achieves state-of-the-art performance across spatial understanding benchmarks.
These advances will help developers build more capable and versatile robots that can actively understand their environment to complete complex, multi-step tasks in a general way.
Starting today, we’re making Gemini Robotics-ER 1.5 available to developers via the Gemini API in Google AI Studio. Gemini Robotics 1.5 is currently available to select partners. Read more about building with the next generation of physical agents on the Developer blog.
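To give a sense of what getting started might look like, here is a minimal sketch of calling Gemini Robotics-ER 1.5 through the Gemini API using the google-genai Python SDK. The model id, image file, and prompt below are illustrative assumptions, not confirmed values; check Google AI Studio for the current model identifier.

```python
# Minimal sketch: asking Gemini Robotics-ER 1.5 to reason about a scene
# and produce a multi-step plan via the Gemini API (google-genai SDK).
from google import genai
from google.genai import types

# API key obtained from Google AI Studio.
client = genai.Client(api_key="YOUR_API_KEY")

# An image of the robot's workspace; filename is a placeholder.
with open("workbench.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model id
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "List the objects on the table and outline a numbered,"
        " step-by-step plan to sort them into the matching bins.",
    ],
)
print(response.text)
```

The same request pattern extends to the model's tool-calling abilities, since the mission-level planner can invoke digital tools as part of a larger plan.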
Posted on: September 26, 2025