Google DeepMind Launches Dual AI Robot System, Marking a Major Breakthrough in Robotics Technology

Robotics is undergoing a major transformation, and Google DeepMind's Gemini Robotics project has recently pointed to a promising new direction for the field. The project has released two new models that work together and, for the first time, allow robotic systems to 'think' before acting. This breakthrough could help overcome a key limitation of today's robots: they can perform only the specific tasks they were built and trained for.

In recent years, generative AI has become common in text, image, audio, and video creation. Now the same technology is being used to generate robotic action commands. The DeepMind team believes generative AI matters especially for robotics because it can unlock general-purpose capabilities in robots. The core problem with today's robots is excessive specialization: each robot needs extensive training for a specific task and often performs poorly on anything else. Carolina Parada, head of Google DeepMind's robotics division, pointed out: 'Today's robots are highly customized, making deployment difficult; installing a robot unit that can only perform a single task often takes months.' This specialization greatly limits where traditional robots can be used and makes moving them between scenarios cumbersome and inefficient. Generative AI is expected to break this deadlock: instead of being confined to specific tasks, robots could adjust their actions flexibly to different needs and environments and adapt to a wider range of work scenarios.

Because generative systems are general by nature, AI-driven robots built on them are more versatile: they can respond to new environments and workspaces without being reprogrammed. DeepMind's current approach relies on two cooperating models, one responsible for thinking and the other for execution: Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. The former is a vision-language-action (VLA) model that uses visual and textual input to generate robotic action commands. The latter, where 'ER' stands for embodied reasoning, is a vision-language model that takes visual and textual input and generates the steps needed to complete complex tasks. Gemini Robotics-ER 1.5 is the first robotic AI system with simulated reasoning capabilities, similar to the reasoning process of modern text chatbots. Although 'thinking' may not be an entirely accurate term for what generative AI does, DeepMind still uses it to describe this capability. According to DeepMind, the ER model achieved top scores on academic and internal benchmarks, indicating that it can make accurate decisions about how to interact with physical spaces. It does not carry out any actions on its own, however; that requires Gemini Robotics 1.5. The dual-model design lets each model play to its strengths, with the aim of helping robots complete tasks more intelligently in complex environments.

For example, when a robot needs to sort a pile of clothes into whites and colors, Gemini Robotics-ER 1.5 handles the request and analyzes images of the physical environment. It can also use tools such as Google Search to gather additional information, then generate natural-language instructions that give the robot specific steps for completing the task, which Gemini Robotics 1.5 then carries out. The key innovation of this architecture is the separation of reasoning from execution. The reasoning model focuses on understanding the task and the environment and drawing up a detailed plan; the execution model translates that plan into concrete robot actions. This division of labor lets the system reason about complex tasks while still executing them precisely.

From a technological perspective, this breakthrough may mark a turning point for robotics, from specialization toward generalization. Traditional robots require extensive training and debugging for each new task, whereas robots with generative AI capabilities could, in theory, adapt quickly to new work environments through natural-language instructions.

The technology is still at an early stage, and real-world deployments are likely to face challenges. Robot performance in complex real-world environments, safety assurance, and cost control all need further work. Even so, DeepMind's efforts point to a promising direction for robotics. As AI continues to advance, robots may move from simple task executors toward genuinely intelligent assistants, taking on larger roles in fields such as industrial production, logistics, and home services, and bringing more convenience and innovation to everyday life and work.
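To make the reasoning/execution split from the laundry-sorting example more concrete, here is a toy Python sketch. It is not DeepMind's code or API: the functions `plan` and `execute`, the `Action` type, and the bin names are all hypothetical. `plan` stands in for the role Gemini Robotics-ER 1.5 plays (turning a request plus a view of the scene into natural-language steps), and `execute` stands in for Gemini Robotics 1.5 (turning each step into a structured action). Real models work from camera images and produce motor commands; here the "scene" is just a list of labeled items and "execution" is string parsing.

```python
from dataclasses import dataclass

STEP_PREFIX = "pick up the "
STEP_JOINER = " and place it in the "


@dataclass
class Action:
    """Hypothetical structured command that an execution model might emit."""
    verb: str
    obj: str
    target: str


def plan(request: str, scene: list[str]) -> list[str]:
    """Toy 'reasoning model': turn a request and a scene into natural-language steps.

    Assumption: the scene is already a list of labeled clothing items, standing in
    for the image analysis a real vision-language model would perform.
    """
    if "sort" not in request.lower():
        raise ValueError("this toy planner only handles sorting requests")
    steps = []
    for item in scene:
        bin_name = "white bin" if item.startswith("white") else "color bin"
        steps.append(f"{STEP_PREFIX}{item}{STEP_JOINER}{bin_name}")
    return steps


def execute(step: str) -> Action:
    """Toy 'execution model': translate one natural-language step into an Action."""
    if not step.startswith(STEP_PREFIX):
        raise ValueError(f"unrecognized step: {step!r}")
    obj, target = step[len(STEP_PREFIX):].split(STEP_JOINER)
    return Action("pick_and_place", obj, target)


def run_task(request: str, scene: list[str]) -> list[Action]:
    """Reason first (plan), then act (execute each planned step in order)."""
    return [execute(step) for step in plan(request, scene)]


if __name__ == "__main__":
    for action in run_task("Sort the laundry into whites and colors",
                           ["white t-shirt", "red sock", "white towel"]):
        print(action)
```

The point of the split is that the planner's output is plain language, so the executor never needs to know how the plan was produced, and the planner never needs to know how an arm moves.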
