At Tesla AI Day in 2022, Tesla unveiled its humanoid robot Optimus, marking the start of a new era for humanoid robots. In the two years since, concepts such as embodied intelligence, large models, and multimodal models have emerged in quick succession, ushering in an era of AI for all. Now AI and robotics are entering a period of more sober reflection, and it is time to think carefully about current directions and technologies. Below are my thoughts on humanoid robots and AI, in Q&A format.
Q1. What are the data sources for training humanoid robot models?
A1. Training data generally comes from three sources: real data, simulation data, and synthetic data. At present, humanoid robot models are trained primarily on simulation data. Because the distribution of simulated data differs from that of the real world, real data is still needed for correction and optimization. And because large models have already consumed most of the data on the internet, synthetic data has become a reasonable and effective supplement. All the data on the internet is only a subset of humanity's collective wisdom. A large model essentially makes reasonable inferences from this data, so the knowledge the data contains sets the upper limit on what the model can know. Yet the information produced by human wisdom far exceeds what is available in digital form on the internet. How to digitize more of that wisdom and knowledge, or represent it in other forms, is therefore crucial to training large models and raising their level of intelligence.
Q2. How should sampled interaction data be represented?
A2. Data sampled by sensors is essentially a compressed record of the real-time state of the physical world. Sampling reduces dimensionality, so much information is lost, including temporal information, coupling with the environment, and fine detail about the state of the world. There are two ways to help AI understand the world more fully during training. The first is to train directly on higher-dimensional data, although this raises complex sensor design and implementation issues. Fei-Fei Li has proposed spatial intelligence, which refers to a robot's ability to perceive, understand, and interact in three-dimensional space; compared with two-dimensional image input, this undoubtedly raises the robot's level of intelligence. The second is to invert the sensor's encoding, using the sensor's operating principles, to reconstruct the real state of the physical world as far as possible. Much of the recovered information will be redundant or noise, but recovering the main information can still significantly improve AI training.
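As one concrete illustration of the second approach, consider a depth camera. The sketch below assumes an ideal pinhole camera model with no lens distortion and inverts its projection to recover a 3D point from a single pixel and its depth reading. The names PinholeIntrinsics, backproject, and project are hypothetical, chosen only for this example; a real sensor pipeline would also need calibration and noise handling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PinholeIntrinsics:
    """Assumed ideal pinhole camera parameters (no lens distortion)."""
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def backproject(u: float, v: float, depth: float,
                k: PinholeIntrinsics) -> Tuple[float, float, float]:
    """Invert the projection: pixel (u, v) plus depth -> (x, y, z) in the camera frame."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def project(point: Tuple[float, float, float],
            k: PinholeIntrinsics) -> Tuple[float, float]:
    """Forward model: 3D point in the camera frame -> pixel coordinates."""
    x, y, z = point
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


# Example: a pixel 100 px right of the image centre, measured 2 m away.
camera = PinholeIntrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
point_3d = backproject(420.0, 240.0, 2.0, camera)
pixel_again = project(point_3d, camera)  # round trip back to the original pixel
```

The round trip through project shows that the depth reading is exactly the information the 2D image had discarded: without it, the pixel only constrains the point to a ray.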
Q3. What kind of model should be adopted for humanoid robots?
A3. Since the rise of AI, many learning paradigms have emerged, including reinforcement learning, imitation learning, transfer learning, meta-learning, and continual learning. The ultimate goal is for robots to interact dynamically with their environment as humans do, understanding it and learning from it while balancing dynamic and steady states. From a biomimetic perspective, humanoid robots should imitate human learning as closely as possible, combining hierarchical learning, imitation learning, and continual learning.
Q4. How can humanoid robot learning achieve generalization?
A4. Robots need a basic understanding of the physical world, which can be called a base model. This model should encode the fundamental physical laws and basic knowledge that humans have discovered and summarized. Starting from it, robots interact with the real physical world to keep learning and evolving, checking their reasoning against reality and updating the base model accordingly. The base model can serve as the foundation for all AI: it can be optimized for different scenarios, with a database recording the connections between those scenarios. The model can then perform well across scenarios, which is how generalization is achieved.
Q5. How can humanoid robot models achieve true inheritance, sharing, and transferability?
A5. First, a base model is trained in a simulated environment using distributed parallel training; this can be called offline learning. The model is then deployed to the robot, which interacts with the real physical world and continuously updates the model; this can be called online learning. While working, the robot collects data and updates the model through online learning; while idle, it continues training in simulation through offline learning. The two modes reinforce each other and accelerate training. Periodically, the updated base model is placed in a shared database, from which all robots can download it and continue learning in their own real environments. In this way, all humanoid robots jointly maintain and update a single base model that serves as their common brain. It absorbs and integrates the experience of many robots interacting with their environments and can be shared and inherited, which significantly shortens training time and broadens generalization.
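The text does not say how the experience of many robots is merged into one base model. A minimal sketch, assuming plain parameter averaging (similar in spirit to federated averaging, and swapped in here purely for illustration), might look like this. The functions local_update and merge_into_base, the toy parameters, and the gradients are all hypothetical.

```python
from typing import Dict, List

Params = Dict[str, float]  # toy stand-in for a model's parameters


def local_update(base: Params, gradient: Params, lr: float = 0.1) -> Params:
    """One assumed online-learning step on a single robot: plain gradient descent."""
    return {name: value - lr * gradient[name] for name, value in base.items()}


def merge_into_base(robot_params: List[Params]) -> Params:
    """Average the parameters reported by all robots into a new shared base model."""
    if not robot_params:
        raise ValueError("need at least one robot model")
    count = len(robot_params)
    return {
        name: sum(params[name] for params in robot_params) / count
        for name in robot_params[0]
    }


# Shared base model, e.g. the result of offline training in simulation.
base = {"w": 1.0, "b": 0.0}

# Each robot downloads the base model and adapts it to its own environment.
robot_a = local_update(base, {"w": 2.0, "b": -1.0})
robot_b = local_update(base, {"w": -2.0, "b": -1.0})

# Periodically the individual models are merged and shared again.
new_base = merge_into_base([robot_a, robot_b])
```

In this toy run the two robots disagree about "w", so their changes cancel in the merged model, while they agree about "b", so that change survives. This is the sense in which a shared model integrates many robots' experience.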
Q6. How should the thinking process of robots be understood and implemented?
A6. Before carrying out a task, a robot needs a blueprint and a more detailed plan. It typically needs to understand the rules, which come from the constraints of the physical world and from the task itself; fit and imitate previous experience and data; and finally explore further possibilities on the basis of that experience. In this way, robots simulate the human thought process.
Q7. What is the role of human sleep and forgetting in the model?
A7. In humans, sleep processes daily experiences into memories, in effect updating the brain's database and model. Forgetting removes rigid beliefs about the physical world, so that perception can change dynamically, which aids survival and evolution. Humanoid robot models need analogous sleep and forgetting mechanisms: sleep to optimize model parameters, and forgetting to dilute and correct polluted data and erroneous experiences, thereby reinforcing the rules and experiences that match the real world.
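One way to picture these two mechanisms is a toy memory in which each "sleep" consolidates stored experiences into a single estimate, then weakens every experience and drops those that have faded too far. The sketch below assumes exponential decay of weights as the forgetting rule; the Memory class, its decay and min_weight parameters, and the example values are hypothetical and are not taken from any existing robot system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Experience:
    value: float         # an observed quantity, e.g. a measured friction coefficient
    weight: float = 1.0  # how strongly this experience still counts


class Memory:
    """Toy memory with an assumed exponential-decay forgetting rule."""

    def __init__(self, decay: float = 0.5, min_weight: float = 0.2) -> None:
        self.decay = decay            # factor applied to every weight per sleep
        self.min_weight = min_weight  # experiences below this weight are forgotten
        self.items: List[Experience] = []

    def record(self, value: float) -> None:
        """Store a new experience at full weight."""
        self.items.append(Experience(value))

    def sleep(self) -> Optional[float]:
        """Consolidate into a weighted mean, then decay weights and prune."""
        if not self.items:
            return None
        total = sum(e.weight for e in self.items)
        estimate = sum(e.value * e.weight for e in self.items) / total
        for e in self.items:
            e.weight *= self.decay
        self.items = [e for e in self.items if e.weight >= self.min_weight]
        return estimate


memory = Memory()
memory.record(10.0)           # an early, possibly erroneous observation
first = memory.sleep()        # consolidated estimate after the first day
memory.record(20.0)           # a newer observation
second = memory.sleep()       # the newer observation now dominates the estimate
```

Because older experiences carry less weight at each consolidation, an early erroneous observation is gradually diluted by newer ones and eventually dropped, which matches the role of forgetting described above.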
Q8. What improvements are still needed in the structural design of humanoid robots?
A8. The human spine and waist are crucial for flexibility and stability. Humanoid robot structures currently have no good solutions or products for these parts. If a good solution emerges and is adopted, robot performance will improve greatly.