Image Source: Unsplash
Author|Rao Xiangyu
Editor|Zhong Yi
This article was first published on the Titanium Media App
Whether you accept it or not, artificial intelligence technology has begun to significantly transform the real world.
In consumer electronics, phone and computer makers are embedding large AI models into a wide range of devices. These companies generally believe AI can revive an industry stuck in an innovation bottleneck and reignite consumer demand. In the automotive industry, Tesla pushed the official version of FSD (Full Self-Driving) to 1.7 million car owners across the United States in early April; its end-to-end neural network makes driving decisions more like a human driver, for example changing lanes across four lanes in succession. More importantly, no major accidents involving Tesla’s FSD have been reported so far.
Where will AI go next? Venture capitalists accustomed to spotting trends are beginning to converge on the humanoid robot industry.
In China’s primary market, in October 2023, humanoid robot startup Zhijidongli completed nearly 200 million yuan in angel and Pre-A round financing; in December of the same year, Zhiyuan Robotics, established for less than a year, secured 600 million yuan in financing; in January 2024, Xingdong Jiyuan, established for less than half a year, announced it had completed over 100 million yuan in angel round financing; in February 2024, Yushu Technology completed B2 round financing amounting to 1 billion yuan.
“The development of China’s robotics industry has gone through several ups and downs. From 2013 to 2014, investment in industrial robots began to take off; from 2016 to 2017, there was another wave of investment in collaborative robots. Since 2022, general humanoid robots have become the focus of industry attention.”
Yan Qianhang, Vice President of Fengrui Capital, told Titanium Media APP that the market penetration rate of domestically produced industrial robots in China has reached about one-third, and the entire robotics industry is gradually maturing. The qualitative change brought about by AI large models has made everyone realize that the intelligence level of robots will continue to rise and gradually generalize.
As for when general humanoid robots can truly go into production and enter homes, startups in the industry have different judgments. Zhijidongli believes that it will take 5-8 years for general humanoid robots to replace humans in fine operations on production lines; to truly enter the household market, it will take 8 to 10 years. Wang Xingxing, founder of Yushu Technology, told Titanium Media APP, “By the end of 2025, more generalized humanoid robots will appear, and I feel I have already seen the direction.”
Elon Musk, A Call to Action
What ignited the “fire” of humanoid robots? Almost all respondents gave the same answer: Tesla CEO Elon Musk.
In February 2022, Tesla finished building the Optimus development platform; seven months later, at Tesla’s AI Day 2.0, Musk showcased an Optimus prototype that could walk and carry items on its own. By the end of 2023, the second-generation Optimus was officially unveiled, weighing 10 kg less, walking 30% faster, and featuring more dexterous hands and more degrees of freedom in the neck.
After Musk entered the field, the entrepreneurial wave in humanoid robots was completely ignited.
Tesla humanoid robot Optimus
Since 2023, a batch of domestic humanoid robot products has been launched, including Yushu H1, Zhiyuan Expedition A1, Fourier GR-1, Xingdong Jiyuan “Little Star”, Zhijidongli CL-1, and XPeng PX5. In the secondary market, UBTECH, known as China’s “first humanoid robot stock,” saw its share price rise by more than 88% at one point during trading, although humanoid robot products are not the company’s primary source of income.
In the overseas market, in May 2023, Norwegian humanoid robot startup 1X announced it completed a $23.5 million A2 round financing led by OpenAI. Almost simultaneously, American humanoid robot company Figure secured $70 million in A round financing. In January 2024, 1X completed another $100 million B round financing, with investors including EQT Ventures and Samsung NEXT. A month later, Figure announced it completed $675 million B round financing, with investors including Microsoft, OpenAI, and NVIDIA.
“In 2022, OpenAI had not yet released ChatGPT, but Musk may have seen the potential of GPT ahead of the industry.”
Wang Xingxing told Titanium Media APP that Musk has proven his success in both the automotive and commercial aerospace industries. Therefore, when Musk began working on humanoid robots, governments, markets, and capital institutions felt they had to accelerate their entry and could not wait until Tesla truly produced them. Of course, the more fundamental reason for the attention on humanoid robots is the emergence of AI large models.
According to Wang Xingxing, previously, Yushu Technology had no intention of entering the humanoid robot track because humanoid robots are too complex, and traditional algorithms cannot handle such complex machines at all. However, the development of AI technology has far exceeded expectations. For example, it used to take one to two years to teach a humanoid robot to walk, but now, using AI algorithms, it can be achieved in a month.
“The training algorithms for traditional humanoid robots rely on clever people writing mathematical equations and then solving them to work out the robot’s movement trajectory. However, these equations have significant limitations; if the environment changes, they may become unusable, and new equations have to be designed.”
Wang Xingxing further explained that this approach produces a very large amount of code, and once the system grows beyond a certain level of complexity, it can no longer be maintained by human effort alone. With AI, as long as the model is built well enough and is continuously fed data and computing power, it can keep learning by trial and error. Through the reward mechanism in reinforcement learning algorithms, the AI automatically keeps good training results and discards bad ones, significantly improving training efficiency.
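The keep-the-good, discard-the-bad loop described here can be illustrated with a toy sketch. This is not Yushu Technology’s training code: it uses a simple hill-climbing random search as a stand-in for a full reinforcement learning algorithm, and the `reward` function, the single `stride` parameter, and the target value 0.6 are all hypothetical.

```python
import random


def reward(stride: float) -> float:
    """Hypothetical reward: highest (zero) when stride equals a made-up ideal of 0.6."""
    return -(stride - 0.6) ** 2


def train(steps: int = 2000, seed: int = 0) -> float:
    """Trial and error: try a small random change, keep it only if the reward improves."""
    rng = random.Random(seed)
    best = 0.0                      # initial guess for the stride parameter
    best_reward = reward(best)
    for _ in range(steps):
        candidate = best + rng.gauss(0.0, 0.05)   # a random "trial"
        candidate_reward = reward(candidate)
        if candidate_reward > best_reward:        # keep good results...
            best, best_reward = candidate, candidate_reward
        # ...and silently discard bad ones
    return best


learned_stride = train()
print(f"learned stride: {learned_stride:.3f}")
```

No equation for the ideal stride is ever written down in the loop; the reward signal alone steers the search. Real systems replace the single number with a neural network policy and the toy reward with feedback from a physics simulator.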
With the efficiency gains brought by AI, Yushu Technology launched its first humanoid robot product in just six months. At the 2024 GTC conference, NVIDIA CEO Jensen Huang appeared alongside nine humanoid robots. Among them, the second humanoid robot from the left is Yushu Technology’s Unitree H1.
Image Source: NVIDIA Official Website
It is important to note that this wave of enthusiasm for humanoid robots has even forced the field’s pioneer – Boston Dynamics – to make changes.
Boston Dynamics is an American engineering and robotics design company founded in 1992. In 2013, Boston Dynamics unveiled the humanoid robot Atlas in a competition hosted by the U.S. Department of Defense.
After multiple iterations, Atlas can perform various complex actions, such as running quickly, rotating 360 degrees, jumping, and overcoming obstacles. In terms of motion control, Atlas uses the traditional algorithm of “solving a large number of equations” and is powered by hydraulic devices.
“The previously disclosed cost of Atlas was about $2 million each. Among the humanoid robot products currently on the market, Yushu Technology’s are priced at around 600,000 yuan, while Fourier’s is around 1 million yuan,” said Xi Yue, co-founder of Xingdong Jiyuan, to Titanium Media APP, highlighting the significant cost difference between Boston Dynamics and the new generation of humanoid robots.
On April 16, 2024, Boston Dynamics announced the official “retirement” of the hydraulic version of Atlas. Subsequently, Boston Dynamics launched a new all-electric Atlas, which, like all current humanoid robot products, uses batteries as the power source. In terms of future control algorithms, Boston Dynamics will likely also adopt more efficient AI models.
Three Unsolved Problems: Brain, Cerebellum, and Body
The current enthusiasm for humanoid robots is like a small flame that has only just been lit. If AI and hardware continue to iterate every year, the industry’s impact on the real world will be profound.
Wang Xingxing stated that by the end of next year, at least one company globally will be able to develop a relatively general large model for robots. Such a foundation model is like a complete set of building blocks: the large language model is just one piece, and the others include visual perception, tactile perception, decision-making, and interaction.
However, this judgment is not yet the consensus in the humanoid robot industry. The more mainstream view is that for humanoid robots to achieve a greater degree of generalization, breakthroughs must be made in the brain, cerebellum, and body simultaneously, which is nearly impossible in the short term.
The so-called brain refers to the robot’s ability to understand, that is, its comprehension of human instructions and its perception of the environment. The cerebellum refers to the robot’s fine motor control capabilities; the body refers to the various components that make up the humanoid robot itself, such as joints, limbs, and head.
“The emergence of large models mainly enhances the brain’s capabilities of robots,” said Liu Pengqi, Executive Director of Fengrui Capital, to Titanium Media APP.
Yan Qianhang also stated to Titanium Media APP that, like a “brain in a vat,” the current large models are merely a brain that inputs and outputs language or multimodal information, existing independently of machines or bodies. In the future, what kind of body large models should connect to in order to fully exert their generalized functions is still a process of exploration for both investors and entrepreneurs.
As for the cerebellum, current humanoid robots have made significant progress in upright walking, whether on flat ground or rugged terrain. In specific scenarios, Figure 01 became the first humanoid robot to “pick up an apple”; the Stanford team’s Mobile ALOHA demonstrated good capabilities in cooking and tidying up items.
Image Source: Figure Official
However, to achieve complete generalization, humanoid robots still have a long way to go. Whether picking apples or cooking, these actions reflect the robot’s ability to learn through imitation, meaning it learns a single skill by repeatedly mimicking human actions.
“High-quality data for robots to interact with the physical world is actually hard to obtain, so imitation learning has its place – by teaching it through human interaction, some data can be accumulated. However, the current imitation learning merely teaches robots to replicate human actions, but it does not enable them to understand the driving factors behind each action. In other words, robots do not understand why actions are performed in a certain way,” Yan Qianhang stated. If a robot is tasked with completing a complex human operation like “picking up a cup of water and adding some sugar,” imitation learning may not be able to achieve that.
“Reinforcement learning is a process of continuous trial and error and has stronger generalization compared to imitation learning,”
Xi Yue, co-founder of Xingdong Jiyuan, told Titanium Media APP that, much like the training of autonomous driving systems, reinforcement learning lets robots train in a simulated environment modeled on real-world scenarios, optimizing their behavior through continuous trial and error. “After reinforcement learning training, robots can not only walk up stairs but also walk on snow and grass, achieving better generalization.”
However, it is important to note that simulation environments cannot perfectly replicate the real world, as the interactive environment and objects in the real world are more complex than those in simulation environments. This can lead to deviations when transferring simulation training results to the real world, which is a challenge currently faced by the entire industry.
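A toy calculation shows how such a deviation can arise. The sketch below is purely illustrative and assumes a block sliding to rest under friction; the friction coefficients and the 1-metre target are made-up numbers, not measurements from any robot or simulator.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance(speed: float, friction: float) -> float:
    """Distance a sliding block travels before friction stops it: v^2 / (2 * mu * g)."""
    return speed ** 2 / (2.0 * friction * G)


def tune_speed_in_sim(target: float, sim_friction: float) -> float:
    """Pick the launch speed that makes the block stop exactly at `target` in simulation."""
    return math.sqrt(2.0 * sim_friction * G * target)


SIM_FRICTION = 0.5    # hypothetical value assumed by the simulator
REAL_FRICTION = 0.4   # hypothetical value of the real floor
TARGET = 1.0          # metres

speed = tune_speed_in_sim(TARGET, SIM_FRICTION)
sim_result = stopping_distance(speed, SIM_FRICTION)
real_result = stopping_distance(speed, REAL_FRICTION)
print(f"simulation: {sim_result:.2f} m, real world: {real_result:.2f} m")
```

The behavior tuned in simulation is exact there, yet it overshoots on the real floor because a single physical parameter differs. A humanoid robot interacts with the world through far more parameters than one friction coefficient, which is why the gap is hard to close.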
Titanium Media APP has learned exclusively that Xingdong Jiyuan has open-sourced its Humanoid-Gym training framework. With the framework, users can verify trained robot policies in MuJoCo, a higher-precision simulation environment, through a sim-to-sim conversion interface, thereby improving the efficiency and success rate of sim-to-real transfer.
Aside from the training of the brain and cerebellum, the final hurdle for humanoid robots to achieve generalization is whether the body can fully execute the action commands transmitted by the software algorithms.
“The hardware technology products of humanoid robots mainly focus on sensors, actuators, drivers, energy management, and new materials,”
Li Junlan, Research Manager at IDC China, told Titanium Media APP that currently, although various sensors have been applied in humanoid robots, there is still room for improvement in terms of accuracy, response speed, and integration. Meanwhile, humanoid robots consume a lot of energy, and high-efficiency energy management and storage technologies are also significant challenges.
“The introduction of visual sensors may make robots no longer blind. However, there are many other dimensions of perceptual capabilities that today’s robots lack,” Yan Qianhang stated, noting that while there are many tactile and force sensors available in the market, they have not yet been popularized in the robotics field due to their low integration, high costs, and relatively large sizes compared to humanoid robots.
Due to various constraints, the path to generalization for humanoid robots has become even longer.
The More Realistic Present, The Possible Future
While the “general moment” for humanoid robots has not yet arrived, survival has become the most pressing concern for startups.
“Our company’s commercialization strategy is summed up in one phrase: lay eggs along the way.”
Zhijidongli believes that the application scenarios for embodied intelligence (including humanoid robots, quadrupedal robots, and other product forms) are very broad, and that quadrupedal robots are closer to commercial deployment than humanoid robots. Quadrupedal robots are how Zhijidongli intends to “lay eggs”: relying on their mature mobility to bring products to market.
Currently, Zhijidongli’s products include the humanoid robot CL-1, the biped robot P1, and the quadrupedal robot W1, with the latter two focusing on applications in industrial inspection, logistics distribution, and special operations.
Image Source: Zhijidongli Official
Similarly, Yushu Technology, which was established earlier, also derives most of its revenue from quadrupedal robots. This was also the business direction that Yushu Technology focused on at its inception, and it now has multiple products such as Go2, B2, and Aliengo. Public data shows that Yushu Technology’s quadrupedal robot products account for over 60% of global shipments, leading in annual sales worldwide.
Xingdong Jiyuan states that they are currently exploring commercialization directions in niche scenarios within the automotive and consumer electronics sectors, such as factory inspections and logistics in automotive assembly lines. Additionally, there are possibilities for service-oriented tasks like welcoming guests in shopping malls.
“Of course, for startups in the humanoid robot field, financing is essential,” said Xi Yue, co-founder of Xingdong Jiyuan, to Titanium Media APP, emphasizing that the humanoid robot industry is still in its early stages, with higher technical barriers and longer R&D cycles, making early financing necessary for survival.
In fact, the emergence of the humanoid robot industry mirrors the past experiences of the domestic autonomous driving industry.
Between 2017 and 2018, a large number of autonomous driving startups emerged in China, attracting many venture capital institutions to enter the field. Similar to humanoid robots, the autonomous driving sector also requires long cycles of technological development, leading to a strong reliance on investment institutions in the early stages. However, as the investment boom faded, the commercialization capabilities of autonomous driving companies began to face scrutiny. Following that, many autonomous driving teams disbanded, laid off employees, or even went to court.
“From the perspectives of technical barriers, founding teams, and industry influence, humanoid robots and autonomous driving are indeed very similar. However, the valuations of this round of humanoid robot companies are generally not as high as those of the previous round of autonomous driving companies.”
A person with experience in both the autonomous driving and humanoid robot fields stated that this is a good thing, as it prevents everyone from solely pursuing company valuations while neglecting commercialization. “The current entrepreneurs in the humanoid robot sector have partly recognized the problems and risks encountered in the previous wave of autonomous driving development, leading to a higher awareness of product commercialization.”
Additionally, this individual noted that during the autonomous driving entrepreneurial wave, people tended to work independently.
However, in the humanoid robot industry, collaboration is emphasized. For example, Beijing, Shanghai, and Shenzhen have all established humanoid robot innovation centers led by government departments. These centers aim to connect the upstream and downstream of the industrial chain, bringing together companies working on core technology, robot joints, and commercial deployment. “Everyone forms a tangible entity, and the upstream and downstream companies are all shareholder units, allowing the entire chain to connect.”
General humanoid robot mother platform “Tiangong” Image Source: Official
For example, in Beijing, on April 27, the Beijing Humanoid Robot Innovation Center released “Tiangong,” the world’s first full-sized humanoid robot that can run on purely electric drive, at a stable speed of 6 km/h. “Tiangong” stands 163 cm tall, weighs 43 kg, offers computing power of 550 trillion operations per second, and is equipped with multiple visual perception sensors, a high-precision inertial measurement unit (IMU), and 3D vision sensors.
At the press conference, the center’s general manager, Xiong Youjun, stated that to address the common problems of the humanoid robot industry and promote its overall development, the Beijing Humanoid Robot Innovation Center is dedicated to researching key shared core technologies and building general-purpose software and hardware mother platforms. So far, it has successfully developed the general humanoid robot mother platform “Tiangong.”
The relevant person in charge of the Beijing Economic and Technological Development Zone said that Yizhuang, an important hub for Beijing’s robot industry, currently hosts 110 robotics ecosystem companies, forming a complete industrial chain covering core components, complete machines, and applications. In the humanoid robot sector, the area is home not only to leading companies such as Xiaomi and UBTECH but also to makers of high-precision reducers, servo systems, and other humanoid robot components.
In terms of machine learning software algorithms, the success of Tesla’s FSD (Full Self-Driving) has also provided a possible future for the humanoid robot industry.
In Tesla’s latest FSD V12 version, FSD Beta has been renamed FSD (Supervised). According to Tesla’s official statement, under the supervision of owners, the latest version of FSD Supervised can drive a Tesla almost anywhere.
Before FSD V12, Tesla’s autonomous driving approach relied on rule-based judgments, with hand-written code behind every driving behavior; FSD V11 contained over 300,000 lines of C++ code. FSD V12 abandons this manually coded, rule-based approach entirely in favor of an end-to-end neural network, cutting the code to only 3,000 lines, roughly a hundredfold reduction.
Tesla’s end-to-end FSD solution is essentially entirely data-driven. By compressing high-quality data from tens of millions or even hundreds of millions of human driving videos into a large model, FSD can handle driving the way an AI does: when it encounters a scenario, sensor data goes in and steering, braking, and acceleration signals come out, with no hand-coded rules in between.
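The contrast between hand-written rules and a data-driven mapping can be sketched in a few lines. This is not Tesla’s system: a one-parameter least-squares fit stands in for the neural network, and the thresholds, the `lane_offset` input, and the four demonstration pairs are all invented for illustration.

```python
def rule_based_steering(lane_offset: float) -> float:
    """Hand-coded rules: each behavior needs its own explicit branch (thresholds are made up)."""
    if lane_offset > 0.3:
        return -0.3   # drifting right, steer left
    if lane_offset < -0.3:
        return 0.3    # drifting left, steer right
    return 0.0        # close enough to the lane centre


def fit_steering_gain(demonstrations: list[tuple[float, float]]) -> float:
    """Least-squares fit of steering = gain * lane_offset from (offset, steering) examples."""
    numerator = sum(offset * steering for offset, steering in demonstrations)
    denominator = sum(offset * offset for offset, _ in demonstrations)
    return numerator / denominator


# Hypothetical human demonstrations: (lane offset in metres, steering command)
demos = [(-1.0, 0.5), (-0.5, 0.25), (0.5, -0.25), (1.0, -0.5)]
gain = fit_steering_gain(demos)


def learned_steering(lane_offset: float) -> float:
    """Data-driven mapping: sensor value in, control signal out, no per-case rules."""
    return gain * lane_offset


print(rule_based_steering(0.2), learned_steering(0.2))
```

Covering a new situation in the rule-based version means writing another branch, whereas the learned version changes only when the demonstration data changes. That difference is the intuition behind the drop in hand-written code described above.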
According to information released by Tesla in October 2022, the Optimus humanoid robot uses the same full self-driving (FSD) computer as Tesla cars, as well as Autopilot-related neural network technologies.
This means that humanoid robots can draw on the same training methods as FSD to move towards generalization. According to Wang Xingxing, Yushu Technology’s humanoid robots have already fully adopted a similar end-to-end solution: a single model handles everything from walking and running to dancing and flipping, without any intermediate processes or hand-written code.
“The maturity of humanoid robot hardware is just a matter of time. The most important thing is still the AI foundational large model for general humanoid robots,” Wang Xingxing stated, optimistically estimating that breakthroughs in foundational large models may occur by the end of next year. However, it is also possible that this will not happen. “Sometimes, technological breakthroughs depend on the luck of humanity as a whole. Just like if Einstein hadn’t been around, his theories would probably have been discovered eventually, just a few years or even decades later.”