Concerns in Humanoid Robot Development: A 15-Year Transformation

Humanoid robot development is caught in an investment frenzy built on erroneous premises: it attempts to achieve human-like dexterity through purely visual learning while neglecting the core necessity of tactile perception. The physical bottlenecks of safe walking make the vision of “integrating into human spaces” a distant one, and over the next 15 years the form of these machines will change so much that the term “humanoid” itself will be redefined.

Currently, humanoid robots are touted as “universal labor interfaces” that can directly replace humans in all physical labor without any change to existing processes. Companies like Figure and Tesla even predict significant economic impact within 2-5 years. However, on the Gartner hype cycle they are still in the early stages, far from reaching the peak of inflated expectations. History shows that it often takes over a decade to get from laboratory demonstration to commercial deployment, which makes such optimistic predictions look more like capital-driven marketing rhetoric than technical reality.

Manipulation is the critical weakness of humanoid robots. Since Heinrich Ernst’s robotic arm in 1961, the industrial mainstream has remained parallel grippers and suction-based manipulators. The German company Schunk alone offers over 1,000 models, yet multi-jointed fingers have never seen practical use because of problems with durability, strength, and service life.

The “dexterous manipulation” videos you see are merely performances in specific scenarios; change the task, and they fail. Tasks such as folding a shirt with the sleeves out or cleaning peanut butter off one's hands, which an 8-year-old can easily accomplish, are beyond the reach of robots. The success of end-to-end learning in speech, images, and LLMs has fed the misconception that “piling up data can solve everything.” That view forgets the premise: each of these fields already had front-end preprocessing technology in place before learning was applied.

Speech recognition relies on signal compression developed for telephone networks, image recognition depends on CNNs that mimic the visual cortex, and LLMs use token embeddings to capture the structure of language. But what about the tactile domain? We do not even have the technology to “record and transmit tactile signals.” Figure and Tesla’s reliance on “learning from human videos” is akin to asking a blind person to learn to paint by listening to others describe it.

The importance of touch for dexterity cannot be overstated. Human fingertips have thousands of mechanoreceptors per square centimeter, capable of sensing texture, pressure, and slip. With anesthetized fingertips, even striking a match becomes four times more difficult. Current robotic hands can fit only a few pressure sensors; they lack even the most basic tactile feedback, let alone the complex perception provided by the 15 types of human tactile neurons.

While MIT labs are attempting to create systems that associate touch with action, the industry is pouring 90% of its funding into visual learning. This is not research; it is a gambler’s bet.

The safety of bipedal walking is a physical deadlock. Small robots may appear safe, but scaling them to human size changes everything: mass grows with the cube of linear size, so a robot twice as tall has eight times the mass, and its energy demands rise more than eightfold. With rigid structures and ZMP (zero moment point) algorithms, a fall turns them into “hammers with metal legs.”
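The scaling argument can be made concrete with a rough calculation. The sketch below assumes geometrically similar robots of uniform density, so mass grows with the cube of the linear scale. It also assumes, purely as an illustration and not as part of the original argument, that the energy released in a fall is approximated by m·g·h with the center-of-mass height growing linearly. The function name `scale_factors` is hypothetical.

```python
def scale_factors(s: float) -> dict[str, float]:
    """Growth factors when every linear dimension is multiplied by s.

    Assumptions (illustrative only): constant density, similar shape,
    and fall energy approximated by m * g * h with g unchanged.
    """
    mass = s ** 3                    # volume, and so mass, scales with s cubed
    com_height = s                   # center-of-mass height scales linearly
    fall_energy = mass * com_height  # m * g * h, so it scales with s to the fourth
    return {"mass": mass, "fall_energy": fall_energy}


# Doubling the size of a small robot:
print(scale_factors(2.0))  # {'mass': 8.0, 'fall_energy': 16.0}
```

Under these assumptions, a robot twice as tall is eight times as heavy and releases about sixteen times the energy when it falls, which is one simple reading of energy growing by “over eight times.”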

Company videos never show people approaching walking robots, because even the companies themselves are afraid: anything within three meters is a danger zone. How can such machines possibly enter factories or homes?

The future “humanoid robots” will look entirely different. Just as “flying cars” have turned into eVTOLs and “autonomous driving” has come to include remote monitoring, in 15 years they will likely discard legs for wheels, replace arms with grippers or suction cups, and carry active light sensors, becoming a variety of specialized machines that still bear the name “humanoid.” The money invested now will likely go the way of the earlier autonomous driving bubble: wasted, leaving behind a pile of forgotten prototypes.

The author, a co-founder of iRobot, has witnessed the blood and tears of taking robots from the lab to the market, and his critique is far from alarmist. Those CEOs who tout a “$30 trillion market” either do not understand the technology or deliberately ignore the laws of physics. After all, telling a good story is easier than solving real problems.
