Can Robots Squeeze Toothpaste? ManipTrans: Efficient Transfer of Human Hand Skills to Dexterous Hands

The research team brings together researchers from the Beijing Institute for General Artificial Intelligence (BIGAI), Tsinghua University, and Peking University who work on embodied intelligence. Team members have extensive experience in developing efficient, intelligent, general-purpose robotic technologies, particularly manipulation with dexterous robotic hands. The authors include Li Kailin of BIGAI; Li Puhao, a PhD student at Tsinghua University; Liu Tengyu, a researcher at BIGAI; and Li Yuyang, a PhD student at Peking University. The corresponding author is Huang Siyuan, a researcher at BIGAI.

Embodied intelligence has developed rapidly in recent years. Enabling robots to achieve human-level dexterous manipulation in complex tasks has significant research and application value, and it is a key step towards general artificial intelligence.

Data-driven embodied intelligence algorithms still depend on precise, large-scale, and highly flexible dexterous hand motion sequences. However, traditional reinforcement learning and real-robot teleoperation methods often struggle to acquire such data efficiently.

To address this issue, the Beijing Institute for General Artificial Intelligence, in collaboration with researchers from Tsinghua University and Peking University, proposed ManipTrans, a two-stage method that efficiently transfers human hand manipulation skills to robotic dexterous hands in a simulation environment.

  • Paper Title: MANIPTRANS: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning

  • Paper Link: https://arxiv.org/pdf/2503.21860

  • Project Homepage: https://maniptrans.github.io

  • Code and Dataset: https://github.com/ManipTrans/ManipTrans

ManipTrans first uses a pre-trained general trajectory imitator to mimic human hand movements. Then, for each manipulation skill, it introduces a residual learning module that fine-tunes the motion under physics-based interaction constraints (as shown in Figure 1). Separating motion imitation from physical constraints makes the learning of complex bimanual tasks more efficient and precise.

Based on ManipTrans, the research team also released DexManipNet, a large-scale dexterous hand manipulation dataset that covers previously unexplored tasks such as capping a pen and twisting a bottle cap.

Figure 1. Skill transfer across different dexterous hand models based on ManipTrans, achieving the same manipulation skills

Research Background

Human hands play a crucial role in interacting with the environment, which has sparked extensive research into robotic dexterous manipulation. How to quickly acquire large-scale, precise, and human-level dexterous hand operation data has become an urgent problem to solve.

Existing reinforcement learning-based methods require reward functions carefully designed for each specific task, which often limits task complexity and may lead to unnatural robot movements. Teleoperation-based methods, in turn, are costly and inefficient, and the data they collect is usually tied to a particular robot embodiment and lacks generality.

A promising solution is to use imitation learning to transfer human manipulation motions to dexterous hands in a simulation environment, generating natural "hand-object interactions." However, achieving precise and efficient transfer is not easy. Because human hands and robotic hands differ in morphology, direct pose retargeting does not yield ideal results. Moreover, although motion capture data is relatively accurate, accumulated errors can still cause failure in high-precision tasks. Finally, bimanual manipulation introduces a high-dimensional action space that makes efficient policy learning much harder, so most previous work stops at single-hand grasping tasks.

Research Method

Figure 2. Framework of the proposed ManipTrans method

To address these challenges, this paper proposes a concise and effective method, ManipTrans (as shown in Figure 2), which transfers manipulation skills, especially bimanual collaborative skills, from human hands to robotic dexterous hands in a simulation environment. The core idea is to divide the transfer process into two stages: the first stage imitates the trajectory of the hand movements, and the second stage fine-tunes the actions so that they satisfy physical interaction constraints.

Specifically, a general model is first pre-trained to accurately mimic human finger movements; based on this, a residual learning module is introduced to fine-tune the actions of the dexterous hands, focusing on the following two points: 1) ensuring stable contact between the fingers and the object’s surface; 2) coordinating both hands to ensure high precision and fidelity in complex situations.
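The residual idea described above can be illustrated with a minimal sketch. It assumes normalized actions in [-1, 1]; the function name `compose_action`, the `residual_scale` parameter, and the clipping range are illustrative assumptions, not taken from the ManipTrans code.

```python
import numpy as np

def compose_action(base_action, residual, residual_scale=0.1, low=-1.0, high=1.0):
    """Add a scaled residual correction to the imitator's action, then clip.

    base_action: action proposed by the pre-trained imitator (stage one).
    residual:    correction proposed by the residual module (stage two).
    residual_scale, low, high: hypothetical values chosen for illustration.
    """
    base_action = np.asarray(base_action, dtype=float)
    residual = np.asarray(residual, dtype=float)
    return np.clip(base_action + residual_scale * residual, low, high)

# Toy example: three joint targets and a correction for each of them.
base = np.array([0.2, -0.5, 0.95])
delta = np.array([1.0, -1.0, 1.0])
print(compose_action(base, delta))
```

Keeping the correction small relative to the base action is one common way to let the second stage refine contacts without discarding the imitated motion.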

This paper models the problem as an implicit Markov decision process (MDP) and uses the PPO algorithm in both stages to maximize the discounted return. In the first stage, the reward function constrains the dexterous hands to follow the reference human hand trajectory while keeping the actions stable and smooth. In particular, the finger imitation reward encourages the keypoint positions of the dexterous hands to align with those of the human hand, with emphasis on the fingertips of the thumb, index, and middle fingers, which frequently contact the object. This helps address the morphological mismatch between human and robotic hands.
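A fingertip imitation reward of this kind might look like the following sketch. The exponential form, the `scale` value, and the per-finger weights are assumptions made for illustration; the source states only that the thumb, index, and middle fingertips are emphasized.

```python
import numpy as np

def fingertip_imitation_reward(robot_tips, human_tips, weights, scale=10.0):
    """Weighted sum of exp(-scale * distance) over corresponding fingertips.

    robot_tips, human_tips: arrays of shape (5, 3), ordered
        thumb, index, middle, ring, pinky (positions in meters).
    weights: per-finger weights (hypothetical values, not from the paper).
    """
    robot_tips = np.asarray(robot_tips, dtype=float)
    human_tips = np.asarray(human_tips, dtype=float)
    weights = np.asarray(weights, dtype=float)
    dists = np.linalg.norm(robot_tips - human_tips, axis=-1)
    return float(np.sum(weights * np.exp(-scale * dists)))

# Thumb, index, and middle fingers receive larger (assumed) weights.
finger_weights = [0.3, 0.25, 0.25, 0.1, 0.1]
human = np.zeros((5, 3))
robot = human.copy()
robot[0, 0] = 0.05  # thumb tip is 5 cm away from its reference
print(fingertip_imitation_reward(robot, human, finger_weights))
```

With these weights, the same positional error costs more on the thumb than on the pinky, which is the effect the emphasis on frequently contacting fingers is meant to produce.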

In the second stage, the residual module outputs compensation terms that are added to the first-stage actions to fine-tune them. This module additionally takes the following information into account: 1) the center-of-mass position of the object and the gravity acting on it, to enhance torque perception; 2) the shape of the object, represented by a basis point set (BPS); 3) the spatial relationship between the keypoints of the dexterous hands and the object; 4) the fingertip contact forces provided by the simulation environment. The second stage also adds a contact force reward to encourage more stable hand-object contact. During training, random reference state initialization and curriculum learning are used to improve convergence speed and training stability.
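For readers unfamiliar with the basis point set representation, the sketch below shows the general idea: a fixed set of basis points is sampled once, and an object is encoded by the distance from each basis point to its nearest surface point. The number of basis points, the sampling in a unit ball, and the toy point cloud are assumptions for illustration and may differ from the settings used in ManipTrans.

```python
import numpy as np

def bps_encode(points, basis):
    """Encode a point cloud as nearest-point distances from fixed basis points.

    points: (N, 3) object point cloud.
    basis:  (K, 3) basis points, shared across all objects.
    Returns a (K,) feature vector.
    """
    points = np.asarray(points, dtype=float)
    basis = np.asarray(basis, dtype=float)
    diff = basis[:, None, :] - points[None, :, :]   # (K, N, 3)
    return np.linalg.norm(diff, axis=-1).min(axis=1)

rng = np.random.default_rng(0)

# Sample 128 basis points uniformly inside the unit ball (assumed setup).
dirs = rng.normal(size=(128, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radii = rng.uniform(size=(128, 1)) ** (1.0 / 3.0)
basis = dirs * radii

# Toy "object": random points in a cube, standing in for a real mesh sample.
cloud = rng.uniform(-0.5, 0.5, size=(1000, 3))
feat = bps_encode(cloud, basis)
print(feat.shape)
```

Because the basis points are fixed, every object maps to a vector of the same length, which makes the encoding convenient as a policy input.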

In summary, the first stage of ManipTrans alleviates the morphological differences between human hands and dexterous hands, while the second stage captures subtle interaction motions. Decoupling finger imitation from physical interaction constraints significantly reduces the complexity of the action space and improves training efficiency. This paper validates the effectiveness and efficiency of the method on a series of complex single-hand and bimanual manipulation tasks, including tasks involving hinged (articulated) objects. To assess generalization, cross-embodiment experiments were conducted, confirming that ManipTrans can be applied to dexterous hands with different degrees of freedom and morphologies without additional parameter tuning. Furthermore, the bimanual manipulation data obtained with ManipTrans has also been validated in real-robot deployments.

DexManipNet Dataset

Figure 3. Dexterous hand writing on a whiteboard

Figure 4. Bimanual object scooping

Using ManipTrans, this study transferred two large hand-object interaction datasets (OakInk V2 and FAVOR) to dexterous hands to construct the DexManipNet dataset. The dataset covers 61 challenging tasks and contains 3,300 dexterous hand manipulation sequences over more than 1,200 objects, totaling approximately 1.34 million frames. About 600 of these sequences involve complex bimanual tasks (as shown in Figures 3 and 4), demonstrating what robots can do in highly difficult manipulation scenarios.

Figure 5. Dexterous hand unscrewing a toothpaste cap

Figure 6. Bimanual pouring into a test tube

Additionally, the researchers replayed trajectories from DexManipNet on a real-robot platform consisting of two 7-degree-of-freedom robotic arms and a pair of dexterous hands. The deployment demonstrated fine dexterous manipulation that was previously unattainable. For example, in the "unscrewing a toothpaste cap" task, the left hand securely held the toothpaste tube while the thumb and index finger of the right hand deftly unscrewed the small cap. Such subtle and complex motions are often difficult to capture accurately through teleoperation (as shown in Figures 5 and 6).

Experimental Results

Table 1. Quantitative comparison of ManipTrans with baseline methods

This paper compares ManipTrans with two major categories of existing methods—reinforcement learning-based methods and optimization-based methods. The results show that ManipTrans outperforms baseline methods across all metrics, demonstrating high precision in single-hand and bimanual operation tasks (as shown in Table 1). Qualitative and quantitative analyses confirm that the two-stage transfer framework of ManipTrans effectively captures subtle finger movements and interactions with objects, improving task success rates and the realism of movements.

Figure 7. Cross-embodiment transfer experiments

Figure 8. Bimanual operation of hinged objects

Furthermore, the research demonstrated that ManipTrans scales to different models of dexterous hands. The framework relies solely on the correspondence between human fingers and the keypoints of the dexterous hands, so it adapts to different morphologies and degrees of freedom without excessive parameter tuning (as shown in Figure 7). The paper also validated the method on ARCTIC, a dataset of hinged (articulated) object manipulation. By adjusting the reward function and adding a reward on the articulation angle of hinged objects, the dexterous hands successfully rotated hinged objects to specified angles (as shown in Figure 8), showing the potential of ManipTrans in complex manipulation tasks.
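One plausible form for such an articulation angle reward is sketched below. The exponential shape, the `scale` value, and the function name are assumptions for illustration; the source does not give the exact formula.

```python
import math

def articulation_angle_reward(obj_angle, ref_angle, scale=5.0):
    """Reward that peaks at 1.0 when the hinge angle matches the reference.

    obj_angle: current joint angle of the hinged object in simulation (radians).
    ref_angle: joint angle from the reference human demonstration (radians).
    scale:     hypothetical sharpness of the reward.
    """
    return math.exp(-scale * abs(obj_angle - ref_angle))

# A lid that should be opened to 0.8 rad: a closer angle earns more reward.
print(articulation_angle_reward(0.7, 0.8))
print(articulation_angle_reward(0.2, 0.8))
```

A term like this would be added to the object pose rewards so that the policy is rewarded for moving the hinge itself, not only for holding the object in the right place.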
