Figure Achieves Natural Humanoid Walking with Reinforcement Learning
Making humanoid robots move naturally has long been a fascinating challenge in robotics, and one of the most fundamental yet complex behaviors to master is walking. Recent advances in reinforcement learning (RL) have made it possible to train humanoid robots to walk naturally without relying on pre-programmed motions or extensive manual tuning. Figure introduces an end-to-end neural network, trained with RL, for humanoid locomotion.
The Power of Reinforcement Learning for Locomotion
Reinforcement learning enables robots to learn through trial and error, much like humans do. Instead of hard-coding every movement, an RL-based controller discovers efficient walking strategies by optimizing for a reward signal. This approach allows the robot to adapt to different terrains, recover from slips, and maintain balance under disturbances, all while moving in a human-like way.
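As a rough illustration, such a reward signal might combine velocity tracking, an upright-posture term, and an energy penalty. The term names, weights, and structure below are assumptions made for this sketch, not Figure's published reward design:

```python
import numpy as np

# Illustrative locomotion reward; the specific terms and weights are assumptions,
# not Figure's actual reward function.
def locomotion_reward(base_velocity, target_velocity, torso_tilt, joint_torques):
    # Reward tracking the commanded forward velocity.
    velocity_reward = np.exp(-np.square(base_velocity - target_velocity))
    # Reward keeping the torso upright, which encourages balance.
    upright_reward = np.exp(-np.square(torso_tilt))
    # Penalize large torques to encourage energy-efficient gaits.
    energy_penalty = 1e-3 * np.sum(np.square(joint_torques))
    return velocity_reward + 0.5 * upright_reward - energy_penalty
```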
Training happens in simulation, where thousands of virtual humanoids can practice walking in parallel. By randomizing factors like friction, ground incline, and actuator strength, the policy learns to handle a wide range of real-world conditions. What used to take years of real-world testing can now be compressed into hours of simulation.
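A minimal sketch of that randomization step, applied across thousands of parallel simulations, might look like the following; the parameter names and ranges are illustrative assumptions rather than values used by Figure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_env_params(num_envs):
    """Sample randomized physics parameters, one set per parallel simulation.

    The ranges here are illustrative; in practice they would be tuned to
    bracket the uncertainty of the physical robot and its environment.
    """
    return {
        "friction": rng.uniform(0.4, 1.2, size=num_envs),        # ground friction coefficient
        "incline_deg": rng.uniform(-5.0, 5.0, size=num_envs),    # ground slope
        "motor_scale": rng.uniform(0.8, 1.2, size=num_envs),     # actuator strength multiplier
        "action_delay_ms": rng.integers(0, 20, size=num_envs),   # control latency
    }

# Thousands of virtual humanoids, each with slightly different physics.
params = sample_env_params(num_envs=4096)
```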
Making Humanoid Movement Look Natural
A common issue with learned locomotion is that robots often converge on functional but unnatural gaits. To ensure human-like movement, researchers incorporate motion capture data from humans into the training process, and the policy is rewarded for matching key aspects of human walking.
Additional rewards encourage energy efficiency, speed control, and robustness to pushes or uneven surfaces. The result is a walking style that is both stable and lifelike.
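A hedged sketch of what such an imitation-style reward could look like, assuming access to a time-aligned motion-capture reference, is shown below. The specific features and weights are illustrative, not Figure's actual formulation:

```python
import numpy as np

# Illustrative imitation reward against a motion-capture reference.
def imitation_reward(robot_joint_angles, reference_joint_angles,
                     robot_foot_contacts, reference_foot_contacts):
    # Penalize deviation from the reference joint trajectory at this timestep.
    pose_error = np.sum(np.square(robot_joint_angles - reference_joint_angles))
    pose_reward = np.exp(-2.0 * pose_error)
    # Reward matching the reference foot-contact pattern (e.g., heel-strike timing).
    contact_reward = np.mean(robot_foot_contacts == reference_foot_contacts)
    return 0.7 * pose_reward + 0.3 * contact_reward
```

In practice this imitation term would be blended with the task rewards described above, so the robot stays human-like without sacrificing stability or speed tracking.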
Closing the Sim-to-Real Gap
One of the biggest challenges in robotics is transferring skills learned in simulation to the real world. Small differences in physics, sensor noise, or actuator behavior can cause a policy to fail when deployed on a physical robot. Two key techniques help bridge this gap:
1. Domain Randomization: By training the policy under a wide range of randomized conditions (e.g., varying friction, motor strength, and delays), the neural network becomes more adaptable to real-world unpredictability.
2. High-Frequency Control: Running the policy alongside a fast torque-control loop helps compensate for inaccuracies in simulation, ensuring smooth and stable movement.
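To illustrate the second technique, here is a minimal sketch of a low-rate neural-network policy wrapped by a fast PD torque loop. The update rates, gains, and the `policy` and `robot` interfaces are placeholders assumed for illustration:

```python
POLICY_HZ = 50     # assumed neural-network policy rate
TORQUE_HZ = 1000   # assumed inner torque-control rate
STEPS_PER_ACTION = TORQUE_HZ // POLICY_HZ

def control_step(policy, robot, observation):
    """Run one low-rate policy step, tracked by a high-rate PD torque loop.

    `policy` and `robot` stand in for the trained network and the hardware
    interface; the PD gains are illustrative.
    """
    target_joint_positions = policy(observation)   # low-rate action from the network
    for _ in range(STEPS_PER_ACTION):              # fast inner loop on the robot
        q, dq = robot.read_joint_state()
        torque = 80.0 * (target_joint_positions - q) - 2.0 * dq  # simple PD law
        robot.apply_torques(torque)
```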
When done correctly, the same policy can work across multiple robots without individual tuning, a crucial step toward scalable humanoid robotics.
What’s Next for Humanoid Walking?
While current methods have achieved impressive results, there is still room for improvement.
As reinforcement learning and simulation tools continue to evolve, we’re likely to see even more natural and capable humanoid locomotion in the near future.
About the Author

Jason Calloway
Jason Calloway is an AI correspondent from the United States.