
Berkeley, CA – Researchers have unveiled a groundbreaking method that dramatically cuts the time it takes humanoid robots to learn complex locomotion, achieving sim-to-real transfer in as little as 15 minutes. The approach, detailed in a new technical report, is built on two new off-policy reinforcement learning algorithms, FastSAC and FastTD3, and the entire pipeline has been made open source through the newly released Holosoma codebase.
"Tired of waiting hours for humanoids to learn to walk?" stated Younggyo Seo, a lead researcher involved in the project, in a recent social media post. The team's work demonstrates how these algorithms can train full-body humanoid locomotion on a single NVIDIA RTX 4090 GPU, significantly outperforming traditional methods like Proximal Policy Optimization (PPO). This rapid training enables robots to learn diverse gaits and maintain robustness against disturbances, such as pushes.
The innovation addresses a critical bottleneck in robotics: the long and complex training cycles of reinforcement learning. FastTD3, an optimized variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, combines parallel simulation, large-batch updates, and a distributional critic to achieve its speed. FastSAC, built on the same principles, shows similarly substantial speedups over vanilla Soft Actor-Critic (SAC).
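To make those ingredients concrete, the following is a minimal, illustrative PyTorch sketch of a distributional-critic update driven by batched data from many parallel environments. It is not the Holosoma or FastTD3 implementation: the dimensions, network sizes, atom grid, and the toy stand-in for the GPU simulator are assumptions, and TD3's twin critics, target networks, replay buffer, and delayed actor updates are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch (not the Holosoma/FastTD3 code): a categorical
# distributional critic updated with one large batch of transitions gathered
# from many environments stepped in parallel.

OBS_DIM, ACT_DIM = 48, 12                    # placeholder obs/action sizes
NUM_ENVS = 1024                              # (1) parallel simulation
NUM_ATOMS, V_MIN, V_MAX = 101, -10.0, 10.0   # (3) return-distribution grid
GAMMA = 0.99

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class DistributionalCritic(nn.Module):
    """Predicts logits over a fixed grid of return atoms (C51-style)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_ATOMS))
        self.register_buffer("atoms", torch.linspace(V_MIN, V_MAX, NUM_ATOMS))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def project_target(rewards, dones, next_probs, atoms):
    """Categorical Bellman backup: project r + gamma*z onto the atom grid."""
    delta = (V_MAX - V_MIN) / (NUM_ATOMS - 1)
    tz = (rewards.unsqueeze(1)
          + GAMMA * (1.0 - dones.unsqueeze(1)) * atoms).clamp(V_MIN, V_MAX)
    b = (tz - V_MIN) / delta
    lower, upper = b.floor().long(), b.ceil().long()
    target = torch.zeros_like(next_probs)
    target.scatter_add_(1, lower, next_probs * (upper.float() - b))
    target.scatter_add_(1, upper, next_probs * (b - lower.float()))
    # when b lands exactly on an atom, lower == upper and both weights above
    # are zero, so re-deposit the full probability mass there:
    target.scatter_add_(1, lower, next_probs * (lower == upper).float())
    return target

actor, critic = Actor(), DistributionalCritic()
opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

# (1) All environments advance in one batched tensor op; a toy random walk
# stands in for the GPU physics simulator here.
obs = torch.randn(NUM_ENVS, OBS_DIM)
act = actor(obs).detach()
next_obs = obs + 0.01 * torch.randn_like(obs)
rewards, dones = torch.randn(NUM_ENVS), torch.zeros(NUM_ENVS)

# (2)+(3) One large-batch critic update: cross-entropy between the predicted
# return distribution and the projected Bellman target.
with torch.no_grad():
    next_probs = F.softmax(critic(next_obs, actor(next_obs)), dim=-1)
    target = project_target(rewards, dones, next_probs, critic.atoms)
log_probs = F.log_softmax(critic(obs, act), dim=-1)
loss = -(target * log_probs).sum(dim=-1).mean()
opt.zero_grad(); loss.backward(); opt.step()
print(f"distributional critic loss: {loss.item():.3f}")
```

The pattern hints at where the speed comes from: simulation, the Bellman backup, and the gradient step are each a single batched tensor operation, so one GPU can keep both the simulator and the learner saturated.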
This research, conducted at institutions including the University of California, Berkeley, and Amazon FAR, marks a pivotal step towards more agile and adaptable humanoid robots. The open-source nature of the Holosoma codebase is expected to accelerate further research and development in the field, allowing the broader AI and robotics community to build upon these foundational algorithms. The ability to quickly train and deploy robust locomotion policies could have profound implications across domains, from industrial automation to advanced human-robot interaction.