Yuman Gao

RO
h-index 54
5 papers
179 citations
Novelty 51%
AI Score 37

5 Papers

RO · May 20, 2025
Toward Real-World Cooperative and Competitive Soccer with Quadrupedal Robot Teams

Zhi Su, Yuman Gao, Emily Lukas et al. · bytedance

Achieving coordinated teamwork among legged robots requires both fine-grained locomotion control and long-horizon strategic decision-making. Robot soccer offers a compelling testbed for this challenge, combining dynamic, competitive, and multi-agent interactions. In this work, we present a hierarchical multi-agent reinforcement learning (MARL) framework that enables fully autonomous and decentralized quadruped robot soccer. First, a set of highly dynamic low-level skills is trained for legged locomotion and ball manipulation, such as walking, dribbling, and kicking. On top of these, a high-level strategic planning policy is trained with Multi-Agent Proximal Policy Optimization (MAPPO) via Fictitious Self-Play (FSP). This learning framework allows agents to adapt to diverse opponent strategies and gives rise to sophisticated team behaviors, including coordinated passing, interception, and dynamic role allocation. With an extensive ablation study, the proposed learning method shows significant advantages in the cooperative and competitive multi-agent soccer game. We deploy the learned policies to real quadruped robots relying solely on onboard proprioception and decentralized localization, with the resulting system supporting autonomous robot-robot and robot-human soccer matches on indoor and outdoor soccer courts.
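The opponent-selection step of Fictitious Self-Play mentioned in the abstract can be illustrated with a minimal sketch. It assumes that past policies are stored as frozen snapshots and that the opponent for each training episode is drawn uniformly from that pool; `OpponentPool` and the `policy_v*` identifiers are hypothetical names for illustration and are not taken from the paper.

```python
import random


class OpponentPool:
    """Hypothetical pool of frozen policy snapshots for fictitious self-play."""

    def __init__(self, seed=0):
        self.snapshots = []
        self.rng = random.Random(seed)

    def add(self, policy_id):
        # Store an identifier for a frozen copy of the current policy.
        self.snapshots.append(policy_id)

    def sample(self):
        # Assumption: uniform sampling over all stored snapshots, which
        # approximates playing against the average of past opponents.
        return self.rng.choice(self.snapshots)


pool = OpponentPool()
for iteration in range(5):
    # In a real training loop, the policy would be updated here (e.g. with
    # MAPPO) before the snapshot is taken.
    pool.add(f"policy_v{iteration}")

opponent = pool.sample()
print("training against", opponent)
```

Sampling from the whole history, rather than only the latest snapshot, is one common way to keep a policy from overfitting to a single opponent strategy.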

RO · Oct 13, 2025
Ego-Vision World Model for Humanoid Contact Planning

Hang Liu, Yuman Gao, Sangli Teng et al.

Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
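The combination of a world model, sampling-based MPC, and a value function described above can be sketched in a toy setting. This is a random-shooting planner under strong assumptions: `world_model` is a hand-written 1-D stand-in for the learned latent dynamics, `value` is a stand-in for the learned surrogate value function, and the horizon, sample count, and action bounds are arbitrary. None of these names or numbers come from the paper.

```python
import random


def world_model(state, action):
    # Stand-in for learned latent dynamics: a 1-D point moved by the action.
    return state + action


def value(state, goal=1.0):
    # Stand-in for a learned surrogate value: dense, highest at the goal.
    return -abs(goal - state)


def plan(state, horizon=5, num_samples=256, seed=0):
    """Random-shooting MPC: sample action sequences, roll each one out
    through the model, score the final state, keep the best first action."""
    rng = random.Random(seed)
    best_score, best_first = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-0.5, 0.5) for _ in range(horizon)]
        s = state
        for a in actions:
            s = world_model(s, a)
        score = value(s)
        if score > best_score:
            best_score, best_first = score, actions[0]
    return best_first, best_score


first_action, score = plan(0.0)
print(first_action, score)
```

Only the first action of the best sequence is executed; the planner is then re-run from the next observed state, which is what makes the scheme receding-horizon.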

RO · Sep 16, 2021
Meeting-Merging-Mission: A Multi-robot Coordinate Framework for Large-Scale Communication-Limited Exploration

Yuman Gao, Yingjian Wang, Xingguang Zhong et al.

This letter presents Meeting-Merging-Mission, a complete framework for multi-robot exploration under communication restrictions. Because real-world communication is limited in both bandwidth and range, we propose a lightweight environment representation method and an efficient cooperative exploration strategy. To reduce bandwidth usage, each robot uses specific polytopes to maintain free space and super frontier information (SFI) as the source for exploration decision-making. To reduce repeated exploration, we develop a mission-based protocol that drives robots to share collected information in stable rendezvous. We also design a complete path planning scheme for both centralized and decentralized cases. To validate that our framework is practical and generic, we present an extensive benchmark and deploy our system on multi-UGV and multi-UAV platforms.

RO · Mar 11, 2021
Visibility-aware Trajectory Optimization with Application to Aerial Tracking

Qianhao Wang, Yuman Gao, Jialin Ji et al.

The visibility of targets determines the performance and even the success rate of various applications, such as active SLAM, exploration, and target tracking. Therefore, it is crucial to take the visibility of targets into explicit account in trajectory planning. In this paper, we propose a general metric for target visibility that considers observation distance and angle as well as occlusion effects. We formulate this metric as a differentiable visibility cost function, with which the spatial trajectory and yaw can be jointly optimized. Furthermore, this visibility-aware trajectory optimization handles the dynamic feasibility of position and yaw simultaneously. To validate that our method is practical and generic, we integrate it into a customized quadrotor tracking system. The experimental results show that our visibility-aware planner performs more robustly and observes targets better. To benefit related research, we release our code to the public.
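A smooth visibility cost over observation distance and angle could look like the following sketch. It is a simplified 2-D illustration, not the metric from the paper: the quadratic distance term, the `1 - cos` angle term, the `desired_dist` parameter, and the function name are all assumptions, and the occlusion term is omitted entirely.

```python
import math


def visibility_cost(drone_xy, yaw, target_xy, desired_dist=2.0):
    """Hypothetical smooth cost: zero when the target sits at the desired
    distance directly along the camera heading, growing otherwise."""
    dx = target_xy[0] - drone_xy[0]
    dy = target_xy[1] - drone_xy[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Quadratic penalty on deviation from the desired observation distance.
    dist_cost = (dist - desired_dist) ** 2
    # Smooth in yaw: 0 when facing the target, 2 when facing directly away.
    angle_cost = 1.0 - math.cos(bearing - yaw)
    return dist_cost + angle_cost


facing = visibility_cost((0.0, 0.0), 0.0, (2.0, 0.0))
facing_away = visibility_cost((0.0, 0.0), math.pi, (2.0, 0.0))
print(facing, facing_away)
```

Because the angle term is smooth in yaw and the distance term is smooth in position wherever the drone and target do not coincide, a gradient-based optimizer can adjust both together, which is the property the abstract relies on for joint optimization.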

RO · Nov 8, 2020
Learning-based 3D Occupancy Prediction for Autonomous Navigation in Occluded Environments

Lizi Wang, Hongkai Ye, Qianhao Wang et al.

In autonomous navigation of mobile robots, sensors suffer from massive occlusion in cluttered environments, leaving a significant amount of space unknown during planning. In practice, treating the unknown space either optimistically or pessimistically limits planning performance, so aggressiveness and safety cannot be satisfied at the same time. Humans, however, can infer the likely shape of obstacles from only partial observation and generate non-conservative trajectories that avoid possible collisions in occluded space. Mimicking this behavior, we propose a method based on a deep neural network to reliably predict the occupancy distribution of unknown space. Specifically, the proposed method utilizes contextual information about the environment and learns from prior knowledge to predict obstacle distributions in occluded space. We train our network on unlabeled data without ground truth and successfully apply it to real-time navigation in unseen environments without any refinement. Results show that our method improves the performance of a kinodynamic planner, increasing safety with no reduction in speed in cluttered environments.