CVDec 6, 2022Code
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point CloudsHonghui Yang, Tong He, Jiaheng Liu et al.
Despite the tremendous progress of Masked Autoencoders (MAE) in developing vision tasks such as image and video, exploring MAE in large-scale 3D point clouds remains challenging due to the inherent irregularity. In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm. The core idea is to apply a \textbf{G}enerative \textbf{D}ecoder for MAE (GD-MAE) to automatically merges the surrounding context to restore the masked geometric knowledge in a hierarchical fusion manner. In doing so, our approach is free from introducing the heuristic design of decoders and enjoys the flexibility of exploring various masking strategies. The corresponding part costs less than \textbf{12\%} latency compared with conventional methods, while achieving better performance. We demonstrate the efficacy of the proposed method on several large-scale benchmarks: Waymo, KITTI, and ONCE. Consistent improvement on downstream detection tasks illustrates strong robustness and generalization capability. Not only our method reveals state-of-the-art results, but remarkably, we achieve comparable accuracy even with \textbf{20\%} of the labeled data on the Waymo dataset. Code will be released at https://github.com/Nightmare-n/GD-MAE.
CVNov 14, 2022Code
Boosting Semi-Supervised 3D Object Detection with Semi-SamplingXiaopei Wu, Yang Zhao, Liang Peng et al.
Current 3D object detection methods heavily rely on an enormous amount of annotations. Semi-supervised learning can be used to alleviate this issue. Previous semi-supervised 3D object detection methods directly follow the practice of fully-supervised methods to augment labeled and unlabeled data, which is sub-optimal. In this paper, we design a data augmentation method for semi-supervised learning, which we call Semi-Sampling. Specifically, we use ground truth labels and pseudo labels to crop gt samples and pseudo samples on labeled frames and unlabeled frames, respectively. Then we can generate a gt sample database and a pseudo sample database. When training a teacher-student semi-supervised framework, we randomly select gt samples and pseudo samples to both labeled frames and unlabeled frames, making a strong data augmentation for them. Our semi-sampling can be regarded as an extension of gt-sampling to semi-supervised learning. Our method is simple but effective. We consistently improve state-of-the-art methods on ScanNet, SUN-RGBD, and KITTI benchmarks by large margins. For example, when training using only 10% labeled data on ScanNet, we achieve 3.1 mAP and 6.4 mAP improvement upon 3DIoUMatch in terms of mAP@0.25 and mAP@0.5. When training using only 1% labeled data on KITTI, we boost 3DIoUMatch by 3.5 mAP, 6.7 mAP and 14.1 mAP on car, pedestrian and cyclist classes. Codes will be made publicly available at https://github.com/LittlePey/Semi-Sampling.
84.1LGMay 25
CktGen: Automated Analog Circuit Design with Generative Artificial IntelligenceYuxuan Hou, Hehe Fan, Jianrong Zhang et al.
The automatic synthesis of analog circuits presents significant challenges. Most existing approaches formulate the problem as a single-objective optimization task, overlooking that design specifications for a given circuit type vary widely across applications. To address this, we introduce specification-conditioned analog circuit generation, a task that directly generates analog circuits based on target specifications. The motivation is to leverage existing well-designed circuits to improve automation in analog circuit design. Specifically, we propose CktGen, a simple yet effective variational autoencoder that maps discretized specifications and circuits into a joint latent space and reconstructs the circuit from that latent vector. Notably, as a single specification may correspond to multiple valid circuits, naively fusing specification information into the generative model does not capture these one-to-many relationships. To address this, we decouple the encoding of circuits and specifications and align their mapped latent space. Then, we employ contrastive training with a filter mask to maximize differences between encoded circuits and specifications. Furthermore, classifier guidance along with latent feature alignment promotes the clustering of circuits sharing the same specification, avoiding model collapse into trivial one-to-one mappings. By canonicalizing the latent space with respect to specifications, we can search for an optimal circuit that meets valid target specifications. We conduct comprehensive experiments on the open circuit benchmark and introduce metrics to evaluate cross-model consistency. Experimental results demonstrate that CktGen achieves substantial improvements over state-of-the-art methods.
CVMar 28, 2023
HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose EstimationLinfang Zheng, Chen Wang, Yinghan Sun et al.
In this paper, we focus on the problem of category-level object pose estimation, which is challenging due to the large intra-category shape variation. 3D graph convolution (3D-GC) based methods have been widely used to extract local geometric features, but they have limitations for complex shaped objects and are sensitive to noise. Moreover, the scale and translation invariant properties of 3D-GC restrict the perception of an object's size and translation information. In this paper, we propose a simple network structure, the HS-layer, which extends 3D-GC to extract hybrid scope latent features from point cloud data for category-level object pose estimation tasks. The proposed HS-layer: 1) is able to perceive local-global geometric structure and global information, 2) is robust to noise, and 3) can encode size and translation information. Our experiments show that the simple replacement of the 3D-GC layer with the proposed HS-layer on the baseline method (GPV-Pose) achieves a significant improvement, with the performance increased by 14.5% on 5d2cm metric and 10.3% on IoU75. Our method outperforms the state-of-the-art methods by a large margin (8.3% on 5d2cm, 6.9% on IoU75) on the REAL275 dataset and runs in real-time (50 FPS).
90.3ROMay 22
Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body TrackingMing Yang, Tao Yu, Feng Li et al.
Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidelity. Training such models from scratch requires large-scale data and computation, making rapid deployment on new humanoid platforms costly. This raises a natural question: Can pretrained WBT models transfer across embodiments with minimal adaptation? To answer this question, we propose Any2Any, a paradigm that efficiently transfers an existing WBT specialist to a new humanoid embodiment with only a small amount of data and compute. Any2Any first performs kinematic alignment between source and target humanoids, aligning their input and output spaces so that the pretrained source policy can be meaningfully reused on the target embodiment.Any2Any then performs dynamics adaptation by applying lightweight parameter-efficient fine-tuning (PEFT) components to selected dynamics-sensitive modules, preserving useful behavioral priors while enabling targeted adaptation to the target robot. Extensive experiments on multiple humanoid platforms and pretrained backbones show that Any2Any substantially accelerates convergence and reduces training cost compared with training from scratch, while achieving competitive or superior tracking performance. Notably, using only 1% of the compute and data required for full training, Any2Any successfully transfers Sonic models pre-trained on Unitree G1 to LimX Oli and LimX Luna. These results suggest that pretrained WBT specialists can be efficiently reused across embodiments, providing a scalable path toward deploying humanoid whole-body control on new robots.
76.5ROMay 21
Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few DemonstrationsTengye Xu, Yangting Sun, Ziju Shen et al.
Designing reward functions that generalize beyond controlled laboratory settings remains a fundamental challenge in reinforcement learning for robotics. In open-world manipulation problems, a single task can appear in numerous variants through different object instances, positions, and camera viewpoints. Recent vision-based reward models tend to memorize specific pixel distributions and fail to generalize beyond their training conditions. To address this, we propose a framework that learns invariant symbolic reward functions from as few as five demonstrations. The insight is to shift from visual feature-fitting to the discovery of behavioral invariants: task-level properties that remain constant across diverse visual instantiations. The framework has two coupled components: a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical procedure that distills these invariants from demonstrations without online interaction. Experiments on eight Meta-World tasks and three Franka manipulation tasks demonstrate that our method achieves stronger process alignment and policy rollout ranking abilities compared to baselines, accelerating downstream policy learning. Three real-world out-of-distribution experiments further show that the same learned reward generalizes zero-shot to position, viewpoint, and object variations, enabling a single reward representation to be reused across diverse task variants in practice.
CVNov 21, 2023
Multi-Resolution Planar Region Extraction for Uneven TerrainsYinghan Sun, Linfang Zheng, Hua Chen et al.
This paper studies the problem of extracting planar regions in uneven terrains from unordered point cloud measurements. Such a problem is critical in various robotic applications such as robotic perceptive locomotion. While existing approaches have shown promising results in effectively extracting planar regions from the environment, they often suffer from issues such as low computational efficiency or loss of resolution. To address these issues, we propose a multi-resolution planar region extraction strategy in this paper that balances the accuracy in boundaries and computational efficiency. Our method begins with a pointwise classification preprocessing module, which categorizes all sampled points according to their local geometric properties to facilitate multi-resolution segmentation. Subsequently, we arrange the categorized points using an octree, followed by an in-depth analysis of nodes to finish multi-resolution plane segmentation. The efficiency and robustness of the proposed approach are verified via synthetic and real-world experiments, demonstrating our method's ability to generalize effectively across various uneven terrains while maintaining real-time performance, achieving frame rates exceeding 35 FPS.
76.8CVMay 19
TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core OptimizationChonghao Zhong, Linfeng Shi, Hua Chen et al.
Training 3D Gaussian Splatting (3DGS) at billion-primitive scale is fundamentally memory-bound: each Gaussian primitive carries a large attribute vector, and the aggregate parameter table quickly exceeds GPU capacity, limiting prior systems to tens of millions of Gaussians on commodity single-GPU hardware. We observe that 3DGS training is inherently sparse and trajectory-conditioned: each iteration activates only the Gaussians visible from the current camera batch, so GPU memory can serve as a working-set cache rather than a persistent parameter store. Building on this insight, we introduce TideGS, an out-of-core training framework that manages parameters across an SSD-CPU-GPU hierarchy via three synergistic techniques: block-virtualized geometry for SSD-aligned spatial locality, a hierarchical asynchronous pipeline to overlap I/O with computation, and trajectory-adaptive differential streaming that transfers only incremental working-set deltas between iterations. Experiments show that TideGS enables training with over one billion Gaussians on a single 24 GB GPU while achieving the best reconstruction quality among evaluated single-GPU baselines on large-scale scenes, scaling beyond prior out-of-core baselines (e.g., approximately 100M Gaussians) and standard in-memory training (e.g., approximately 11M Gaussians).
RODec 15, 2025
PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive RepresentationsMingqi Yuan, Tao Yu, Haolin Song et al.
Achieving efficient and robust whole-body control (WBC) is essential for enabling humanoid robots to perform complex tasks in dynamic environments. Despite the success of reinforcement learning (RL) in this domain, its sample inefficiency remains a significant challenge due to the intricate dynamics and partial observability of humanoid robots. To address this limitation, we propose PvP, a Proprioceptive-Privileged contrastive learning framework that leverages the intrinsic complementarity between proprioceptive and privileged states. PvP learns compact and task-relevant latent representations without requiring hand-crafted data augmentations, enabling faster and more stable policy learning. To support systematic evaluation, we develop SRL4Humanoid, the first unified and modular framework that provides high-quality implementations of representative state representation learning (SRL) methods for humanoid robot learning. Extensive experiments on the LimX Oli robot across velocity tracking and motion imitation tasks demonstrate that PvP significantly improves sample efficiency and final performance compared to baseline SRL methods. Our study further provides practical insights into integrating SRL with RL for humanoid WBC, offering valuable guidance for data-efficient humanoid robot learning.
ROFeb 24
BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action ModelHaosheng Li, Weixin Mao, Zihan Lan et al.
Vision-Language-Action (VLA) models have achieved significant breakthroughs by leveraging Large Vision Language Models (VLMs) to jointly interpret instructions and visual inputs. However, the substantial increase in visual tokens, particularly from multi-view inputs, poses serious challenges to real-time robotic manipulation. Existing acceleration techniques for VLMs, such as token pruning, often result in degraded performance when directly applied to VLA models, as they overlook the relationships between different views and fail to account for the dynamic and task-specific characteristics of robotic operation. To address this, we propose BFA++, a dynamic token pruning framework designed specifically for VLA models. BFA++ introduces a hierarchical pruning strategy guided by two-level importance predictors: an intra-view predictor highlights task-relevant regions within each image to suppress spatial noise, while an inter-view predictor identifies critical camera views throughout different manipulation phases to reduce cross-view redundancy. This design enables efficient token selection while preserving essential visual cues, resulting in improved computational efficiency and higher manipulation success rates. Evaluations on the RoboTwin benchmark and real-world robotic tasks demonstrate that BFA++ consistently outperforms existing methods. BFA++ improves the success rate by about 10% on both the π0 and RDT models, achieving speedup of 1.8X and 1.5X, respectively. Our results highlight that context-sensitive and task-aware token pruning serves as a more effective strategy than full visual processing, enabling faster inference and improved manipulation accuracy in real-world robotic systems.
90.6ROApr 3
ARM: Advantage Reward Modeling for Long-Horizon ManipulationYiming Mao, Zixi Yu, Weixin Mao et al.
Long-horizon robotic manipulation remains challenging for reinforcement learning (RL) because sparse rewards provide limited guidance for credit assignment. Practical policy improvement thus relies on richer intermediate supervision, such as dense progress rewards, which are costly to obtain and ill-suited to non-monotonic behaviors such as backtracking and recovery. To address this, we propose Advantage Reward Modeling (ARM), a framework that shifts from hard-to-quantify absolute progress to estimating relative advantage. We introduce a cost-effective tri-state labeling strategy -- Progressive, Regressive, and Stagnant -- that reduces human cognitive overhead while ensuring high cross-annotator consistency. By training on these intuitive signals, ARM enables automated progress annotation for both complete demonstrations and fragmented DAgger-style data. Integrating ARM into an offline RL pipeline allows for adaptive action-reward reweighting, effectively filtering suboptimal samples. Our approach achieves a 99.4% success rate on a challenging long-horizon towel-folding task, demonstrating improved stability and data efficiency over current VLA baselines with near-zero human intervention during policy training.
LGAug 21, 2024
Valuing an Engagement Surface using a Large Scale Dynamic Causal ModelAbhimanyu Mukerji, Sushant More, Ashwin Viswanathan Kannan et al.
With recent rapid growth in online shopping, AI-powered Engagement Surfaces (ES) have become ubiquitous across retail services. These engagement surfaces perform an increasing range of functions, including recommending new products for purchase, reminding customers of their orders and providing delivery notifications. Understanding the causal effect of engagement surfaces on value driven for customers and businesses remains an open scientific question. In this paper, we develop a dynamic causal model at scale to disentangle value attributable to an ES, and to assess its effectiveness. We demonstrate the application of this model to inform business decision-making by understanding returns on investment in the ES, and identifying product lines and features where the ES adds the most value.
IVMar 25, 2024Code
RSTAR4D: Rotational Streak Artifact Reduction in 4D CBCT using a Separable 4D CNNZiheng Deng, Hua Chen, Yongzheng Zhou et al.
Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, the cone-beam projections become much sparser and the reconstructed 4D CBCT images will be covered by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most algorithms employ 2D network models as backbones, neglecting the intrinsic structural priors within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images. We find that streak artifacts exhibit a unique rotational motion along with the patient's respiration, distinguishable from diaphragm-driven respiratory motion in the spatiotemporal domain. Therefore, we propose a novel 4D neural network model, RSTAR4D-Net, designed to address Rotational STreak Artifact Reduction by integrating the spatial and temporal information within 4D CBCT images. Specifically, we overcome the computational and training difficulties of a 4D neural network. The specially designed model adopts an efficient implementation of 4D convolutions to reduce computational costs and thus can process the whole 4D image in one pass. Additionally, a Tetris training strategy pertinent to the separable 4D convolutions is proposed to effectively train the model using limited 4D training samples. Extensive experiments substantiate the effectiveness of our proposed method, and the RSTAR4D-Net shows superior performance compared to other methods. The source code and dynamic demos are available at https://github.com/ivy9092111111/RSTAR.
CVMay 11, 2023Code
PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel TransformerHonghui Yang, Wenxiao Wang, Minghao Chen et al.
Recent Transformer-based 3D object detectors learn point cloud features either from point- or voxel-based representations. However, the former requires time-consuming sampling while the latter introduces quantization errors. In this paper, we present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD) that takes advantage of these two representations. Specifically, we first use voxel-based sparse convolutions for efficient feature encoding. Then, we propose a Point-Voxel Transformer (PVT) module that obtains long-range contexts in a cheap manner from voxels while attaining accurate positions from points. The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries. Then, PVT adaptively fuses long-range contextual and local geometric information around reference points into content queries. Further, to quickly find the neighboring points of reference points, we design the Virtual Range Image module, which generalizes the native range image to multi-sensor and multi-frame. The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method. Code will be available at https://github.com/Nightmare-n/PVT-SSD.
72.0ROApr 17
Long-Term Memory for VLA-based Agents in Open-World Task ExecutionXu Huang, Weixin Mao, Yinhao Li et al.
Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat planning and execution as decoupled processes, often failing to consolidate successful strategies, which results in inefficient trial-and-error in multi-stage protocols. In this paper, we propose ChemBot, a dual-layer, closed-loop framework that integrates an autonomous AI agent with a progress-aware VLA model (Skill-VLA) for hierarchical task decomposition and execution. ChemBot utilizes a dual-layer memory architecture to consolidate successful trajectories into retrievable assets, while a Model Context Protocol (MCP) server facilitates efficient sub-agent and tool orchestration. To address the inherent limitations of VLA models, we further implement a future-state-based asynchronous inference mechanism to mitigate trajectory discontinuities. Extensive experiments on collaborative robots demonstrate that ChemBot achieves superior operational safety, precision, and task success rates compared to existing VLA baselines in complex, long-horizon chemical experimentation.
36.0ROMar 28
Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement LearningLeixin Chang, Xinchen Yao, Ben Liu et al.
On-policy reinforcement learning (RL) algorithms have demonstrated great potential in robotic control, where effective exploration is crucial for efficient and high-quality policy learning. However, how to encourage the agent to explore the better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing the policy entropy or encouraging novel state visiting regardless of the potential state value. We propose a new form of directed exploration that uses analytical policy gradients from a differentiable dynamics model to inject task-aware, physics-guided guidance, thereby steering the agent towards high-reward regions for accelerated and more effective policy learning.
LGJan 5
RainBalance: Alleviating Dual Imbalance in GNSS-based Precipitation Nowcasting via Continuous Probability ModelingYifang Zhang, Shengwu Xiong, Henan Wang et al.
Global navigation satellite systems (GNSS) station-based Precipitation Nowcasting aims to predict rainfall within the next 0-6 hours by leveraging a GNSS station's historical observations of precipitation, GNSS-PWV, and related meteorological variables, which is crucial for disaster mitigation and real-time decision-making. In recent years, time-series forecasting approaches have been extensively applied to GNSS station-based precipitation nowcasting. However, the highly imbalanced temporal distribution of precipitation, marked not only by the dominance of non-rainfall events but also by the scarcity of extreme precipitation samples, significantly limits model performance in practical applications. To address the dual imbalance problem in precipitation nowcasting, we propose a continuous probability modeling-based framework, RainBalance. This plug-and-play module performs clustering for each input sample to obtain its cluster probability distribution, which is further mapped into a continuous latent space via a variational autoencoder (VAE). By learning in this continuous probabilistic space, the task is reformulated from fitting single and imbalance-prone precipitation labels to modeling continuous probabilistic label distributions, thereby alleviating the imbalance issue. We integrate this module into multiple state-of-the-art models and observe consistent performance gains. Comprehensive statistical analysis and ablation studies further validate the effectiveness of our approach.
IRFeb 11
Compute Only Once: UG-Separation for Efficient Large Recommendation ModelsHui Lu, Zheng Chai, Shipeng Bai et al.
Driven by scaling laws, recommender systems increasingly rely on large-scale models to capture complex feature interactions and user behaviors, but this trend also leads to prohibitive training and inference costs. While long-sequence models(e.g., LONGER) can reuse user-side computation through KV caching, such reuse is difficult in dense feature interaction architectures(e.g., RankMixer), where user and group (candidate item) features are deeply entangled across layers. In this work, we propose User-Group Separation (UG-Sep), a novel framework that enables reusable user-side computation in dense interaction models for the first time. UG-Sep introduces a masking mechanism that explicitly disentangles user-side and item-side information flows within token-mixing layers, ensuring that a subset of tokens to preserve purely user-side representations across layers. This design enables corresponding token computations to be reused across multiple samples, significantly reducing redundant inference cost. To compensate for potential expressiveness loss induced by masking, we further propose an Information Compensation strategy that adaptively reconstructs suppressed user-item interactions. Moreover, as UG-Sep substantially reduces user-side FLOPs and exposes memory-bound components, we incorporate W8A16 (8-bit weight, 16-bit activation) weight-only quantization to alleviate memory bandwidth bottlenecks and achieve additional acceleration. We conduct extensive offline evaluations and large-scale online A/B experiments at ByteDance, demonstrating that UG-Sep reduces inference latency by up to 20 percent without degrading online user experience or commercial metrics across multiple business scenarios, including feed recommendation and advertising systems.
RODec 4, 2024
Learning Whole-Body Loco-Manipulation for Omni-Directional Task Space Pose Tracking with a Wheeled-Quadrupedal-ManipulatorKaiwen Jiang, Zhen Fu, Junde Guo et al.
In this paper, we study the whole-body loco-manipulation problem using reinforcement learning (RL). Specifically, we focus on the problem of how to coordinate the floating base and the robotic arm of a wheeled-quadrupedal manipulator robot to achieve direct six-dimensional (6D) end-effector (EE) pose tracking in task space. Different from conventional whole-body loco-manipulation problems that track both floating-base and end-effector commands, the direct EE pose tracking problem requires inherent balance among redundant degrees of freedom in the whole-body motion. We leverage RL to solve this challenging problem. To address the associated difficulties, we develop a novel reward fusion module (RFM) that systematically integrates reward terms corresponding to different tasks in a nonlinear manner. In such a way, the inherent multi-stage and hierarchical feature of the loco-manipulation problem can be carefully accommodated. By combining the proposed RFM with the a teacher-student RL training paradigm, we present a complete RL scheme to achieve 6D EE pose tracking for the wheeled-quadruped manipulator robot. Extensive simulation and hardware experiments demonstrate the significance of the RFM. In particular, we enable smooth and precise tracking performance, achieving state-of-the-art tracking position error of less than 5 cm, and rotation error of less than 0.1 rad. Please refer to https://clearlab-sustech.github.io/RFM_loco_mani/ for more experimental videos.
CVApr 17, 2024
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose RefinementLinfang Zheng, Tze Ho Elden Tse, Chen Wang et al.
Object pose refinement is essential for robust object pose estimation. Previous work has made significant progress towards instance-level object pose refinement. Yet, category-level pose refinement is a more challenging problem due to large shape variations within a category and the discrepancies between the target object and the shape prior. To address these challenges, we introduce a novel architecture for category-level object pose refinement. Our approach integrates an HS-layer and learnable affine transformations, which aims to enhance the extraction and alignment of geometric information. Additionally, we introduce a cross-cloud transformation mechanism that efficiently merges diverse data sources. Finally, we push the limits of our model by incorporating the shape prior information for translation and size error prediction. We conducted extensive experiments to demonstrate the effectiveness of the proposed framework. Through extensive quantitative experiments, we demonstrate significant improvement over the baseline method by a large margin across all metrics.
LGMar 5
Diffusion Policy through Conditional Proximal Policy OptimizationBen Liu, Shunpeng Yang, Hua Chen
Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more diverse and flexible action generation compared to the conventional Gaussian policy. Despite various attempts to combine RL with diffusion, a key challenge is the difficulty of computing action log-likelihood under the diffusion model. This greatly hinders the direct application of diffusion policies in on-policy reinforcement learning. Most existing methods calculate or approximate the log-likelihood through the entire denoising process in the diffusion model, which can be memory- and computationally inefficient. To overcome this challenge, we propose a novel and efficient method to train a diffusion policy in an on-policy setting that requires only evaluating a simple Gaussian probability. This is achieved by aligning the policy iteration with the diffusion process, which is a distinct paradigm compared to previous work. Moreover, our formulation can naturally handle entropy regularization, which is often difficult to incorporate into diffusion policies. Experiments demonstrate that the proposed method produces multimodal policy behaviors and achieves superior performance on a variety of benchmark tasks in both IsaacLab and MuJoCo Playground.
LGFeb 1
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement LearningShunpeng Yang, Ben Liu, Hua Chen
Among on-policy reinforcement learning algorithms, Proximal Policy Optimization (PPO) demonstrates is widely favored for its simplicity, numerical stability, and strong empirical performance. Standard PPO relies on surrogate objectives defined via importance ratios, which require evaluating policy likelihood that is typically straightforward when the policy is modeled as a Gaussian distribution. However, extending PPO to more expressive, high-capacity policy models such as continuous normalizing flows (CNFs), also known as flow-matching models, is challenging because likelihood evaluation along the full flow trajectory is computationally expensive and often numerically unstable. To resolve this issue, we propose PolicyFlow, a novel on-policy CNF-based reinforcement learning algorithm that integrates expressive CNF policies with PPO-style objectives without requiring likelihood evaluation along the full flow path. PolicyFlow approximates importance ratios using velocity field variations along a simple interpolation path, reducing computational overhead without compromising training stability. To further prevent mode collapse and further encourage diverse behaviors, we propose the Brownian Regularizer, an implicit policy entropy regularizer inspired by Brownian motion, which is conceptually elegant and computationally lightweight. Experiments on diverse tasks across various environments including MultiGoal, PointMaze, IsaacLab and MuJoCo Playground show that PolicyFlow achieves competitive or superior performance compared to PPO using Gaussian policies and flow-based baselines including FPO and DPPO. Notably, results on MultiGoal highlight PolicyFlow's ability to capture richer multimodal action distributions.
LGSep 28, 2025
How Effective Are Time-Series Models for Precipitation Nowcasting? A Comprehensive Benchmark for GNSS-based Precipitation NowcastingYifang Zhang, Shengwu Xiong, Henan Wang et al.
Precipitation Nowcasting, which aims to predict precipitation within the next 0 to 6 hours, is critical for disaster mitigation and real-time response planning. However, most time series forecasting benchmarks in meteorology are evaluated on variables with strong periodicity, such as temperature and humidity, which fail to reflect model capabilities in more complex and practically meteorology scenarios like precipitation nowcasting. To address this gap, we propose RainfallBench, a benchmark designed for precipitation nowcasting, a highly challenging and practically relevant task characterized by zero inflation, temporal decay, and non-stationarity, focusing on predicting precipitation within the next 0 to 6 hours. The dataset is derived from five years of meteorological observations, recorded at hourly intervals across six essential variables, and collected from more than 140 Global Navigation Satellite System (GNSS) stations globally. In particular, it incorporates precipitable water vapor (PWV), a crucial indicator of rainfall that is absent in other datasets. We further design specialized evaluation protocols to assess model performance on key meteorological challenges, including multi-scale prediction, multi-resolution forecasting, and extreme rainfall events, benchmarking 17 state-of-the-art models across six major architectures on RainfallBench. Additionally, to address the zero-inflation and temporal decay issues overlooked by existing models, we introduce Bi-Focus Precipitation Forecaster (BFPF), a plug-and-play module that incorporates domain-specific priors to enhance rainfall time series forecasting. Statistical analysis and ablation studies validate the comprehensiveness of our dataset as well as the superiority of our methodology.
LGMay 13, 2025
A Practical Introduction to Deep Reinforcement LearningYinghan Sun, Hongxi Wang, Hua Chen et al.
Deep reinforcement learning (DRL) has emerged as a powerful framework for solving sequential decision-making problems, achieving remarkable success in a wide range of applications, including game AI, autonomous driving, biomedicine, and large language models. However, the diversity of algorithms and the complexity of theoretical foundations often pose significant challenges for beginners seeking to enter the field. This tutorial aims to provide a concise, intuitive, and practical introduction to DRL, with a particular focus on the Proximal Policy Optimization (PPO) algorithm, which is one of the most widely used and effective DRL methods. To facilitate learning, we organize all algorithms under the Generalized Policy Iteration (GPI) framework, offering readers a unified and systematic perspective. Instead of lengthy theoretical proofs, we emphasize intuitive explanations, illustrative examples, and practical engineering techniques. This work serves as an efficient and accessible guide, helping readers rapidly progress from basic concepts to the implementation of advanced DRL algorithms.
CVJan 3, 2025
MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using Multi-Instance Point Cloud RegistrationSongjie Han, Yinhua Liu, Yanzheng Li et al.
A high-fidelity digital simulation environment is crucial for accurately replicating physical operational processes. However, inconsistencies between simulation and physical environments result in low confidence in simulation outcomes, limiting their effectiveness in guiding real-world production. Unlike the traditional step-by-step point cloud "segmentation-registration" generation method, this paper introduces, for the first time, a novel Multi-Robot Manufacturing Digital Scene Generation (MRG) method that leverages multi-instance point cloud registration, specifically within manufacturing scenes. Tailored to the characteristics of industrial robots and manufacturing settings, an instance-focused transformer module is developed to delineate instance boundaries and capture correlations between local regions. Additionally, a hypothesis generation module is proposed to extract target instances while preserving key features. Finally, an efficient screening and optimization algorithm is designed to refine the final registration results. Experimental evaluations on the Scan2CAD and Welding-Station datasets demonstrate that: (1) the proposed method outperforms existing multi-instance point cloud registration techniques; (2) compared to state-of-the-art methods, the Scan2CAD dataset achieves improvements in MR and MP by 12.15% and 17.79%, respectively; and (3) on the Welding-Station dataset, MR and MP are enhanced by 16.95% and 24.15%, respectively. This work marks the first application of multi-instance point cloud registration in manufacturing scenes, significantly advancing the precision and reliability of digital simulation environments for industrial applications.
ROJan 28, 2022
Quadruped Capturability and Push Recovery via a Switched-Systems Characterization of Dynamic BalanceHua Chen, Zejun Hong, Shunpeng Yang et al.
This paper studies capturability and push recovery for quadrupedal locomotion. Despite the rich literature on capturability analysis and push recovery control for legged robots, existing tools are developed mainly for bipeds or humanoids. Distinct quadrupedal features such as point contacts and multiple swinging legs prevent direct application of these methods. To address this gap, we propose a switched systems model for quadruped dynamics, and instantiate the abstract viability concept for quadrupedal locomotion with a time-based gait. Capturability is characterized through a novel specification of dynamically balanced states that addresses the time-varying nature of quadrupedal locomotion and balance. A linear inverted pendulum (LIP) model is adopted to demonstrate the theory and show how the newly developed quadrupedal capturability can be used in motion planning for quadrupedal push recovery. We formulate and solve an explicit model predictive control (EMPC) problem whose optimal solution fully characterizes quadrupedal capturability with the LIP. Given this analysis, an optimization-based planning scheme is devised for determining footsteps and center of mass references during push recovery. To validate the effectiveness of the overall framework, we conduct numerous simulation and hardware experiments. Simulation results illustrate the necessity of considering dynamic balance for quadrupedal capturability, and the significant improvement in disturbance rejection with the proposed strategy. Experimental validations on a replica of the Mini Cheetah quadruped demonstrate an up to 100% improvement as compared with state-of-the-art.
MAAug 17, 2021
Encirclement Guaranteed Cooperative Pursuit with Robust Model Predictive ControlChen Wang, Hua Chen, Jia Pan et al.
This paper studies a novel encirclement guaranteed cooperative pursuit problem involving $N$ pursuers and a single evader in an unbounded two-dimensional game domain. Throughout the game, the pursuers are required to maintain encirclement of the evader, i.e., the evader should always stay inside the convex hull generated by all the pursuers, in addition to achieving the classical capture condition. To tackle this challenging cooperative pursuit problem, a robust model predictive control (RMPC) based formulation framework is first introduced, which simultaneously accounts for the encirclement and capture requirements under the assumption that the evader's action is unavailable to all pursuers. Despite the reformulation, the resulting RMPC problem involves a bilinear constraint due to the encirclement requirement. To further handle such a bilinear constraint, a novel encirclement guaranteed partitioning scheme is devised that simplifies the original bilinear RMPC problem to a number of linear tube MPC (TMPC) problems solvable in a decentralized manner. Simulation experiments demonstrate the effectiveness of the proposed solution framework. Furthermore, comparisons with existing approaches show that the explicit consideration of the encirclement condition significantly improves the chance of successful capture of the evader in various scenarios.
ROAug 15, 2021
Force-feedback based Whole-body Stabilizer for Position-Controlled Humanoid RobotsShunpeng Yang, Hua Chen, Zhen Fu et al.
This paper studies stabilizer design for position-controlled humanoid robots. Stabilizers are an essential part for position-controlled humanoids, whose primary objective is to adjust the control input sent to the robot to assist the tracking controller to better follow the planned reference trajectory. To achieve this goal, this paper develops a novel force-feedback based whole-body stabilizer that fully exploits the six-dimensional force measurement information and the whole-body dynamics to improve tracking performance. Relying on rigorous analysis of whole-body dynamics of position-controlled humanoids under unknown contact, the developed stabilizer leverages quadratic-programming based technique that allows cooperative consideration of both the center-of-mass tracking and contact force tracking. The effectiveness of the proposed stabilizer is demonstrated on the UBTECH Walker robot in the MuJoCo simulator. Simulation validations show a significant improvement in various scenarios as compared to commonly adopted stabilizers based on the zero-moment-point feedback and the linear inverted pendulum model.
ROJun 28, 2021
Instantaneous Capture Input for Balancing the Variable Height Inverted PendulumJunwei Liu, Hua Chen, Patrick M. Wensing et al.
Balancing is a fundamental need for legged robots due to their unstable floating-base nature. Balance control has been thoroughly studied for simple models such as the linear inverted pendulum thanks to the concept of the instantaneous capture point (ICP), yet the constant center of mass height assumption limits the application. This paper explores balancing of the variable-height inverted pendulum (VHIP) model by introducing the \emph{instantaneous capture input} (ICI), an extension of the ICP based on its key properties. Namely, the ICI can be computed as a function of the state, and when this function is used as the control policy, the ICI is rendered stationary and the system will eventually come to a stop. This characterization induces an analytical region of capturable states for the VHIP, which can be used to conceptually guide where to step. To further address state and control constraints during recovery, we present and theoretically analyze an explicit ICI-based controller with online optimal feedback gains. Simulations demonstrate the validity of our controller for capturability maintenance compared to an approach based on the divergent component of motion.
RODec 11, 2020
Underactuated Motion Planning and Control for Jumping with Wheeled-Bipedal RobotsHua Chen, Bingheng Wang, Zejun Hong et al.
This paper studies jumping for wheeled-bipedal robots, a motion that takes full advantage of the benefits from the hybrid wheeled and legged design features. A comprehensive hierarchical scheme for motion planning and control of jumping with wheeled-bipedal robots is developed. Underactuation of the wheeled-bipedal dynamics is the main difficulty to be addressed, especially in the planning problem. To tackle this issue, a novel wheeled-spring-loaded inverted pendulum (W-SLIP) model is proposed to characterize the essential dynamics of wheeled-bipedal robots during jumping. Relying on a differential-flatness-like property of the W-SLIP model, a tractable quadratic programming based solution is devised for planning jumping motions for wheeled-bipedal robots. Combined with a kinematic planning scheme accounting for the flight phase motion, a complete planning scheme for the W-SLIP model is developed. To enable accurate tracking of the planned trajectories, a linear quadratic regulator based wheel controller and a task-space whole-body controller for the other joints are blended through disturbance observers. The overall planning and control scheme is validated using V-REP simulations of a prototype wheeled-bipedal robot.
RONov 17, 2019
Optimal Control of a Differentially Flat 2D Spring-Loaded Inverted Pendulum ModelHua Chen, Patrick M. Wensing, Wei Zhang
This paper considers the optimal control problem of an extended spring-loaded inverted pendulum (SLIP) model with two additional actuators for active leg length and hip torque modulation. These additional features arise naturally in practice, allowing for consideration of swing leg kinematics during flight and active control over stance dynamics. On the other hand, nonlinearity and the hybrid nature of the overall SLIP dynamics introduce challenges in the analysis and control of the model. In this paper, we first show that the stance dynamics of the considered SLIP model are differentially flat, which has a strong implication regarding controllability of the stance dynamics. Leveraging this powerful property, a tractable optimal control strategy is developed. This strategy enables online solution while also treating the hybrid nature of the SLIP dynamics. Together with the optimal control strategy, the extended SLIP model grants active disturbance rejection capability at any point during the gait. Performance of the proposed control strategy is demonstrated via numerical tests and shows significant advantage over existing methods.
SYSep 20, 2015
Fractional-order Generalized Principle of Self-Support (FOG PSS) in Control Systems DesignHua Chen, YangQuan Chen
This paper reviews research that studies the principle of self-support (PSS) in some control systems and proposes a fractional-order generalized PSS framework for the first time. The existing PSS approach focuses on practical tracking problem of integer-order systems including robotic dynamics, high precision linear motor system, multi-axis high precision positioning system with unmeasurable variables, imprecise sensor information, uncertain parameters and external disturbances. More generally, by formulating the fractional PSS concept as a new generalized framework, we will focus in the possible fields on the fractional-order control problems such as practical tracking, $λ$-tracking, etc. of robot systems, multiple mobile agents, discrete dynamical systems, time delay systems and other uncertain nonlinear systems. Finally, the practical tracking of a first-order uncertain model of automobile is considered as a simple example to demonstrate the efficiency of the fractional-order generalized principle of self-support (FOGPSS) control strategy.
QUANT-PHSep 4, 2014
Field and long-term demonstration of a wide area quantum key distribution networkShuang Wang, Wei Chen, Zhen-Qiang Yin et al.
A wide area quantum key distribution (QKD) network deployed on communication infrastructures provided by China Mobile Ltd. is demonstrated. Three cities and two metropolitan area QKD networks were linked up to form the Hefei-Chaohu-Wuhu wide area QKD network with over 150 kilometers coverage area, in which Hefei metropolitan area QKD network was a typical full-mesh core network to offer all-to-all interconnections, and Wuhu metropolitan area QKD network was a representative quantum access network with point-to-multipoint configuration. The whole wide area QKD network ran for more than 5000 hours, from 21 December 2011 to 19 July 2012, and part of the network stopped until last December. To adapt to the complex and volatile field environment, the Faraday-Michelson QKD system with several stability measures was adopted when we designed QKD devices. Through standardized design of QKD devices, resolution of symmetry problem of QKD devices, and seamless switching in dynamic QKD network, we realized the effective integration between point-to-point QKD techniques and networking schemes.