68.8LGMar 26Code
Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement LearningJiajun Hu, Nuria Armengol Urpi, Jin Cheng et al.
Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset, and recover optimal policies for any reward function directly at test time. Naturally, the quality of the pretraining dataset determines the performance of the recovered policies across tasks. However, pre-collecting a relevant, diverse dataset without prior knowledge of the downstream tasks of interest remains a challenge. In this work, we study $\textit{online}$ zero-shot RL for quadrupedal control on real robotic systems, building upon the Forward-Backward (FB) algorithm. We observe that undirected exploration yields low-diversity data, leading to poor downstream performance and rendering policies impractical for direct hardware deployment. Therefore, we introduce FB-MEBE, an online zero-shot RL algorithm that combines an unsupervised behavior exploration strategy with a regularization critic. FB-MEBE promotes exploration by maximizing the entropy of the achieved behavior distribution. Additionally, a regularization critic shapes the recovered policies toward more natural and physically plausible behaviors. We empirically demonstrate that FB-MEBE achieves and improved performance compared to other exploration strategies in a range of simulated downstream tasks, and that it renders natural policies that can be seamlessly deployed to hardware without further finetuning. Videos and code available on our website.
ROJul 16, 2024
RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse RewardsFatemeh Zargarbashi, Jin Cheng, Dongho Kang et al.
This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.
ROOct 3, 2023
Learning Diverse Skills for Local Navigation under Multi-constraint OptimalityJin Cheng, Marin Vlastelica, Pavel Kolev et al.
Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.
NAOct 20, 2018
Tikhonov regularization with l^0-term complementing a convex penalty: l^1 convergence under sparsity constraintsWei Wang, Shuai Lu, Bernd Hofmann et al.
Measuring the error by an l^1-norm, we analyze under sparsity assumptions an l^0-regularization approach, where the penalty in the Tikhonov functional is complemented by a general stabilizing convex functional. In this context, ill-posed operator equations Ax = y with an injective and bounded linear operator A mapping between l^2 and a Banach space Y are regularized. For sparse solutions, error estimates as well as linear and sublinear convergence rates are derived based on a variational inequality approach, where the regularization parameter can be chosen either a priori in an appropriate way or a posteriori by the sequential discrepancy principle. To further illustrate the balance between the l^0-term and the complementing convex penalty, the important special case of the l^2-norm square penalty is investigated showing explicit dependence between both terms. Finally, some numerical experiments verify and illustrate the sparsity promoting properties of corresponding regularized solutions.
LGJul 21, 2023
Offline Diversity Maximization Under Imitation ConstraintsMarin Vlastelica, Jin Cheng, Georg Martius et al.
There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot for which the policies trained in simulation transfer well to the real robotic system.
93.7NEApr 14
Adaptive Spiking Neurons for Vision and Language ModelingChenlin Zhou, Sihang Guo, Jiaqi Wang et al.
Regarded as the third generation of neural networks, Spiking Neural Networks (SNNs) have garnered significant traction due to their biological plausibility and energy efficiency. Recent advancements in large models necessitate spiking neurons capable of high performance, adaptability, and training efficiency. In this work, we first propose a novel functional perspective that provides general guidance for designing the new generation of spiking neurons. Following the insightful guidelines, we propose the Adaptive Spiking Neuron (ASN), which incorporates trainable parameters to learn membrane potential dynamics and enable adaptive firing. ASN adopts an integer training and spike inference paradigm, facilitating efficient SNN training. To further enhance robustness, we propose a specialized variant of ASN, the Normalized Adaptive Spiking Neuron (NASN), which integrates normalization to stabilize training. We evaluate our neuron model on 19 datasets spanning five distinct tasks in both vision and language modalities, demonstrating the effectiveness and versatility of the ASN family. Our ASN family is expected to become the new generation of general-purpose spiking neurons.
DBNov 10, 2025
Trading Vector Data in Vector DatabasesJin Cheng, Xiangxiang Dai, Ningning Ding et al.
Vector data trading is essential for cross-domain learning with vector databases, yet it remains largely unexplored. We study this problem under online learning, where sellers face uncertain retrieval costs and buyers provide stochastic feedback to posted prices. Three main challenges arise: (1) heterogeneous and partial feedback in configuration learning, (2) variable and complex feedback in pricing learning, and (3) inherent coupling between configuration and pricing decisions. We propose a hierarchical bandit framework that jointly optimizes retrieval configurations and pricing. Stage I employs contextual clustering with confidence-based exploration to learn effective configurations with logarithmic regret. Stage II adopts interval-based price selection with local Taylor approximation to estimate buyer responses and achieve sublinear regret. We establish theoretical guarantees with polynomial time complexity and validate the framework on four real-world datasets, demonstrating consistent improvements in cumulative reward and regret reduction compared with existing methods.
54.1NAMar 15
A Robust Learning-Based Method for the Helmholtz Equation in Dissipative Media and Complex DomainsLifu Song, Tingyue Li, Jin Cheng
To mitigate pollution effects in high-frequency Helmholtz problems, Learning-based Numerical Methods (LbNM) reconstruct solution operators using complete systems of exact solutions. However, the previously used fundamental-solution (FS) basis suffers from instability in dissipative media and requires sensitive geometric tuning. In this paper, we propose a robust alternative using a Bessel basis (BB). From a learning theory perspective, the BB forms a complete hypothesis space of standing waves, ensuring immunity to dissipation-induced signal loss. We establish a convergence result that depends on intrinsic regularity. Numerical experiments demonstrate that the proposed method achieves machine-precision accuracy in dissipative regimes where FS fails, significantly outperforms the Finite Element Method (FEM) in efficiency, and demonstrates the framework's geometric extensibility via a multi-center strategy.
ROFeb 18, 2025
SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal LearningPeizhuo Li, Hongyi Li, Ge Sun et al.
Despite recent advances in learning-based controllers for legged robots, deployments in human-centric environments remain limited by safety concerns. Most of these approaches use position-based control, where policies output target joint angles that must be processed by a low-level controller (e.g., PD or impedance controllers) to compute joint torques. Although impressive results have been achieved in controlled real-world scenarios, these methods often struggle with compliance and adaptability when encountering environments or disturbances unseen during training, potentially resulting in extreme or unsafe behaviors. Inspired by how animals achieve smooth and adaptive movements by controlling muscle extension and contraction, torque-based policies offer a promising alternative by enabling precise and direct control of the actuators in torque space. In principle, this approach facilitates more effective interactions with the environment, resulting in safer and more adaptable behaviors. However, challenges such as a highly nonlinear state space and inefficient exploration during training have hindered their broader adoption. To address these limitations, we propose SATA, a bio-inspired framework that mimics key biomechanical principles and adaptive learning mechanisms observed in animal locomotion. Our approach effectively addresses the inherent challenges of learning torque-based policies by significantly improving early-stage exploration, leading to high-performance final policies. Remarkably, our method achieves zero-shot sim-to-real transfer. Our experimental results indicate that SATA demonstrates remarkable compliance and safety, even in challenging environments such as soft/slippery terrain or narrow passages, and under significant external disturbances, highlighting its potential for practical deployments in human-centric and safety-critical scenarios.
AIJan 23, 2024
Quantitative Analysis of Molecular Transport in the Extracellular Space Using Physics-Informed Neural NetworkJiayi Xie, Hongfeng Li, Jin Cheng et al.
The brain extracellular space (ECS), an irregular, extremely tortuous nanoscale space located between cells or between cells and blood vessels, is crucial for nerve cell survival. It plays a pivotal role in high-level brain functions such as memory, emotion, and sensation. However, the specific form of molecular transport within the ECS remain elusive. To address this challenge, this paper proposes a novel approach to quantitatively analyze the molecular transport within the ECS by solving an inverse problem derived from the advection-diffusion equation (ADE) using a physics-informed neural network (PINN). PINN provides a streamlined solution to the ADE without the need for intricate mathematical formulations or grid settings. Additionally, the optimization of PINN facilitates the automatic computation of the diffusion coefficient governing long-term molecule transport and the velocity of molecules driven by advection. Consequently, the proposed method allows for the quantitative analysis and identification of the specific pattern of molecular transport within the ECS through the calculation of the Peclet number. Experimental validation on two datasets of magnetic resonance images (MRIs) captured at different time points showcases the effectiveness of the proposed method. Notably, our simulations reveal identical molecular transport patterns between datasets representing rats with tracer injected into the same brain region. These findings highlight the potential of PINN as a promising tool for comprehensively exploring molecular transport within the ECS.
ROFeb 2, 2025
CAIMAN: Causal Action Influence Detection for Sample-efficient Loco-manipulationYuanchen Yuan, Jin Cheng, Núria Armengol Urpí et al.
Enabling legged robots to perform non-prehensile loco-manipulation is crucial for enhancing their versatility. Learning behaviors such as whole-body object pushing often requires sophisticated planning strategies or extensive task-specific reward shaping, especially in unstructured environments. In this work, we present CAIMAN, a practical reinforcement learning framework that encourages the agent to gain control over other entities in the environment. CAIMAN leverages causal action influence as an intrinsic motivation objective, allowing legged robots to efficiently acquire object pushing skills even under sparse task rewards. We employ a hierarchical control strategy, combining a low-level locomotion module with a high-level policy that generates task-relevant velocity commands and is trained to maximize the intrinsic reward. To estimate causal action influence, we learn the dynamics of the environment by integrating a kinematic prior with data collected during training.We empirically demonstrate CAIMAN's superior sample efficiency and adaptability to diverse scenarios in simulation, as well as its successful transfer to real-world systems without further fine-tuning.
RONov 22, 2025
Switch-JustDance: Benchmarking Whole Body Motion Tracking Policies Using a Commercial Console GameJeonghwan Kim, Wontaek Kim, Yidan Lu et al.
Recent advances in whole-body robot control have enabled humanoid and legged robots to perform increasingly agile and coordinated motions. However, standardized benchmarks for evaluating these capabilities in real-world settings, and in direct comparison to humans, remain scarce. Existing evaluations often rely on pre-collected human motion datasets or simulation-based experiments, which limit reproducibility, overlook hardware factors, and hinder fair human-robot comparisons. We present Switch-JustDance, a low-cost and reproducible benchmarking pipeline that leverages motion-sensing console games, Just Dance on the Nintendo Switch, to evaluate robot whole-body control. Using Just Dance on the Nintendo Switch as a representative platform, Switch-JustDance converts in-game choreography into robot-executable motions through streaming, motion reconstruction, and motion retargeting modules and enables users to evaluate controller performance through the game's built-in scoring system. We first validate the evaluation properties of Just Dance, analyzing its reliability, validity, sensitivity, and potential sources of bias. Our results show that the platform provides consistent and interpretable performance measures, making it a suitable tool for benchmarking embodied AI. Building on this foundation, we benchmark three state-of-the-art humanoid whole-body controllers on hardware and provide insights into their relative strengths and limitations.
ROOct 27, 2025
TARC: Time-Adaptive Robotic ControlArnav Sukhija, Lenart Treven, Jin Cheng et al.
Fixed-frequency control in robotics imposes a trade-off between the efficiency of low-frequency control and the robustness of high-frequency control, a limitation not seen in adaptable biological systems. We address this with a reinforcement learning approach in which policies jointly select control actions and their application durations, enabling robots to autonomously modulate their control frequency in response to situational demands. We validate our method with zero-shot sim-to-real experiments on two distinct hardware platforms: a high-speed RC car and a quadrupedal robot. Our method matches or outperforms fixed-frequency baselines in terms of rewards while significantly reducing the control frequency and exhibiting adaptive frequency control under real-world conditions.
ROMay 29, 2023
RL + Model-based Control: Using On-demand Optimal Control to Learn Versatile Legged LocomotionDongho Kang, Jin Cheng, Miguel Zamora et al.
This paper presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances the RL training process by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as targets for the RL policy to imitate, leading to the development of robust control policies that can be learned with reliability. Furthermore, by utilizing realistic simulation data that captures whole-body dynamics, RL effectively overcomes the inherent limitations in reference motions imposed by modeling simplifications. We validate the robustness and controllability of the RL training process within our framework through a series of experiments. In these experiments, our method showcases its capability to generalize reference motions and effectively handle more complex locomotion tasks that may pose challenges for the simplified model, thanks to RL's flexibility. Additionally, our framework effortlessly supports the training of control policies for robots with diverse dimensions, eliminating the necessity for robot-specific adjustments in the reward function and hyperparameters.
NAAug 19, 2015
New Epitaxial Thin Film Models and numerical approximationWenbin Chen, Zhenhua Chen, Jin Cheng et al.
This paper concerns new continuum phenomenological model for epitaxial thin film growth with three different forms of the Ehrlich-Schwoebel current. Two of these forms were first proposed by Politi and Villain [1996] and then studied by Evans, Thiel and Bartelt [2006]. The other one is completely new. Following the techniques used in Li and Liu [2003], we present rigorous analysis of the well-posedness, regularity and time stability for the new model. We also studied both the global and the local behavior of the surface roughness in the growth process. The new model differs from other known models in that it features a linear convex part and a nonlinear concave part, and thus by using a convex-concave time splitting scheme, one can naturally build unconditionally stable semi-implicit numerical discretizations with linear implicit parts, which is much easier to implement than conventional models requiring nonlinear implicit parts. Despite this fundamental difference in the model, numerical experiments show that the nonlinear morphological instability of the new model agrees well with results of other models published in Li and Liu [2003], which indicates that the new model correctly captures the essential morphological states in the thin film growth process.