68.5CVMay 16
EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous DevicesLiuchuan Yu, Erdem Murat, Beichen Wang et al.
Egocentric video is increasingly used as a data source for robot learning, activity understanding, and embodied AI research, but collecting it at scale remains fragmented in practice: each candidate host device, such as an Android phone, iPhone, iPad, smart glasses, or extended reality (XR) headset, exposes a different SDK, a different policy on raw camera access, and different limitations on external USB cameras and on-device tracking. Synchronized ego-view and wrist-view capture is therefore typically obtained by either committing to a single proprietary platform or building one-off rigs that do not transfer across devices. To address this gap, we present EgoKit, a toolkit that exposes the same egocentric recording workflow across six heterogeneous host devices. Across all supported devices, EgoKit presents the same recording interaction and produces locally stored video with a uniform log format; on XR headsets, it additionally logs head pose and OpenXR-standard 26-joint hand tracking aligned to the video streams. The companion accessories, including two wrist cameras with mounts, a head strap, and a USB-C hub, add wrist-view capture to any supported host without custom hardware fabrication. EgoKit is available at \url{https://egokit.chuange.org/}.
ROApr 3, 2024
A Survey of Optimization-based Task and Motion Planning: From Classical To Learning ApproachesZhigen Zhao, Shuo Cheng, Yan Ding et al.
Task and Motion Planning (TAMP) integrates high-level task planning and low-level motion planning to equip robots with the autonomy to effectively reason over long-horizon, dynamic tasks. Optimization-based TAMP focuses on hybrid optimization approaches that define goal conditions via objective functions and are capable of handling open-ended goals, robotic dynamics, and physical interaction between the robot and the environment. Therefore, optimization-based TAMP is particularly suited to solve highly complex, contact-rich locomotion and manipulation problems. This survey provides a comprehensive review on optimization-based TAMP, covering (i) planning domain representations, including action description languages and temporal logic, (ii) individual solution strategies for components of TAMP, including AI planning and trajectory optimization (TO), and (iii) the dynamic interplay between logic-based task planning and model-based TO. A particular focus of this survey is to highlight the algorithm structures to efficiently solve TAMP, especially hierarchical and distributed approaches. Additionally, the survey emphasizes the synergy between the classical methods and contemporary learning-based innovations such as large language models. Furthermore, the future research directions for TAMP is discussed in this survey, highlighting both algorithmic and application-specific challenges.
ROJul 31, 2025
XRoboToolkit: A Cross-Platform Framework for Robot TeleoperationZhigen Zhao, Liuchuan Yu, Ke Jing et al.
The rapid advancement of Vision-Language-Action models has created an urgent need for large-scale, high-quality robot demonstration datasets. Although teleoperation is the predominant method for data collection, current approaches suffer from limited scalability, complex setup procedures, and suboptimal data quality. This paper presents XRoboToolkit, a cross-platform framework for extended reality based robot teleoperation built on the OpenXR standard. The system features low-latency stereoscopic visual feedback, optimization-based inverse kinematics, and support for diverse tracking modalities including head, controller, hand, and auxiliary motion trackers. XRoboToolkit's modular architecture enables seamless integration across robotic platforms and simulation environments, spanning precision manipulators, mobile robots, and dexterous hands. We demonstrate the framework's effectiveness through precision manipulation tasks and validate data quality by training VLA models that exhibit robust autonomous performance.
ROSep 16, 2021
Adversarially Regularized Policy Learning Guided by Trajectory OptimizationZhigen Zhao, Simiao Zuo, Tuo Zhao et al.
Recent advancement in combining trajectory optimization with function approximation (especially neural networks) shows promise in learning complex control policies for diverse tasks in robot systems. Despite their great flexibility, the large neural networks for parameterizing control policies impose significant challenges. The learned neural control policies are often overcomplex and non-smooth, which can easily cause unexpected or diverging robot motions. Therefore, they often yield poor generalization performance in practice. To address this issue, we propose adVErsarially Regularized pOlicy learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth control policies. Specifically, our proposed approach controls the smoothness (local Lipschitz continuity) of the neural control policies by stabilizing the output control with respect to the worst-case perturbation to the input state. Our experiments on robot manipulation show that our proposed approach not only improves the sample efficiency of neural policy learning but also enhances the robustness of the policy against various types of disturbances, including sensor noise, environmental uncertainty, and model mismatch.
MEMar 1, 2021
BEAUTY Powered BEASTKai Zhang, Wan Zhang, Zhigen Zhao et al.
We study distribution-free goodness-of-fit tests with the proposed Binary Expansion Approximation of UniformiTY (BEAUTY) approach. This method generalizes the renowned Euler's formula, and approximates the characteristic function of any copula through a linear combination of expectations of binary interactions from marginal binary expansions. This novel theory enables a unification of many important tests of independence via approximations from specific quadratic forms of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve a robust power, we examine test statistics with data-adaptive weights, referred to as the Binary Expansion Adaptive Symmetry Test (BEAST). For any given alternative, we demonstrate that the Neyman-Pearson test can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle provides a useful benchmark of feasible power. To approach this oracle power, we devise the BEAST through a regularized resampling approximation of the oracle test. The BEAST improves the empirical power of many existing tests against a wide spectrum of common alternatives and delivers a clear interpretation of dependency forms when significant.
ROOct 21, 2020
SyDeBO: Symbolic-Decision-Embedded Bilevel Optimization for Long-Horizon Manipulation in Dynamic EnvironmentsZhigen Zhao, Ziyi Zhou, Michael Park et al.
This study proposes a Task and Motion Planning (TAMP) method with symbolic decisions embedded in a bilevel optimization. This TAMP method exploits the discrete structure of sequential manipulation for long-horizon and versatile tasks in dynamically changing environments. At the symbolic planning level, we propose a scalable decision-making method for long-horizon manipulation tasks using the Planning Domain Definition Language (PDDL) with causal graph decomposition. At the motion planning level, we devise a trajectory optimization (TO) approach based on the Alternating Direction Method of Multipliers (ADMM), suitable for solving constrained, large-scale nonlinear optimization in a distributed manner. Distinct from conventional geometric motion planners, our approach generates highly dynamic manipulation motions by incorporating the full robot and object dynamics. Furthermore, in lieu of a hierarchical planning approach, we solve a holistically integrated bilevel optimization problem involving costs from both the low-level TO and the high-level search. Simulation and experimental results demonstrate dynamic manipulation for long-horizon object sorting tasks in clutter and on a moving conveyor belt.