3 Papers

ME May 26
Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

Rongyi Sun, Wenguang Sun, Zinan Zhao

This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test evidence with structural patterns. We also develop pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection to structured OOD testing across a toolbox of candidate models. Together, SCQ and P-TAMS form a unified framework under pairwise exchangeability, providing finite-sample error-rate control, improved power, and enhanced interpretability. Experiments on simulated and real data demonstrate that the proposed approach controls the false discovery rate and performs well across diverse settings.
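As a point of reference for the exchangeability-based methods the abstract contrasts itself with, the following is a minimal sketch of a conventional conformal OOD test: conformal p-values computed from a calibration set of inlier scores, followed by the Benjamini-Hochberg procedure. It is not SCQ or P-TAMS, whose definitions are not given in the abstract. The function names, the "larger score means more atypical" convention, and the toy scores are all assumptions made for illustration.

```python
def conformal_p_values(calib_scores, test_scores):
    """Conformal p-value for each test score.

    Assumes larger scores indicate more atypical (OOD-like) points and that
    calib_scores come from known in-distribution data.
    p_j = (1 + #{i : calib_i >= test_j}) / (n + 1)
    """
    n = len(calib_scores)
    return [
        (1 + sum(1 for c in calib_scores if c >= t)) / (n + 1)
        for t in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Toy data (hypothetical): 99 inlier calibration scores and 4 test scores.
calib = [float(i) for i in range(1, 100)]
test = [0.5, 150.0, 200.0, 10.0]

p_vals = conformal_p_values(calib, test)
flagged = benjamini_hochberg(p_vals, alpha=0.1)
print(p_vals)
print(flagged)
```

In this baseline every test point is treated symmetrically, so spatial, temporal, or group structure plays no role in the decision; that is the gap the abstract says SCQ is designed to fill.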

RO May 12
See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model

Yixu Feng, Zinan Zhao, Yanxiang Ma et al.

Vision-Language-Action (VLA) models have shown remarkable promise in robotic manipulation, yet their high computational cost hinders real-time deployment. Existing token pruning methods suffer from a fundamental trade-off: aggressive pruning inevitably discards critical geometric details such as contact points, causing severe performance degradation. This forces a compromise that limits the achievable compression rate and thus the potential speedup. We argue that breaking this trade-off requires rethinking compression as geometry-aware, continuous token resampling in the vision encoder. To this end, we propose the Differentiable Grid Sampler (GridS), a plug-and-play module that performs task-aware, continuous resampling of visual tokens in VLA models. By adaptively predicting a minimal set of salient coordinates and extracting features via differentiable interpolation, GridS preserves essential spatial information while achieving drastic compression (fewer than 10% of the original visual tokens). Experiments on both the LIBERO benchmark and a real robotic platform show that GridS achieves a 76% reduction in FLOPs with no degradation in success rate, validating the lowest feasible visual token count reported to date. The code is available at https://github.com/Fediory/Grid-Sampler.
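The idea of extracting features at continuous coordinates via differentiable interpolation can be illustrated with plain bilinear sampling. The sketch below is not the GridS module: how GridS predicts its coordinates is not described in the abstract, so the coordinates here are supplied by hand, and the function names and the tiny feature map are hypothetical. Because the output is piecewise linear in `(x, y)`, gradients can flow back to the sampling coordinates when the same operation is expressed in an autodiff framework (PyTorch, for example, provides `torch.nn.functional.grid_sample`).

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample an H x W x C feature map (nested lists) at continuous (x, y).

    Coordinates are in pixel units, with x indexing columns and y indexing
    rows; out-of-range coordinates are clamped to the border.
    """
    h, w = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    out = []
    for c in range(len(feature_map[0][0])):
        top = (1 - wx) * feature_map[y0][x0][c] + wx * feature_map[y0][x1][c]
        bottom = (1 - wx) * feature_map[y1][x0][c] + wx * feature_map[y1][x1][c]
        out.append((1 - wy) * top + wy * bottom)
    return out


def resample_tokens(feature_map, coords):
    """Turn a dense token grid into len(coords) resampled tokens."""
    return [bilinear_sample(feature_map, x, y) for x, y in coords]


# Hypothetical 2 x 2 grid of single-channel tokens, reduced to 2 tokens.
grid = [[[0.0], [1.0]],
        [[2.0], [3.0]]]
tokens = resample_tokens(grid, [(0.5, 0.5), (1.0, 0.0)])
print(tokens)  # [[1.5], [1.0]]
```

Unlike hard pruning, which keeps or drops whole tokens, a resampled token blends its neighbors, so a sampling point placed between grid cells still carries information from each of them.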

RO Mar 31
Kilohertz-Safe: A Scalable Framework for Constrained Dexterous Retargeting

Yinxiao Tian, Ziyi Yang, Zinan Zhao et al.

Dexterous hand teleoperation requires motion retargeting methods that simultaneously achieve high-frequency real-time performance and enforce heterogeneous kinematic and safety constraints. Existing nonlinear optimization-based approaches often incur prohibitive computational cost, limiting their applicability to kilohertz-level control, while learning-based methods typically lack formal safety guarantees. This paper proposes a scalable motion retargeting framework that reformulates the nonlinear retargeting problem as a convex quadratic program in joint differential space. Heterogeneous constraints, including kinematic limits and collision avoidance, are incorporated through systematic linearization, resulting in improved computational efficiency and numerical stability. Control barrier functions are further integrated to provide formal safety guarantees during the retargeting process. The proposed framework is validated through simulations and hardware experiments on the Wuji Hand platform, outperforming state-of-the-art methods such as Dex-Retargeting and GeoRT. The framework achieves high-frequency operation with an average latency of 9.05 ms, while over 95% of retargeted frames satisfy the safety criteria, effectively mitigating self-collision and penetration during complex manipulation tasks.
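To make the combination of a quadratic program in joint differential space with a control barrier function (CBF) concrete, here is a minimal sketch for the special case of a single linearized safety constraint, where the QP has a closed-form solution. It is not the paper's formulation, which handles many heterogeneous constraints and would need a QP solver. The function name `cbf_filter`, the barrier value `h`, its gradient `grad_h`, and the gain `alpha` are assumptions introduced for illustration. The problem solved is: minimize `||dq - dq_des||^2` subject to `grad_h . dq + alpha * h >= 0`.

```python
def cbf_filter(dq_des, grad_h, h, alpha):
    """Minimally modify a desired joint velocity so a CBF condition holds.

    dq_des : desired joint velocity from the tracking objective
    grad_h : gradient of the barrier h(q) with respect to the joints
    h      : current barrier value (h >= 0 means safe), e.g. a clearance
    alpha  : positive gain controlling how fast h is allowed to shrink
    """
    a_dot_u = sum(a * u for a, u in zip(grad_h, dq_des))
    bound = -alpha * h  # constraint reads: grad_h . dq >= bound
    if a_dot_u >= bound:
        return list(dq_des)  # desired command is already safe
    norm_sq = sum(a * a for a in grad_h)
    if norm_sq == 0.0:
        raise ValueError("constraint violated and gradient is zero")
    # Project onto the half-space boundary along grad_h.
    scale = (bound - a_dot_u) / norm_sq
    return [u + scale * a for u, a in zip(dq_des, grad_h)]


# Hypothetical 2-joint case: joint 0 moving in the negative direction
# reduces clearance, and the desired command moves it that way too fast.
safe_dq = cbf_filter(dq_des=[-3.0, 1.0], grad_h=[1.0, 0.0], h=0.5, alpha=2.0)
print(safe_dq)  # [-1.0, 1.0]
```

Only the velocity component along `grad_h` is altered, so motion that does not affect the barrier passes through unchanged. With several constraints active at once, the projection no longer has this simple form, which is where a general convex QP solve comes in.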