Dongyan Xu

RO
3 papers
3 citations
Novelty 45%
AI Score 43

3 Papers

67.4 RO May 19 Code
The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents

Doguhan Yeke, Elif Su Temirel, Ananth Shreekumar et al.

Vision-language models (VLMs) are used as high-level planners for embodied agents, translating natural language instructions and visual observations into action plans. While prior work has studied abstention in LLMs, existing benchmarks are largely text-only and do not capture the perceptual grounding and physical constraints inherent to embodied robotics environments. In such settings, abstention requires recognizing when instructions are ambiguous, physically infeasible, based on false premises, or otherwise unresolvable given the available sensory modalities and context. To address this gap, we introduce a taxonomy to categorize abstention in the context of embodied robotics and present RoboAbstention, a scalable and auditable framework for generating abstention instructions grounded in images gathered from five robotics datasets. RoboAbstention instantiates the taxonomy through a three-phase pipeline: (1) structured visual grounding, (2) deterministic constraint derivation, and (3) controlled instruction generation via category-specific templates. This enables the construction of a diverse dataset with verifiable abstention conditions. We evaluate several frontier VLMs and find that all models exhibit significant weaknesses in abstention, including those with advanced reasoning capabilities. The best-performing model, Gemini 2.5 Flash, abstains on only 39.0% of our 6,069 benchmark instructions, while the embodied planner Gemini Robotics ER 1.6 Preview abstains on just 16.5%. We further explore methods for improving abstention in VLM planners, such as defensive prompting and in-context learning, and find that these interventions substantially improve performance, reaching 93.6% abstention rate for Gemini Robotics ER 1.6 Preview and 88.6% for GPT 5.4 Mini, yet no approach fully solves the problem. We open-source RoboAbstention at https://purseclab.github.io/RoboAbstention/.

75.3 LG May 3
Stable GFlowNets with Probabilistic Guarantees

Zengxiang Lei, Ananth Shreekumar, Jonathan Rosenthal et al.

Generative Flow Networks (GFlowNets) learn to sample states with probability proportional to an unnormalized reward. Despite their theoretical promise, practical training is often unstable, exhibiting severe loss spikes and mode collapse. To tackle this, we first assess the sensitivity of GFlowNet objectives, demonstrating that a small Total Variation (TV) distance between the learned and target distributions does not preclude unbounded training loss. Motivated by this mismatch, we establish converse guarantees by deriving loss-to-TV bounds that certify global fidelity from bounded trajectory balance losses. Lastly, we propose Stable GFlowNets, an algorithm that leverages our theoretical results to stabilize training, and empirically demonstrate improved training behavior and superior distributional fidelity.

RO Nov 30, 2018
Flight Recovery of MAVs with Compromised IMU

Zhan Tu, Fan Fei, Matthew Eagon et al.

Micro Aerial Vehicles (MAVs) rely on onboard attitude and position sensors for autonomous flight. Due to their size, weight, and power (SWaP) constraints, most modern MAVs use miniaturized inertial measurement units (IMUs) to provide attitude feedback, which is critical for flight stabilization and control. However, recent adversarial attack studies have demonstrated that many commonly used IMUs are vulnerable to attacks exploiting their physical characteristics. Conventional redundancy-based approaches are not effective against such attacks because redundant IMUs have the same or similar physical vulnerabilities. In this paper, we present a novel fault-tolerant solution for IMU-compromised scenarios, using separate position and heading information to restore the failed attitude states. Rather than adding more IMU alternatives for recovery, the proposed method is intended to minimize any modifications to the existing system and control program. Thus, it is particularly useful for vehicles that have tight SWaP constraints while simultaneously demanding high performance and safety. To execute the recovery logic properly, a robust estimator was designed for fine-grained detection and isolation of the faulty sensors. The effectiveness of the proposed approach was validated on a quadcopter MAV through both simulation and experimental flight tests.