Yi-Ru Yang

h-index2
2papers

2 Papers

17.8ROApr 15
GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks

Yen-Ling Tai, Yi-Ru Yang, Kuan-Ting Yu et al.

Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from the initial to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given current observation and action rollout. The predictor is trained on a simulated dataset with food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness. More details are available on our project website: https://hcis-lab.github.io/GRITS/.

ROMar 17, 2025
Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided and Self-Consistent MLLMs for Task Planning in Instruction-Following Manipulation

Yu-Hong Shen, Chuan-Yu Wu, Yi-Ru Yang et al.

We investigate the use of Multimodal Large Language Models (MLLMs) with in-context learning for closed-loop task planning in instruction-following manipulation. We identify four essential requirements for successful task planning: quantity estimation, reachability analysis, relative positioning, and collision avoidance. However, existing benchmarks fail to support holistic evaluation across all these aspects. To address this gap, we introduce \textbf{QuARC} (Quantity, Analysis, Relative positioning, Collision), a new benchmark based on a food preparation scenario that integrates all four challenges. Using QuARC, we reveal two major limitations of current MLLMs: cross-modal distraction and geometric infeasibility. To tackle these, we adapt Chain-of-Thought with Self-Consistency to mitigate reasoning loss from cross-modal distractions and incorporate an affordance predictor to guide planning based on geometric feasibility. Our comprehensive evaluation analyzes performance across multiple baselines and explains sources of improvement. Our method achieves a 76.7\% success rate on the benchmark, significantly outperforming the ViLa baseline (36.7\%), without requiring additional finetuning. Code and dataset are available at https://hcis-lab.github.io/Affordance-Guided-Self-Consistent-MLLM.