AIRO · Mar 19, 2025

GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback

arXiv:2503.15035v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a fundamental bottleneck in real-world robotic manipulation, though the advance is incremental because it builds on existing policy models.

The paper tackles inconsistent and unstable robotic grasping by introducing GraspCorrect, a plug-and-play module that uses vision-language model-guided feedback to correct grasps. The authors report significantly improved grasp stability and higher task success rates across existing policy models on the RLBench and CALVIN benchmarks.

Despite significant advancements in robotic manipulation, achieving consistent and stable grasping remains a fundamental challenge, often limiting the successful execution of complex tasks. Our analysis reveals that even state-of-the-art policy models frequently exhibit unstable grasping behaviors, leading to failure cases that create bottlenecks in real-world robotic applications. To address these challenges, we introduce GraspCorrect, a plug-and-play module designed to enhance grasp performance through vision-language model-guided feedback. GraspCorrect employs an iterative visual question-answering framework with two key components: grasp-guided prompting, which incorporates task-specific constraints, and object-aware sampling, which ensures the selection of physically feasible grasp candidates. By iteratively generating intermediate visual goals and translating them into joint-level actions, GraspCorrect significantly improves grasp stability and consistently enhances task success rates across existing policy models in the RLBench and CALVIN datasets.
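
The iterative loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`sample_on_object`, `refine_grasp`, `centroid_chooser`) is hypothetical, the object is assumed to be given as a 2D boolean mask, and the vision-language model query is replaced by a simple stand-in that prefers the candidate nearest the object's centroid. Generating intermediate visual goals and translating them into joint-level actions is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int]            # (row, col) in the image
Mask = Sequence[Sequence[bool]]    # True where the target object is visible
Chooser = Callable[[List[Pixel]], Pixel]


def sample_on_object(mask: Mask, center: Pixel, radius: int, k: int,
                     rng: random.Random) -> List[Pixel]:
    """Object-aware sampling: draw up to k pixels near `center` that lie on the object."""
    rows, cols = len(mask), len(mask[0])
    on_object = [
        (r, c)
        for r in range(max(0, center[0] - radius), min(rows, center[0] + radius + 1))
        for c in range(max(0, center[1] - radius), min(cols, center[1] + radius + 1))
        if mask[r][c]
    ]
    return rng.sample(on_object, min(k, len(on_object)))


def refine_grasp(mask: Mask, start: Pixel, choose: Chooser,
                 rounds: int = 3, k: int = 5, radius: int = 4, seed: int = 0) -> Pixel:
    """Iteratively ask `choose` (a stand-in for the VLM query) to pick a contact point."""
    rng = random.Random(seed)
    best = start
    for _ in range(rounds):
        candidates = sample_on_object(mask, best, radius, k, rng)
        # Keep the current choice in the running if it is already on the object.
        if mask[best[0]][best[1]] and best not in candidates:
            candidates.append(best)
        if not candidates:
            break  # nothing feasible nearby; keep the last choice
        best = choose(candidates)
        radius = max(1, radius // 2)  # narrow the search on each round
    return best


def centroid_chooser(mask: Mask) -> Chooser:
    """Stand-in for the VLM: prefer the candidate closest to the object's centroid."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    cr = sum(p[0] for p in pixels) / len(pixels)
    cc = sum(p[1] for p in pixels) / len(pixels)

    def choose(candidates: List[Pixel]) -> Pixel:
        return min(candidates, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2)

    return choose
```

In a real system, `choose` would presumably wrap a visual question-answering call whose prompt carries the task-specific constraints; passing it in as a callable keeps the sampling loop independent of any particular model.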

