ROAI Oct 17, 2024

Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand

arXiv:2410.14022v1
14 citations
h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the problem of precise and robust control for robotic hands in manipulation tasks, representing an incremental advance by integrating existing models with a switching mechanism.

The paper tackles autonomous dexterous manipulation by proposing a hybrid control method that combines a Vision-Language-Action (VLA) model for high-level planning with a diffusion model for low-level interactions, achieving a success rate of over 80% in pick-and-place tasks, compared to under 40% with the VLA model alone.

To advance autonomous dexterous manipulation, we propose a hybrid control method that combines the relative advantages of a fine-tuned Vision-Language-Action (VLA) model and diffusion models. The VLA model provides language-commanded high-level planning, which is highly generalizable, while the diffusion model handles low-level interactions, offering the precision and robustness required for specific objects and environments. By incorporating a switching signal into the training data, we enable event-based transitions between these two models for a pick-and-place task in which the target object and placement location are commanded through language. This approach is deployed on our anthropomorphic ADAPT Hand 2, a 13-DoF robotic hand that incorporates compliance through series elastic actuation, making it resilient to any interaction, and shows the first use of a multi-fingered hand controlled with a VLA model. We demonstrate that this model-switching approach results in an over 80% success rate, compared to under 40% when using only a VLA model, enabled by accurate near-object arm motion from the VLA model and a multi-modal grasping motion with error-recovery abilities from the diffusion model.
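The event-based switching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `SwitchingController` class, the 0.5 threshold, the two-element observation, and both stub policies are hypothetical stand-ins, and the assumption that control toggles in both directions is ours. The only idea taken from the abstract is that each model emits a switching signal alongside its action, and that signal triggers the hand-over.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A policy maps an observation to (action, switch_signal).
# Both policies below are hypothetical stubs, not the real models.
Policy = Callable[[Sequence[float]], Tuple[Sequence[float], float]]


@dataclass
class SwitchingController:
    vla: Policy
    diffusion: Policy
    threshold: float = 0.5  # assumed cut-off for the switch signal
    active: str = "vla"     # the high-level policy starts in control

    def step(self, obs: Sequence[float]) -> Sequence[float]:
        """Run the active policy once and hand over control if it asks to."""
        policy = self.vla if self.active == "vla" else self.diffusion
        action, switch = policy(obs)
        if switch > self.threshold:
            # Event-based transition: the currently active model requests it.
            self.active = "diffusion" if self.active == "vla" else "vla"
        return action


def stub_vla(obs):
    """Toy stand-in: move the arm toward the object, switch when close."""
    distance, _grasped = obs
    action = [-0.1 * distance]
    return action, 1.0 if distance < 0.05 else 0.0


def stub_diffusion(obs):
    """Toy stand-in: close the fingers, switch back once the grasp holds."""
    _distance, grasped = obs
    action = [1.0]
    return action, 1.0 if grasped else 0.0


controller = SwitchingController(stub_vla, stub_diffusion)
for obs in ([0.5, 0.0], [0.01, 0.0], [0.01, 1.0]):
    controller.step(obs)
```

In this sketch the hand-over decision lives in the policy outputs rather than in a hand-written rule in the controller, which mirrors the abstract's statement that the switching signal is incorporated into the training data.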
