RO · CL · Mar 10, 2025

PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

arXiv:2503.07111v2 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses robot manipulation by enabling depth-free, low-latency control with cross-morphology transfer. The advance is incremental because it builds on existing vision-to-joint methods.

The paper tackles robot hand control by mapping 2D images directly to joint angles, with no explicit pose estimation step. Trained on synthetic data, it achieves competitive joint angle prediction accuracy and generalizes zero-shot to real-world scenarios.

This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By projecting visual inputs and employing a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive performance in joint angle prediction accuracy without relying on any human-labelled dataset.
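
As a rough illustration of the "randomized joint configurations" idea in the abstract, the sketch below samples joint-angle vectors uniformly within per-joint limits and scores predictions with mean squared error. It is not the authors' pipeline: the joint limits, the function names `sample_joint_configs` and `mean_squared_error`, and the choice of a uniform distribution are all assumptions made for illustration, and the rendering step that would turn each configuration into a training image is omitted.

```python
import random

# Hypothetical joint limits in radians for a 3-joint finger (not from the paper).
JOINT_LIMITS = [(-0.3, 0.3), (0.0, 1.6), (0.0, 1.7)]


def sample_joint_configs(limits, n, seed=0):
    """Draw n joint-angle vectors uniformly within the given per-joint limits."""
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    return [[rng.uniform(lo, hi) for lo, hi in limits] for _ in range(n)]


def mean_squared_error(pred, target):
    """Mean squared error over every joint of every sample."""
    errors = [
        (p - t) ** 2
        for pred_vec, target_vec in zip(pred, target)
        for p, t in zip(pred_vec, target_vec)
    ]
    return sum(errors) / len(errors)


# Each sampled vector would be the label for one rendered hand image.
configs = sample_joint_configs(JOINT_LIMITS, n=5)
```

In a setup like this, the labels come for free from the sampler, which is one way a method can avoid human-labelled data; how the paper actually samples and renders configurations is not stated in this summary.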
