See to Touch: Learning Tactile Dexterity through Visual Incentives
This work addresses precise, contact-rich manipulation for robots. It offers a significant improvement over existing tactile-only methods, though combining vision and tactile sensing is itself an incremental step.
The paper tackles dexterous manipulation with multi-fingered robots by optimizing tactile-based policies against vision-based rewards. It reports a 73% success rate across six challenging tasks, including peg pick-and-place and unstacking bowls. The reported performance increase is 108% over policies using tactile and vision-based rewards and 135% over policies without tactile observational input.
Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive objective to learn visual representations. Next, we construct a reward function from these visual representations through optimal-transport-based matching against one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. This is a performance increase of 108% over policies using tactile and vision-based rewards and of 135% over policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.
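To make the reward-construction step more concrete, the following is a minimal sketch of optimal-transport-based matching between a robot rollout and a single demonstration. It is not the authors' implementation. The abstract does not specify the cost function or the solver, so the cosine cost, the entropic Sinkhorn solver, and the uniform weights over frames are all assumptions, as are the function names `cosine_cost`, `sinkhorn_plan`, and `ot_rewards`. Random vectors stand in for the learned visual representations.

```python
import math
import random


def cosine_cost(robot_feats, demo_feats):
    """Pairwise cosine distance (1 - cosine similarity) between two feature sequences."""
    def unit(vec):
        norm = math.sqrt(sum(x * x for x in vec)) + 1e-8
        return [x / norm for x in vec]

    robot = [unit(v) for v in robot_feats]
    demo = [unit(v) for v in demo_feats]
    return [[1.0 - sum(a * b for a, b in zip(r, d)) for d in demo] for r in robot]


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropy-regularized transport plan with uniform marginals (assumed solver)."""
    n, m = len(cost), len(cost[0])
    mu, nu = 1.0 / n, 1.0 / m  # uniform mass on every robot frame and demo frame
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_rewards(robot_feats, demo_feats, eps=0.05):
    """One reward per robot frame: the negative transport cost assigned to that frame."""
    cost = cosine_cost(robot_feats, demo_feats)
    plan = sinkhorn_plan(cost, eps)
    return [-sum(p * c for p, c in zip(p_row, c_row))
            for p_row, c_row in zip(plan, cost)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder features: 12 demonstration frames and 10 rollout frames, 16-D each.
    demo = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(12)]
    rollout = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(10)]
    print("total reward:", sum(ot_rewards(rollout, demo)))
```

Under these assumptions, a rollout whose visual features track the demonstration closely receives rewards near zero, while a dissimilar rollout receives more negative rewards. A reinforcement learning algorithm could then maximize the per-frame values as its reward signal.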