ROAI | Jul 14, 2025

Demonstrating the Octopi-1.5 Visual-Tactile-Language Model

arXiv:2507.09985v1 | 8 citations | h-index: 5 | Has Code | Robotics
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing robot perception and interaction through touch, particularly for dexterous manipulation and scenarios with visual occlusion, but it appears incremental as it builds upon recent touch foundation models.

The paper treats touch as a vital modality for robot tasks such as dexterous manipulation and material identification. It introduces Octopi-1.5, a visual-tactile-language model that processes tactile signals from multiple object parts and uses retrieval-augmented generation (RAG) to improve task performance. The model is demonstrated through live interactions using a handheld tactile-enabled interface.

Touch is recognized as a vital sense for humans and an equally important modality for robots, especially for dexterous manipulation, material identification, and scenarios involving visual occlusion. Building upon very recent work in touch foundation models, this demonstration will feature Octopi-1.5, our latest visual-tactile-language model. Compared to its predecessor, Octopi-1.5 introduces the ability to process tactile signals from multiple object parts and employs a simple retrieval-augmented generation (RAG) module to improve performance on tasks and potentially learn new objects on the fly. The system can be experienced live through a new handheld tactile-enabled interface, the TMI, equipped with GelSight and TAC-02 tactile sensors. This convenient and accessible setup allows users to interact with Octopi-1.5 without requiring a robot. During the demonstration, we will showcase Octopi-1.5 solving tactile inference tasks by leveraging tactile inputs and commonsense knowledge. For example, in a Guessing Game, Octopi-1.5 will identify objects being grasped and respond to follow-up queries about how to handle them (e.g., recommending careful handling for soft fruits). We also plan to demonstrate Octopi-1.5's RAG capabilities by teaching it new items. With live interactions, this demonstration aims to highlight both the progress and limitations of visual-tactile-language models (VTLMs) such as Octopi-1.5 and to foster further interest in this exciting field. Code for Octopi-1.5 and design files for the TMI gripper are available at https://github.com/clear-nus/octopi-1.5.
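
The abstract does not say how the RAG module works, so the following is only a minimal sketch of the general idea, not the Octopi-1.5 implementation. It assumes tactile readings have already been encoded as small vectors, and every name in it (`TactileMemory`, `build_prompt`, the toy embeddings and labels) is hypothetical. It shows how similar stored items could be retrieved as context for a language model, and how a new item could be "taught" by appending to the store without retraining.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class TactileMemory:
    """Hypothetical store of (embedding, label) pairs used for retrieval."""
    entries: list = field(default_factory=list)

    def add(self, embedding, label):
        # Teaching a new item only appends to the store; nothing is retrained.
        self.entries.append((list(embedding), label))

    def retrieve(self, query, k=2):
        # Rank stored items by similarity to the query embedding, keep the top k.
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [label for _, label in ranked[:k]]


def build_prompt(question, retrieved_labels):
    # Retrieved labels are prepended as context for the language model.
    context = ", ".join(retrieved_labels)
    return f"Similar known objects: {context}\nQuestion: {question}"


# Toy 3-dimensional embeddings; real tactile encoders would produce
# much larger vectors (assumption for illustration only).
memory = TactileMemory()
memory.add([0.9, 0.1, 0.0], "ripe avocado (soft)")
memory.add([0.1, 0.9, 0.2], "steel mug (hard)")
memory.add([0.8, 0.2, 0.1], "ripe peach (soft)")  # item added on the fly

query = [0.85, 0.15, 0.05]  # embedding of the object currently being grasped
top = memory.retrieve(query, k=2)
prompt = build_prompt("How should I handle this object?", top)
print(prompt)
```

In this sketch, the retrieved labels steer the answer toward careful handling because the two nearest stored items are both soft fruits. A real system would pass the prompt to the language model alongside the tactile and visual inputs.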

