RO · AI · LG · May 4, 2025

Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning

arXiv:2505.02232v1 · h-index: 5 · ICRA
Originality: Incremental advance
AI Analysis

This work addresses a gap in robotic manipulation: picking a user-specified object out of clutter. It enables robots to perform dexterous tasks in response to user prompts, though the advance appears incremental, since it integrates existing models such as SAM 2 with a new learning framework.

The paper tackles the problem of linking high-level user prompts to fine-grained dexterous control, specifically for picking objects from cluttered scenes. It combines promptable foundation models with reinforcement learning, and its experiments demonstrate policies that respond successfully to prompts.

Building models responsive to input prompts represents a transformative shift in machine learning. This paradigm holds significant potential for robotics problems, such as targeted manipulation amidst clutter. In this work, we present a novel approach to combine promptable foundation models with reinforcement learning (RL), enabling robots to perform dexterous manipulation tasks in a prompt-responsive manner. Existing methods struggle to link high-level commands with fine-grained dexterous control. We address this gap with a memory-augmented student-teacher learning framework. We use the Segment-Anything 2 (SAM 2) model as a perception backbone to infer an object of interest from user prompts. While detections are imperfect, their temporal sequence provides rich information for implicit state estimation by memory-augmented models. Our approach successfully learns prompt-responsive policies, demonstrated in picking objects from cluttered scenes. Videos and code are available at https://memory-student-teacher.github.io
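The abstract's claim that a temporal sequence of imperfect detections supports implicit state estimation can be illustrated with a toy sketch. This is not the paper's architecture: the exponential filter below stands in for a learned memory-augmented model, and the 1-D object position, the noise pattern, the dropout schedule, and all function names are assumptions made up for illustration.

```python
from typing import List, Optional

# Toy setup (assumed, not from the paper): a static object at a 1-D position,
# observed through a detector that is noisy and drops every fourth frame.
TRUE_POSITION = 1.0


def make_detections(num_frames: int) -> List[Optional[float]]:
    """Deterministic stand-in for imperfect per-frame detections."""
    detections: List[Optional[float]] = []
    for t in range(num_frames):
        if t % 4 == 3:
            detections.append(None)  # detection lost in this frame
        else:
            noise = 0.2 if t % 2 == 0 else -0.2
            detections.append(TRUE_POSITION + noise)
    return detections


def memoryless_estimate(detections: List[Optional[float]],
                        fallback: float = 0.0) -> List[float]:
    """Uses only the current frame; falls back to a default when it is missing."""
    return [d if d is not None else fallback for d in detections]


def recurrent_estimate(detections: List[Optional[float]],
                       alpha: float = 0.5,
                       initial: float = 0.0) -> List[float]:
    """Carries a state across frames (a hand-written filter, not a learned model)."""
    state = initial
    estimates = []
    for d in detections:
        if d is not None:
            # Blend the new detection into the remembered state.
            state = (1.0 - alpha) * state + alpha * d
        # On a dropout the previous state is simply kept.
        estimates.append(state)
    return estimates


def mean_abs_error(estimates: List[float], truth: float) -> float:
    return sum(abs(e - truth) for e in estimates) / len(estimates)


detections = make_detections(40)
err_memoryless = mean_abs_error(memoryless_estimate(detections), TRUE_POSITION)
err_recurrent = mean_abs_error(recurrent_estimate(detections), TRUE_POSITION)
print(f"memoryless error: {err_memoryless:.3f}")
print(f"recurrent error:  {err_recurrent:.3f}")
```

In this toy setting the estimator with memory averages out the alternating noise and bridges the dropped frames, so its error is lower than that of the frame-by-frame estimator. A learned recurrent policy would have to discover comparable behaviour from data rather than have it written by hand.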

