STEER: Flexible Robotic Manipulation via Dense Language Grounding
This work addresses the need for flexible robotic manipulation in complex, real-world environments and presents a new method rather than an incremental improvement.
The authors tackle the problem of enabling robots to adapt to unseen situations with STEER, a framework that translates situational awareness into low-level control through language-grounded policies. As a result, the learned skills can be combined to synthesize novel behaviors without additional data collection or training.
The complexity of the real world demands robotic systems that can intelligently adapt to unseen situations. We present STEER, a robot learning framework that bridges high-level, commonsense reasoning with precise, flexible low-level control. Our approach translates complex situational awareness into actionable low-level behavior by training language-grounded policies with dense annotation. By structuring policy training around fundamental, modular manipulation skills expressed in natural language, STEER exposes an expressive interface for humans or Vision-Language Models (VLMs) to intelligently orchestrate the robot's behavior by reasoning about the task and context. Our experiments demonstrate that the skills learned via STEER can be combined to synthesize novel behaviors, adapting to new situations or performing completely new tasks without additional data collection or training.
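To make the orchestration interface concrete, here is a minimal, hypothetical sketch of how a plan of natural-language skill commands (as a human or VLM might produce) could be dispatched to a library of modular skills. This is not the STEER implementation. The skill names (`grasp`, `lift`, `place`), the "verb + argument" command format, and the toy `RobotState` are all assumptions for illustration; in STEER the skills are learned language-grounded policies, not hand-written functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RobotState:
    """Toy world state; a hypothetical stand-in for real perception and control."""
    holding: Optional[str] = None
    log: List[str] = field(default_factory=list)


# Hypothetical skill signature: takes the state and the free-text argument of a command.
Skill = Callable[[RobotState, str], None]


def grasp(state: RobotState, arg: str) -> None:
    # arg such as "mug from side"; a learned policy would condition on the full text.
    if state.holding is not None:
        raise RuntimeError(f"already holding {state.holding}")
    state.holding = arg.split(" from ")[0]
    state.log.append(f"grasp({arg})")


def lift(state: RobotState, arg: str) -> None:
    if state.holding is None:
        raise RuntimeError("nothing to lift")
    state.log.append(f"lift({arg})")


def place(state: RobotState, arg: str) -> None:
    if state.holding is None:
        raise RuntimeError("nothing to place")
    state.log.append(f"place({state.holding} {arg})")
    state.holding = None


# Hypothetical skill library keyed by the command's leading verb.
SKILLS: Dict[str, Skill] = {"grasp": grasp, "lift": lift, "place": place}


def execute_plan(plan: List[str], skills: Dict[str, Skill]) -> RobotState:
    """Run each language command in order by dispatching on its first word."""
    state = RobotState()
    for command in plan:
        verb, _, arg = command.partition(" ")
        if verb not in skills:
            raise KeyError(f"unknown skill: {verb!r}")
        skills[verb](state, arg)
    return state


if __name__ == "__main__":
    # A plan a VLM might emit for "put the mug on the shelf" (illustrative only).
    result = execute_plan(["grasp mug from side", "lift mug", "place on shelf"], SKILLS)
    print(result.log)
```

The design point the sketch tries to capture is that new behavior comes from recombining and re-parameterizing the same fixed skill set (for example, `["grasp cup from top", "place in bin"]`), so no retraining is needed for a new plan as long as each step maps to an existing skill.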