RO · Jun 4

A Conversational Framework for Human-Robot Collaborative Manipulation with Distributed Generative AI models

arXiv:2606.06061 · 10.3 · Has Code
Predicted impact top 57% in RO · last 90 days
Originality: Synthesis-oriented
AI Analysis

For researchers in human-robot interaction, this is an incremental integration of existing generative AI models into a modular ROS 2 pipeline for manipulation tasks.

The paper presents a distributed conversational framework for human-robot collaborative manipulation that integrates local LLMs and VLMs with ROS 2, achieving end-to-end task execution with explicit operator confirmation. Experiments on a Franka FR3 platform evaluate reliability and latency under varying scene ambiguity.

This paper presents a distributed conversational framework for human-robot collaborative manipulation that integrates local language and vision-language models (VLMs) with a Robot Operating System 2 (ROS 2)-based execution stack. Language understanding, visual grounding, orchestration, and motion execution run as separate ROS 2 nodes, enabling flexible deployment across distributed hardware while maintaining a responsive control loop. From free-form user commands, the system generates structured action requests for pick, place, and handover. It uses a VLM to return image-space targets, which are converted into metric robot-frame goals using depth and calibration. A web dashboard exposes intermediate intent and grounding overlays (pixel, depth, and robot-frame) and requires explicit operator confirmation before any motion is executed. Experiments on a Franka FR3 platform evaluate end-to-end task reliability and latency under increasing scene ambiguity on the work table and compare alternative LLM/VLM configurations in the same pipeline. Code and full documentation are available at [github.com/cogrob-tuni/franka-llm](https://github.com/cogrob-tuni/franka-llm).
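
The abstract's step from image-space targets to metric robot-frame goals can be illustrated with a minimal sketch. It is not taken from the paper's code: it assumes a pinhole camera model, a depth image aligned to the color image, and a known camera-to-base extrinsic from calibration. The names `Intrinsics`, `deproject_pixel`, and `transform_point`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (assumed model, not confirmed by the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def deproject_pixel(u: float, v: float, depth_m: float, k: Intrinsics) -> Point3:
    """Back-project pixel (u, v) with metric depth into the camera frame."""
    if depth_m <= 0.0:
        raise ValueError("depth must be positive; the pixel may have no valid reading")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def transform_point(t_base_cam: Sequence[Sequence[float]], p_cam: Point3) -> Point3:
    """Apply a 4x4 homogeneous camera-to-base transform to a camera-frame point."""
    h = (p_cam[0], p_cam[1], p_cam[2], 1.0)
    x, y, z = (sum(row[i] * h[i] for i in range(4)) for row in t_base_cam[:3])
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical calibration: camera 0.8 m above the base origin, offset 0.4 m
    # along base x, looking straight down (180 degree rotation about the x axis).
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    t_base_cam = [
        [1.0, 0.0, 0.0, 0.4],
        [0.0, -1.0, 0.0, 0.0],
        [0.0, 0.0, -1.0, 0.8],
        [0.0, 0.0, 0.0, 1.0],
    ]
    p_cam = deproject_pixel(380.0, 240.0, 0.5, k)  # pixel target with 0.5 m depth
    print(transform_point(t_base_cam, p_cam))      # approximately (0.45, 0.0, 0.3)
```

In a pipeline like the one described, the pixel would come from the VLM's grounding output and the resulting robot-frame point would be shown to the operator for confirmation before any motion is planned. Rejecting non-positive depth readings keeps an invalid sensor value from producing a plausible-looking goal.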

