5.4 RO Apr 20
Hybrid Task and Motion Planning with Reactive Collision Handling for Multi-Robot Disassembly of Complex Products: Application to EV Batteries
Abdelaziz Shaarawy, Cansu Erdogan, Rustam Stolkin et al.
This paper addresses the problem of multi-robot coordination for complex manipulation task sequences. We present a vision-driven task-and-motion planning (TAMP) framework for a real dual-agent platform that integrates task decomposition and allocation with a learning-based RRT planner. A GMM-informed motion planner is coupled with a hybrid safety layer that combines predictive collision checking in a MoveIt/FCL digital twin with reactive vision-based avoidance and replanning. This integration is challenging because the system must jointly satisfy task precedence, geometric feasibility, dynamic obstacle avoidance, and dual-arm coordination constraints. The framework operates in closed loop: it updates the remaining task sequence from repeated scene scans and completion-state tracking rather than executing a fixed open-loop plan. In EV battery disassembly experiments, compared with Default-RRTConnect under identical perception and task assignments, the proposed system reduces cumulative end-effector path length from 48.8 to 17.9 m ($-63.3\%$), shortens makespan from 467.9 to 429.8 s ($-8.1\%$), and reduces swept volumes (R1: $0.583\rightarrow0.139\,\mathrm{m}^3$, R2: $0.696\rightarrow0.252\,\mathrm{m}^3$) and overlap ($0.064\rightarrow0.034\,\mathrm{m}^3$). These results show that combining predictive planning and reactive collision avoidance in a real dual-arm disassembly scenario improves motion compactness, safety, and scalability to broader multi-robot sequential manipulation tasks.
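The percentage figures quoted in the abstract follow from the standard relative-change formula, (proposed - baseline) / baseline. The short Python sketch below recomputes them from the reported baseline and proposed values; the dictionary keys and the function name are illustrative labels chosen here, not identifiers from the paper.

```python
def relative_change(baseline: float, proposed: float) -> float:
    """Signed percentage change going from the baseline to the proposed value."""
    return (proposed - baseline) / baseline * 100.0


# (baseline, proposed) pairs as reported in the abstract.
# Key names are illustrative labels, not taken from the paper.
reported = {
    "path_length_m": (48.8, 17.9),
    "makespan_s": (467.9, 429.8),
    "swept_volume_r1_m3": (0.583, 0.139),
    "swept_volume_r2_m3": (0.696, 0.252),
    "overlap_m3": (0.064, 0.034),
}

changes = {
    name: relative_change(baseline, proposed)
    for name, (baseline, proposed) in reported.items()
}

if __name__ == "__main__":
    for name, pct in changes.items():
        print(f"{name}: {pct:+.1f}%")
```

The path-length and makespan values reproduce the abstract's $-63.3\%$ and $-8.1\%$; the swept-volume and overlap percentages are not stated in the abstract and are derived here only from the quoted endpoints.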
RO Jul 14, 2025
Probabilistic Human Intent Prediction for Mobile Manipulation: An Evaluation with Human-Inspired Constraints
Cesar Alan Contreras, Manolis Chiou, Alireza Rastegarpanah et al.
Accurate inference of human intent enables human-robot collaboration without constraining human control or causing conflicts between humans and robots. We present GUIDER (Global User Intent Dual-phase Estimation for Robots), a probabilistic framework that enables a robot to estimate the intent of human operators. GUIDER maintains two coupled belief layers, one tracking navigation goals and the other manipulation goals. In the Navigation phase, a Synergy Map blends controller velocity with an occupancy grid to rank interaction areas. Upon arrival at a goal, an autonomous multi-view scan builds a local 3D cloud. The Manipulation phase combines U2Net saliency, FastSAM instance saliency, and three geometric grasp-feasibility tests, with an end-effector kinematics-aware update rule that evolves object probabilities in real time. GUIDER can recognize areas and objects of intent without predefined goals. We evaluated GUIDER on 25 trials (five participants × five task variants) in Isaac Sim, and compared it with two baselines, one for navigation and one for manipulation. Across the 25 trials, GUIDER achieved a median stability of 93–100% during navigation, compared with 60–100% for the BOIR baseline, with an improvement of 39.5% in a redirection scenario (T5). During manipulation, stability reached 94–100% (versus 69–100% for Trajectron), with a 31.4% difference in a redirection task (T3). In geometry-constrained manipulation trials, GUIDER recognized the object intent three times earlier than Trajectron (median remaining time to confident prediction 23.6 s vs. 7.8 s, respectively). These results validate our dual-phase framework and show improvements in intent inference in both phases of mobile manipulation tasks.
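To make the idea of a belief layer over candidate goals concrete, the sketch below shows a generic recursive Bayesian update in Python. It is not GUIDER's update rule: the abstract does not specify the Synergy Map or the kinematics-aware update, so the heading-based likelihood, the `sharpness` parameter, and all function and goal names are assumptions made purely for illustration.

```python
import math


def heading_likelihood(position, velocity, goal, sharpness=2.0):
    """Assumed likelihood: larger when the commanded velocity points at the goal.

    Uses exp(sharpness * cos(angle)) between the velocity and the
    direction to the goal. Returns 1.0 (uninformative) when the robot
    is stationary or already at the goal.
    """
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    dist = math.hypot(dx, dy)
    speed = math.hypot(velocity[0], velocity[1])
    if dist == 0.0 or speed == 0.0:
        return 1.0
    cos_angle = (dx * velocity[0] + dy * velocity[1]) / (dist * speed)
    return math.exp(sharpness * cos_angle)


def update_belief(belief, position, velocity, goals):
    """One Bayesian step: posterior is proportional to prior times likelihood."""
    posterior = {
        name: belief[name] * heading_likelihood(position, velocity, goals[name])
        for name in belief
    }
    total = sum(posterior.values())
    if total == 0.0:
        return dict(belief)  # keep the prior if every hypothesis has zero mass
    return {name: p / total for name, p in posterior.items()}


# Hypothetical 2D goals and a uniform prior over them.
goals = {"bench": (1.0, 0.0), "shelf": (0.0, 1.0)}
belief = {name: 1.0 / len(goals) for name in goals}

# The operator drives along +x, so mass should shift towards "bench".
belief = update_belief(belief, position=(0.0, 0.0), velocity=(1.0, 0.0), goals=goals)
```

Repeating the update at each control step accumulates evidence, which is one common way to obtain a belief that stabilises on a goal and can shift again when the operator redirects.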
RO Oct 20, 2025
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Disassembly: Demonstration on EV Batteries
Cansu Erdogan, Cesar Alan Contreras, Alireza Rastegarpanah et al.
This paper addresses the problem of planning complex manipulation tasks in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute concatenated sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline that robustly constructs such action sequences with varying degrees of supervisory input from a human using simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an electric vehicle (EV) battery for recycling applications. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners (including ablation analyses of pipeline components). We also evaluated the LLM-based human interface in terms of time to execution and NASA-TLX in human participant experiments. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.
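Stage (iv), the deterministic consistency filter, can be pictured as a plain set-membership check: any candidate sequence that names an object the perception system did not report is discarded. The Python sketch below is a minimal illustration under that assumption. The function name, the data layout, and the component labels are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence


def consistency_filter(
    candidates: Iterable[Sequence[str]],
    scene_objects: Iterable[str],
) -> List[List[str]]:
    """Keep only candidate sequences whose every object was detected.

    A sequence that mentions any label absent from the scene is treated
    as containing a hallucinated object and is rejected. Empty sequences
    are also rejected because they give the robots nothing to execute.
    """
    detected = set(scene_objects)
    kept = []
    for sequence in candidates:
        if sequence and all(obj in detected for obj in sequence):
            kept.append(list(sequence))
    return kept


# Hypothetical labels produced by a perception-to-text encoder.
scene = ["bolt_1", "bolt_2", "cable_1", "busbar_1"]

# Hypothetical candidate removal sequences from an LLM ensemble.
candidates = [
    ["bolt_1", "bolt_2", "cable_1", "busbar_1"],
    ["bolt_1", "fan_1", "cable_1"],  # "fan_1" is not in the scene
    ["cable_1", "bolt_2"],
]

valid = consistency_filter(candidates, scene)
```

Because the check involves no model call, it gives the same answer on every run, which is the property the word "deterministic" suggests; precedence constraints are left to the verifier stage and are not checked here.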
RO Aug 14, 2025
Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance
Cesar Alan Contreras, Manolis Chiou, Alireza Rastegarpanah et al.
Human-robot collaboration requires robots to quickly infer user intent, provide transparent reasoning, and assist users in achieving their goals. Our recent work introduced GUIDER, a framework for inferring navigation and manipulation intents. We propose augmenting GUIDER with a vision-language model (VLM) and a text-only large language model (LLM) to form a semantic prior that filters objects and locations based on the mission prompt. A vision pipeline (YOLO for object detection and the Segment Anything Model for instance segmentation) feeds candidate object crops into the VLM, which scores their relevance given an operator prompt; in addition, the list of detected object labels is ranked by the text-only LLM. These scores weight the existing navigation and manipulation layers of GUIDER, selecting context-relevant targets while suppressing unrelated objects. Once the combined belief exceeds a threshold, a change in autonomy is triggered, enabling the robot to navigate to the desired area and retrieve the desired object while adapting to any changes in the operator's intent. Future work will evaluate the system in Isaac Sim using a Franka Emika arm on a Ridgeback base, with a focus on real-time assistance.
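One simple way to read "these scores weight the existing layers" is as a multiplicative prior followed by renormalisation, with a threshold test on the result. The Python sketch below illustrates that reading only. The abstract does not give the weighting formula or the threshold, so the multiplicative form, the `floor` and `threshold` values, and all names are assumptions.

```python
from typing import Dict, Optional


def apply_semantic_prior(
    belief: Dict[str, float],
    relevance: Dict[str, float],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Weight each object's belief by its relevance score and renormalise.

    `floor` (an assumed safeguard) stops an object with missing or zero
    relevance from being eliminated permanently.
    """
    weighted = {
        name: prob * max(relevance.get(name, 0.0), floor)
        for name, prob in belief.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        return dict(belief)  # nothing to renormalise; keep the input belief
    return {name: value / total for name, value in weighted.items()}


def select_target(belief: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the most likely object once its belief reaches the threshold."""
    if not belief:
        return None
    best = max(belief, key=belief.get)
    return best if belief[best] >= threshold else None


# Hypothetical intent belief, and hypothetical VLM/LLM relevance scores
# for a prompt such as "fetch the tool needed to loosen the bolts".
belief = {"mug": 0.4, "wrench": 0.4, "box": 0.2}
relevance = {"mug": 0.05, "wrench": 0.9, "box": 0.05}

combined = apply_semantic_prior(belief, relevance)
target = select_target(combined)
```

In this toy case the motion-based belief alone cannot separate the mug from the wrench, while the semantic prior pushes the wrench past the assumed threshold, which is the point at which a system of this kind would hand control to the autonomous behaviour.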