RO CL | Jan 27

ALRM: Agentic LLM for Robotic Manipulation

Vitor Gaboardi dos Santos, Ibrahim Khadraoui, Ibrahim Farhat, Hamza Yous, Samy Teffahi, Hakim Hacid

arXiv:2601.19510v1 | h-index: 5 | Has Code

Originality: Incremental advance

AI Analysis

The paper aims to make robotic manipulation from natural-language instructions more reliable and interpretable, a problem relevant to robotics researchers. The contribution appears incremental, since it builds on existing agentic and policy-generation methods.

The paper tackles the limited integration of LLMs in robotic control by proposing ALRM, an agentic framework aimed at modular execution and multistep reasoning. In experiments with ten LLMs, Claude-4.1-Opus is the top closed-source model and Falcon-H1-7B the top open-source model under CaP.

Large Language Models (LLMs) have recently empowered agentic frameworks to exhibit advanced reasoning and planning capabilities. However, their integration in robotic control pipelines remains limited in two aspects: (1) prior LLM-based approaches often lack modular, agentic execution mechanisms, limiting their ability to plan, reflect on outcomes, and revise actions in a closed-loop manner; and (2) existing benchmarks for manipulation tasks focus on low-level control and do not systematically evaluate multistep reasoning and linguistic variation. In this paper, we propose Agentic LLM for Robot Manipulation (ALRM), an LLM-driven agentic framework for robotic manipulation. ALRM integrates policy generation with agentic execution through a ReAct-style reasoning loop, supporting two complementary modes: Code-as-Policy (CaP) for direct executable control code generation, and Tool-as-Policy (TaP) for iterative planning and tool-based action execution. To enable systematic evaluation, we also introduce a novel simulation benchmark comprising 56 tasks across multiple environments, capturing linguistically diverse instructions. Experiments with ten LLMs demonstrate that ALRM provides a scalable, interpretable, and modular approach for bridging natural language reasoning with reliable robotic execution. Results reveal Claude-4.1-Opus as the top closed-source model and Falcon-H1-7B as the top open-source model under CaP.
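
The two modes described in the abstract can be illustrated with a toy Python sketch. It is not the ALRM implementation: `ToyArm`, `scripted_policy`, `run_tap`, and `run_cap` are hypothetical names invented here, the "LLM" is a hard-coded stand-in, and the environment is a dictionary rather than a simulator. The sketch only shows the general shape of a ReAct-style tool loop (think, act, observe, revise) next to one-shot code execution.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


@dataclass
class ToyArm:
    """Hypothetical stand-in for a simulated manipulator."""
    positions: Dict[str, str] = field(default_factory=lambda: {"red_block": "table"})
    holding: Optional[str] = None

    def pick(self, obj: str) -> str:
        if self.holding is not None:
            return f"error: already holding {self.holding}"
        if obj not in self.positions:
            return f"error: unknown object {obj}"
        self.holding = obj
        return f"ok: picked {obj}"

    def place(self, target: str) -> str:
        if self.holding is None:
            return "error: nothing in gripper"
        placed, self.holding = self.holding, None
        self.positions[placed] = target
        return f"ok: placed {placed} on {target}"


def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str, str]:
    """Assumed placeholder for an LLM call: returns (thought, tool, argument)."""
    if not history:
        return ("Try placing directly.", "place", "tray")
    last_observation = history[-1][2]
    if last_observation.startswith("error: nothing in gripper"):
        return ("Gripper is empty, so pick first.", "pick", "red_block")
    if last_observation.startswith("ok: picked"):
        return ("Holding the block, so place it.", "place", "tray")
    return ("Goal reached.", "finish", "")


def run_tap(goal: str, arm: ToyArm,
            policy: Callable[[str, List[Step]], Tuple[str, str, str]],
            max_steps: int = 5) -> List[Step]:
    """Tool-as-Policy style loop: one tool call per step, observation fed back."""
    tools = {"pick": arm.pick, "place": arm.place}
    history: List[Step] = []
    for _ in range(max_steps):
        thought, tool, arg = policy(goal, history)
        if tool == "finish":
            break
        observation = tools[tool](arg) if tool in tools else f"error: unknown tool {tool}"
        history.append((thought, f"{tool}({arg})", observation))
    return history


def run_cap(code: str, arm: ToyArm) -> None:
    """Code-as-Policy style execution: run a generated program in one shot.
    Real model output would need sandboxing before being passed to exec."""
    exec(code, {"pick": arm.pick, "place": arm.place})
```

In this sketch, `run_tap("put the red block on the tray", ToyArm(), scripted_policy)` first attempts an invalid `place`, sees the error observation, and then recovers with `pick` followed by `place`. By contrast, `run_cap("pick('red_block')\nplace('tray')", ToyArm())` executes a complete program with no intermediate feedback.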
