RO, Aug 4, 2022
LATTE: LAnguage Trajectory TransformEr
Arthur Bucker, Luis Figueredo, Sami Haddadin et al.
Natural language is one of the most intuitive ways to express human intent. However, translating instructions and commands towards robotic motion generation and deployment in the real world is far from being an easy task. The challenge of combining a robot's inherent low-level geometric and kinodynamic constraints with a human's high-level semantic instructions traditionally is solved using task-specific solutions with little generalizability between hardware platforms, often with the use of static sets of target actions and commands. This work instead proposes a flexible language-based framework that allows a user to modify generic robotic trajectories. Our method leverages pre-trained language models (BERT and CLIP) to encode the user's intent and target objects directly from a free-form text input and scene images, fuses geometrical features generated by a transformer encoder network, and finally outputs trajectories using a transformer decoder, without the need of priors related to the task or robot information. We significantly extend our own previous work presented in Bucker et al. by expanding the trajectory parametrization space to 3D and velocity as opposed to just XY movements. In addition, we now train the model to use actual images of the objects in the scene for context (as opposed to textual descriptions), and we evaluate the system in a diverse set of scenarios beyond manipulation, such as aerial and legged robots. Our simulated and real-life experiments demonstrate that our transformer model can successfully follow human intent, modifying the shape and speed of trajectories within multiple environments. Codebase available at: https://github.com/arthurfenderbucker/LaTTe-Language-Trajectory-TransformEr.git
RO, Mar 25, 2022
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers
Arthur Bucker, Luis Figueredo, Sami Haddadin et al.
Natural language is the most intuitive medium for us to interact with other people when expressing commands and instructions. However, using language is seldom an easy task when humans need to express their intent towards robots, since most of the current language interfaces require rigid templates with a static set of action targets and commands. In this work, we provide a flexible language-based interface for human-robot collaboration, which allows a user to reshape existing trajectories for an autonomous agent. We take advantage of recent advancements in the field of large language models (BERT and CLIP) to encode the user command, and then combine these features with trajectory information using multi-modal attention transformers. We train the model using imitation learning over a dataset containing robot trajectories modified by language commands, and treat the trajectory generation process as a sequence prediction problem, analogously to how language generation architectures operate. We evaluate the system in multiple simulated trajectory scenarios, and show a significant performance increase of our model over baseline approaches. In addition, our real-world experiments with a robot arm show that users significantly prefer our natural language interface over traditional methods such as kinesthetic teaching or cost-function programming. Our study shows how the field of robotics can take advantage of large pre-trained language models towards creating more intuitive interfaces between robots and machines. Project webpage: https://arthurfenderbucker.github.io/NL_trajectory_reshaper/
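The abstract above treats trajectory generation as sequence prediction, in the way language models generate text. A minimal sketch of that idea follows, assuming a hypothetical `predict_next` callable in place of the trained transformer decoder. The names `autoregressive_rollout`, `toy_predictor`, and the `context` dictionary are illustrative and do not come from the paper.

```python
def autoregressive_rollout(predict_next, context, first_waypoint, num_steps):
    """Generate a trajectory one waypoint at a time.

    Each new waypoint is predicted from the context (for example, encoded
    language and scene features) and from all waypoints produced so far,
    mirroring token-by-token text generation.
    """
    trajectory = [first_waypoint]
    for _ in range(num_steps):
        trajectory.append(predict_next(context, trajectory))
    return trajectory


def toy_predictor(context, prefix):
    """Hypothetical stand-in for a decoder: step halfway toward the goal."""
    goal_x, goal_y = context["goal"]
    x, y = prefix[-1]
    return (x + 0.5 * (goal_x - x), y + 0.5 * (goal_y - y))


if __name__ == "__main__":
    path = autoregressive_rollout(toy_predictor, {"goal": (1.0, 1.0)}, (0.0, 0.0), 2)
    print(path)
```

A learned model would replace `toy_predictor`; the rollout loop itself stays the same because each prediction depends only on the context and the prefix generated so far.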
RO, Oct 18, 2023
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation
Shengqiang Zhang, Philipp Wicke, Lütfi Kerem Şenel et al.
The convergence of embodied agents and large language models (LLMs) has brought significant advancements to embodied instruction following. In particular, the strong reasoning capabilities of LLMs make it possible for robots to perform long-horizon tasks without expensive annotated demonstrations. However, public benchmarks for testing the long-horizon reasoning capabilities of language-conditioned robots in various scenarios are still missing. To fill this gap, this work focuses on the tabletop manipulation task and releases a simulation benchmark, LoHoRavens, which covers various long-horizon reasoning aspects spanning color, size, space, arithmetic and reference. Furthermore, long-horizon manipulation tasks with LLMs face a key modality bridging problem: how to incorporate observation feedback during robot execution into the LLM's closed-loop planning, a question that prior work has studied less. We investigate two methods of bridging the modality gap: caption generation and a learnable interface, which incorporate explicit and implicit observation feedback into the LLM, respectively. These methods serve as the two baselines for our proposed benchmark. Experiments show that both methods struggle to solve some tasks, indicating that long-horizon manipulation tasks are still challenging for current popular models. We expect the proposed public benchmark and baselines to help the community develop better models for long-horizon tabletop manipulation tasks.
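The closed-loop planning with caption feedback described above can be illustrated with a small sketch. It assumes three hypothetical callables (a planner standing in for the LLM, an executor standing in for the robot, and a captioner that turns observations into text). None of these names or the toy task come from the benchmark itself.

```python
def closed_loop_plan(plan_step, execute, caption, goal, max_steps=10):
    """Alternate planning and execution, feeding captions back to the planner."""
    history = []
    for _ in range(max_steps):
        action = plan_step(goal, history)
        if action is None:  # planner reports that the goal is reached
            break
        observation = execute(action)
        # Explicit feedback: the observation is turned into text for the planner.
        history.append((action, caption(observation)))
    return history


def make_flaky_executor():
    """Hypothetical robot: the first attempt to place the blue block fails."""
    attempts = {}

    def execute(action):
        attempts[action] = attempts.get(action, 0) + 1
        placed = not (action == "blue" and attempts[action] == 1)
        return {"block": action, "placed": placed}

    return execute


def caption(observation):
    state = "is in the bowl" if observation["placed"] else "was dropped"
    return f"the {observation['block']} block {state}"


def toy_planner(goal, history):
    """Hypothetical planner: pick the first block not yet reported as placed."""
    done = {action for action, text in history if text.endswith("is in the bowl")}
    for block in goal:
        if block not in done:
            return block
    return None


if __name__ == "__main__":
    for action, text in closed_loop_plan(toy_planner, make_flaky_executor(), caption, ["red", "blue"]):
        print(action, "->", text)
```

Because the planner reads the captions, it retries the dropped block instead of assuming the action succeeded, which is the point of closing the loop.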
RO, May 25
PhyPush: One Push is All You Need for Sensorless Physical Property Estimation with Physics-Guided Transformers
Koyo Fujii, Luis Figueredo, Praminda Caleb-Solly et al.
Accurately estimating object mass and friction is fundamental to achieving reliable and adaptive robotic manipulation. Although interactive perception provides a powerful mechanism for inferring such properties, most existing approaches depend on specialized hardware such as force/torque sensors, tactile arrays, or multi-camera motion-capture systems, limiting scalability and deployment. This paper presents PhyPush, a physics-guided Transformer framework that estimates an object's mass and friction coefficient using only kinematically derived end-effector velocity from a single push. This requires only data that is typically available on standard robotic arms. The model incorporates constraints from Newton's second law and the Coulomb friction model through a physics-guided loss, improving physical consistency and generalization to unseen objects and surfaces. Across diverse simulation and real-world setups, PhyPush consistently achieves more accurate mass and friction estimation in challenging out-of-domain conditions. In simulation, it reduces error by over 10% compared with a baseline that has privileged access to full force information, while in real-world experiments, it outperforms a data-driven loss approach. Overall, the results demonstrate that physics-guided learning can enable low-cost, sensor-efficient estimation of physical properties, relying solely on a single push and readily available kinematic data.
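To make the idea of a physics-guided loss concrete, here is a minimal sketch that penalizes violations of Newton's second law with Coulomb friction for a block pushed on a flat surface, m*a = F - mu*m*g. This is an illustration under stated assumptions, not the formulation used in PhyPush: the push force `force`, the acceleration `accel`, the weighting `weight`, and all function names are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def physics_residual(mass, mu, force, accel):
    """Residual of m*a = F - mu*m*g; zero when the estimates are consistent."""
    return mass * accel - (force - mu * mass * G)


def physics_guided_loss(pred, target, force, accel, weight=0.1):
    """Squared-error data term plus a weighted squared physics residual.

    pred and target are (mass, friction coefficient) pairs. The weight is an
    assumed hyperparameter balancing the two terms.
    """
    pred_mass, pred_mu = pred
    true_mass, true_mu = target
    data_term = (pred_mass - true_mass) ** 2 + (pred_mu - true_mu) ** 2
    phys_term = physics_residual(pred_mass, pred_mu, force, accel) ** 2
    return data_term + weight * phys_term


if __name__ == "__main__":
    true_mass, true_mu, force = 2.0, 0.3, 10.0
    accel = (force - true_mu * true_mass * G) / true_mass
    print(physics_guided_loss((2.0, 0.3), (true_mass, true_mu), force, accel))
    print(physics_guided_loss((3.0, 0.3), (true_mass, true_mu), force, accel))
```

The physics term adds a penalty whenever the predicted mass and friction cannot explain the observed motion, even if the data term alone is small.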
RO, Sep 23, 2024
DRAPER: Towards a Robust Robot Deployment and Reliable Evaluation for Quasi-Static Pick-and-Place Cloth-Shaping Neural Controllers
Halid Abdulrahim Kadi, Jose Alex Chandy, Luis Figueredo et al.
Comparing robotic cloth-manipulation systems in a real-world setup is challenging. The fidelity gap between simulation-trained cloth neural controllers and real-world operation hinders the reliable deployment of these methods in physical trials. Inconsistent experimental setups and hardware limitations among different approaches obstruct objective evaluations. This study demonstrates a reliable real-world comparison of different simulation-trained neural controllers on both flattening and folding tasks with different types of fabrics varying in material, size, and colour. We introduce the DRAPER framework to enable this comprehensive study, which reliably reflects the true capabilities of these neural controllers. It specifically addresses real-world grasping errors, such as misgrasping and multilayer grasping, through real-world adaptations of the simulation environment to provide data trajectories that closely reflect real-world grasping scenarios. It also employs a special set of vision processing techniques to close the simulation-to-reality gap in the perception. Furthermore, it achieves robust grasping by adopting a tweezer-extended gripper and a grasping procedure. We demonstrate DRAPER's generalisability across different deep-learning methods and robotic platforms, offering valuable insights to the cloth manipulation research community.
AI, Sep 13, 2024
Using The Concept Hierarchy for Household Action Recognition
Andrei Costinescu, Luis Figueredo, Darius Burschka
We propose a method to systematically represent both the static and the dynamic components of environments, i.e. objects and agents, as well as the changes that are happening in the environment, i.e. the actions and skills performed by agents. Our approach, the Concept Hierarchy, provides the necessary information for autonomous systems to represent environment states, perform action modeling and recognition, and plan the execution of tasks. Additionally, the hierarchical structure supports generalization and knowledge transfer to environments. We rigorously define tasks, actions, skills, and affordances that enable human-understandable action and skill recognition.
RO, Mar 19, 2025
Geometrically-Aware One-Shot Skill Transfer of Category-Level Objects
Cristiana de Farias, Luis Figueredo, Riddhiman Laha et al.
Robotic manipulation of unfamiliar objects in new environments is challenging and requires extensive training or laborious pre-programming. We propose a new skill transfer framework, which enables a robot to transfer complex object manipulation skills and constraints from a single human demonstration. Our approach addresses the challenge of skill acquisition and task execution by deriving geometric representations from demonstrations focusing on object-centric interactions. By leveraging the Functional Maps (FM) framework, we efficiently map interaction functions between objects and their environments, allowing the robot to replicate task operations across objects of similar topologies or categories, even when they have significantly different shapes. Additionally, our method incorporates a Task-Space Imitation Algorithm (TSIA) which generates smooth, geometrically-aware robot paths to ensure the transferred skills adhere to the demonstrated task constraints. We validate the effectiveness and adaptability of our approach through extensive experiments, demonstrating successful skill transfer and task execution in diverse real-world environments without requiring additional training.