RO Mar 11
RL-Augmented MPC for Non-Gaited Legged and Hybrid Locomotion
Andrea Patrizi, Carlo Rizzardo, Arturo Laurenzi et al.
We propose a contact-explicit hierarchical architecture coupling Reinforcement Learning (RL) and Model Predictive Control (MPC), where a high-level RL agent provides gait and navigation commands to a low-level locomotion MPC. This offloads the combinatorial burden of contact timing from the MPC by learning acyclic gaits through trial and error in simulation. We show that only a minimal set of rewards and limited tuning are required to obtain effective policies. We validate the architecture in simulation across robotic platforms spanning 50 kg to 120 kg and different MPC implementations, observing the emergence of acyclic gaits and timing adaptations in flat-terrain legged and hybrid locomotion, and further demonstrating extensibility to non-flat terrains. Across all platforms, we achieve zero-shot sim-to-sim transfer without domain randomization, and we further demonstrate zero-shot sim-to-real transfer without domain randomization on Centauro, our 120 kg wheeled-legged humanoid robot. We make our software framework and evaluation results publicly available at https://github.com/AndrePatri/AugMPC.
RO Apr 7, 2025
A High-Force Gripper with Embedded Multimodal Sensing for Powerful and Perception Driven Grasping
Edoardo Del Bianco, Davide Torielli, Federico Rollo et al.
Modern humanoid robots have shown promising potential for executing various tasks involving the grasping and manipulation of objects using their end-effectors. Nevertheless, in most cases, the grasping and manipulation actions involve low to moderate payloads and interaction forces. This is due to limitations often presented by the end-effectors, which cannot match the payload their arms can carry, and hence limit the payload that can be grasped and manipulated. In addition, grippers usually do not embed adequate perception in their hardware, and grasping actions are mainly driven by perception sensors installed on the rest of the robot body, which are frequently affected by occlusions due to arm motions during the execution of grasping and manipulation tasks. To address the above, we developed a modular, high-grasping-force gripper equipped with embedded multi-modal perception functionalities. The proposed gripper can generate a grasping force of 110 N in a compact implementation. The high grasping force capability is combined with embedded multi-modal sensing, which includes an eye-in-hand camera, a Time-of-Flight (ToF) distance sensor, an Inertial Measurement Unit (IMU) and an omnidirectional microphone, permitting the implementation of perception-driven grasping functionalities. We extensively evaluated the grasping force capacity of the gripper by introducing novel payload evaluation metrics that are a function of the robot arm's dynamic motion and the gripper's thermal state. We also evaluated the embedded multi-modal sensing by performing perception-guided enhanced grasping operations.
RO Mar 12, 2021
Agile Actions with a Centaur-Type Humanoid: A Decoupled Approach
Matteo Parigi Polverini, Enrico Mingo Hoffman, Arturo Laurenzi et al.
The kinematic features of a centaur-type humanoid platform, combined with powerful actuation, enable experimentation with a variety of agile and dynamic motions. However, the higher number of degrees of freedom and the increased weight of the system, compared to its bipedal and quadrupedal counterparts, pose significant research challenges in terms of computational load and real implementation. To this end, this work presents a control architecture to perform agile actions, conceived for torque-controlled platforms, which decouples, for computational purposes, offline optimal control planning of lower-body primitives, based on a template kinematic model, from online control of the upper-body motion to maintain balance. Three stabilizing strategies are presented, whose performance is compared in two types of simulated jumps, while experimental validation is performed on a half-squat jump using the CENTAURO robot.
RO Apr 5, 2020
Curved patch mapping and tracking for irregular terrain modeling: Application to bipedal robot foot placement
Dimitrios Kanoulas, Nikos G. Tsagarakis, Marsette Vona
Legged robots need to make contact with irregular surfaces when operating in unstructured natural terrains. Representing and perceiving these areas to reason about potential contact between a robot and its surrounding environment is still largely an open problem. This paper introduces a new framework to model and map local rough terrain surfaces for tasks such as bipedal robot foot placement. The system operates in real time on data from an RGB-D and an IMU sensor. We introduce a set of parametrized patch models and an algorithm to fit them to the environment. Potential contacts are identified as bounded curved patches of approximately the same size as the robot's foot sole. This includes sparse seed point sampling, point cloud neighborhood search, and patch fitting and validation. We also present a mapping and tracking system, where patches are maintained in a local spatial map around the robot as it moves. A bio-inspired sampling algorithm is introduced for finding salient contacts. We include a dense volumetric fusion layer for spatiotemporal tracking, using multiple depth data to reconstruct a local point cloud. We present experimental results on a mini-biped robot that performs foot placements on rocks, implementing a 3D foothold perception system that uses the developed patch mapping and tracking framework.
RO Sep 19, 2019
Flexible Disaster Response of Tomorrow -- Final Presentation and Evaluation of the CENTAURO System
Tobias Klamt, Diego Rodriguez, Lorenzo Baccelliere et al.
Mobile manipulation robots have high potential to support rescue forces in disaster-response missions. Despite the difficulties imposed by real-world scenarios, robots hold promise for performing mission tasks from a safe distance. In the CENTAURO project, we developed a disaster-response system which consists of the highly flexible Centauro robot and suitable control interfaces, including an immersive tele-presence suit and support-operator controls at different levels of autonomy. In this article, we give an overview of the final CENTAURO system. In particular, we explain several high-level design decisions and how those were derived from the requirements and extensive experience of Kerntechnische Hilfsdienst GmbH, Karlsruhe, Germany (KHG). We focus on components which were recently integrated and report on a systematic evaluation which demonstrated system capabilities and revealed valuable insights.
RO Aug 5, 2019
Remote Mobile Manipulation with the Centauro Robot: Full-body Telepresence and Autonomous Operator Assistance
Tobias Klamt, Max Schwarz, Christian Lenz et al.
Solving mobile manipulation tasks in inaccessible and dangerous environments is an important application of robots to support humans. Example domains are construction and maintenance of manned and unmanned stations on the moon and other planets. Suitable platforms require flexible and robust hardware, a locomotion approach that allows for navigating a wide variety of terrains, dexterous manipulation capabilities, and respective user interfaces. We present the CENTAURO system, which has been designed for these requirements and consists of the Centauro robot and a set of advanced operator interfaces with complementary strengths, enabling the system to solve a wide range of realistic mobile manipulation tasks. The robot possesses a centaur-like body plan and is driven by torque-controlled compliant actuators. Four articulated legs ending in steerable wheels allow for omnidirectional driving as well as for making steps. An anthropomorphic upper body with two arms ending in five-finger hands enables human-like manipulation. The robot perceives its environment through a suite of multimodal sensors. The resulting platform complexity goes beyond that of most known systems, which puts the focus on a suitable operator interface. An operator can control the robot through a telepresence suit, which allows for flexibly solving a large variety of mobile manipulation tasks. Locomotion and manipulation functionalities at different levels of autonomy support the operation. The proposed user interfaces enable solving a wide variety of tasks without previous task-specific training. The integrated system is evaluated in numerous teleoperated experiments that are described along with lessons learned.
CV Mar 23, 2019
V2CNet: A Deep Learning Framework to Translate Videos to Commands for Robotic Manipulation
Anh Nguyen, Thanh-Toan Do, Ian Reid et al.
We propose V2CNet, a new deep learning framework to automatically translate demonstration videos to commands that can be directly used in robotic applications. V2CNet has two branches and aims at understanding the demonstration video in a fine-grained manner. The first branch has an encoder-decoder architecture to encode the visual features and sequentially generate the output words as a command, while the second branch uses a Temporal Convolutional Network (TCN) to learn the fine-grained actions. By jointly training both branches, the network is able to model the sequential information of the command while effectively encoding the fine-grained actions. The experimental results on our new large-scale dataset show that V2CNet outperforms recent state-of-the-art methods by a substantial margin, while its output can be applied in real robotic applications. The source code and trained models will be made available.
CV Mar 16, 2018
Object Captioning and Retrieval with Natural Language
Anh Nguyen, Thanh-Toan Do, Ian Reid et al.
We address the problem of jointly learning vision and language to understand objects in a fine-grained manner. The key idea of our approach is the use of object descriptions to provide a detailed understanding of an object. Based on this idea, we propose two new architectures to solve two related problems: object captioning and natural language-based object retrieval. The goal of the object captioning task is to simultaneously detect the object and generate its associated description, while in the object retrieval task, the goal is to localize an object given an input query. We demonstrate that both problems can be solved effectively using hybrid end-to-end CNN-LSTM networks. The experimental results on our new challenging dataset show that our methods outperform recent methods by a fair margin, while providing a detailed understanding of the object and having fast inference time. The source code will be made available.
RO Feb 18, 2018
Center-of-Mass-Based Grasp Pose Adaptation Using 3D Range and Force/Torque Sensing
Dimitrios Kanoulas, Jinoh Lee, Darwin G. Caldwell et al.
Lifting objects whose mass may produce high wrist torques that exceed the hardware strength limits could lead to unstable grasps or serious robot damage. This work introduces a new Center-of-Mass (CoM)-based grasp pose adaptation method for picking up objects using a combination of exteroceptive 3D perception and proprioceptive force/torque sensor feedback. The method works in two iterative stages to provide reliable and wrist-torque-efficient grasps. Initially, a geometric object CoM is estimated from the input range data. In the first stage, a set of hand-size handle grasps is localized on the object and the one closest to its CoM is selected for grasping. In the second stage, the object is lifted using a single arm, while the force and torque readings from the sensor on the wrist are monitored. Based on these readings, a displacement to the new CoM estimate is calculated. The object is released and the process is repeated until the wrist torque effort is minimized. The advantage of our method is the blending of both exteroceptive (3D range) and proprioceptive (force/torque) sensing for finding the grasp location that minimizes the wrist effort, potentially improving the reliability of the grasping and the subsequent manipulation task. We experimentally validate the proposed method by executing a number of tests on a set of objects that include handles, using the humanoid robot WALK-MAN.
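As a rough illustration of the two stages described in this abstract, the sketch below takes the point-cloud centroid as the geometric CoM, picks the candidate grasp nearest to it, and then estimates a horizontal CoM offset from a static wrist force/torque reading. The function names, the sensor-frame convention (weight acting along the sensor's negative z axis) and the static lever-arm formula are assumptions made for this example only; the abstract does not specify how the paper computes these quantities.

```python
import math

def geometric_center(points):
    """Centroid of a list of 3D points, used here as a stand-in geometric CoM."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def closest_grasp(grasp_positions, com):
    """Index of the candidate grasp position nearest to the CoM estimate."""
    return min(range(len(grasp_positions)),
               key=lambda i: math.dist(grasp_positions[i], com))

def com_offset_from_wrench(weight, torque_x, torque_y):
    """Horizontal CoM offset (rx, ry) relative to the wrist sensor.

    Assumes a static hold with the weight acting along -z of the sensor
    frame, so torque = r x F gives torque_x = -ry * weight and
    torque_y = rx * weight.
    """
    return (torque_y / weight, -torque_x / weight)

# Hypothetical data: a square point cloud and three candidate handle grasps.
cloud = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
grasps = [(0.0, 0.0, 0.0), (1.2, 1.0, 0.0), (2.0, 2.0, 0.0)]

com = geometric_center(cloud)
best = closest_grasp(grasps, com)
offset = com_offset_from_wrench(weight=10.0, torque_x=0.5, torque_y=1.0)
```

In an iterative scheme like the one described, the offset would shift the CoM estimate before the grasp is selected again.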
RO Oct 1, 2017
Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks
Anh Nguyen, Dimitrios Kanoulas, Luca Muratore et al.
We present a new method to translate videos to commands for robotic manipulation using Deep Recurrent Neural Networks (RNN). Our framework first extracts deep features from the input video frames with a deep Convolutional Neural Network (CNN). Two RNN layers with an encoder-decoder architecture are then used to encode the visual features and sequentially generate the output words as the command. We demonstrate that the translation accuracy can be improved by allowing a smooth transition between the two RNN layers and using a state-of-the-art feature extractor. The experimental results on our new challenging dataset show that our approach outperforms recent methods by a fair margin. Furthermore, we combine the proposed translation module with the vision and planning system to let a robot perform various manipulation tasks. Finally, we demonstrate the effectiveness of our framework on the full-size humanoid robot WALK-MAN.
CV Aug 22, 2017
Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks
Anh Nguyen, Thanh-Toan Do, Darwin G. Caldwell et al.
We present a new method to relocalize the 6DOF pose of an event camera based solely on the event stream. Our method first creates an event image from a list of events that occur in a very short time interval; a Stacked Spatial LSTM Network (SP-LSTM) is then used to learn the camera pose. Our SP-LSTM is composed of a CNN to learn deep features from the event images and a stack of LSTMs to learn spatial dependencies in the image feature space. We show that the spatial dependency plays an important role in the relocalization task and that the SP-LSTM can effectively learn this information. The experimental results on a publicly available dataset show that our approach generalizes well and outperforms recent methods by a substantial margin. Overall, our proposed method reduces the position error by approximately 6 times and the orientation error by 3 times compared to the current state of the art. The source code and trained models will be released.
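The first step mentioned above, building an event image from the events in a short time window, can be sketched as follows. This is a minimal illustration that assumes each event is an `(x, y, t, polarity)` tuple with polarity +1 or -1 and that the image stores a signed per-pixel count; the abstract does not state the exact encoding used in the paper, and the function name `events_to_image` is hypothetical.

```python
def events_to_image(events, height, width, t_start, t_end):
    """Accumulate events with t_start <= t < t_end into a signed count image.

    events: iterable of (x, y, t, polarity) tuples, polarity assumed +1 or -1.
    Returns a height x width list of lists (row index = y, column index = x).
    """
    image = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        if not (t_start <= t < t_end):
            continue  # event falls outside the time window
        if 0 <= x < width and 0 <= y < height:
            image[y][x] += polarity
    return image

# Hypothetical event stream: two positive events on one pixel, one negative
# event on another pixel, and one event outside the time window.
demo_events = [
    (1, 2, 0.001, +1),
    (1, 2, 0.002, +1),
    (3, 0, 0.003, -1),
    (0, 0, 0.500, +1),
]
demo_image = events_to_image(demo_events, height=4, width=5,
                             t_start=0.0, t_end=0.01)
```

An image built this way could then be passed to a CNN feature extractor, as the abstract describes for the SP-LSTM pipeline.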