Elie Aljalbout

RO
h-index44
22papers
569citations
Novelty45%
AI Score48

22 Papers

RONov 28, 2022
CLAS: Coordinating Multi-Robot Manipulation with Central Latent Action Spaces

Elie Aljalbout, Maximilian Karl, Patrick van der Smagt

Multi-robot manipulation tasks involve various control entities that can be separated into dynamically independent parts. A typical example of such real-world tasks is dual-arm manipulation. Learning to naively solve such tasks with reinforcement learning is often unfeasible due to the sample complexity and exploration requirements growing with the dimensionality of the action and state spaces. Instead, we would like to handle such environments as multi-agent systems and have several agents control parts of the whole. However, decentralizing the generation of actions requires coordination across agents through a channel limited to information central to the task. This paper proposes an approach to coordinating multi-robot manipulation through learned latent action spaces that are shared across different agents. We validate our method in simulated multi-robot manipulation tasks and demonstrate improvement over previous baselines in terms of sample efficiency and learning performance.

ROJul 18, 2024
LIMT: Language-Informed Multi-Task Visual World Models

Elie Aljalbout, Nikolaos Sotirakis, Patrick van der Smagt et al.

Most recent successes in robot reinforcement learning involve learning a specialized single-task agent. However, robots capable of performing multiple tasks can be much more valuable in real-world applications. Multi-task reinforcement learning can be very challenging due to the increased sample complexity and the potentially conflicting task objectives. Previous work on this topic is dominated by model-free approaches. The latter can be very sample inefficient even when learning specialized single-task agents. In this work, we focus on model-based multi-task reinforcement learning. We propose a method for learning multi-task visual world models, leveraging pre-trained language models to extract semantically meaningful task representations. These representations are used by the world model and policy to reason about task similarity in dynamics and behavior. Our results highlight the benefits of using language-driven task representations for world models and a clear advantage of model-based multi-task learning over the more common model-free paradigm.

ROJul 3, 2024
The Shortcomings of Force-from-Motion in Robot Learning

Elie Aljalbout, Felix Frank, Patrick van der Smagt et al.

Robotic manipulation requires accurate motion and physical interaction control. However, current robot learning approaches focus on motion-centric action spaces that do not explicitly give the policy control over the interaction. In this paper, we discuss the repercussions of this choice and argue for more interaction-explicit action spaces in robot learning.

ROApr 10
Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight

Angel Romero, Ashwin Shenai, Ismail Geles et al.

Autonomous drone racing has risen as a challenging robotic benchmark for testing the limits of learning, perception, planning, and control. Expert human pilots are able to fly a drone through a race track by mapping pixels from a single camera directly to control commands. Recent works in autonomous drone racing attempting direct pixel-to-commands control policies have relied on either intermediate representations that simplify the observation space or performed extensive bootstrapping using Imitation Learning (IL). This paper leverages DreamerV3 to train visuomotor policies capable of agile flight through a racetrack using only pixels as observations. In contrast to model-free methods like PPO or SAC, which are sample-inefficient and struggle in this setting, our approach acquires drone racing skills from pixels. Notably, a perception-aware behaviour of actively steering the camera toward texture-rich gate regions emerges without the need of handcrafted reward terms for the viewing direction. Our experiments show in both, simulation and real-world flight using a hardware-in-the-loop setup with rendered image observations, how the proposed approach can be deployed on real quadrotors at speeds of up to 9 m/s. These results advance the state of pixel-based autonomous flight and demonstrate that MBRL offers a promising path for real-world robotics research.

CVDec 15, 2024Code
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.

We present GEM, a Generalizable Ego-vision Multimodal world model that predicts future frames using a reference frame, sparse features, human poses, and ego-trajectories. Hence, our model has precise control over object dynamics, ego-agent motion and human poses. GEM generates paired RGB and depth outputs for richer spatial understanding. We introduce autoregressive noise schedules to enable stable long-horizon generations. Our dataset is comprised of 4000+ hours of multimodal data across domains like autonomous driving, egocentric human activities, and drone flights. Pseudo-labels are used to get depth maps, ego-trajectories, and human poses. We use a comprehensive evaluation framework, including a new Control of Object Manipulation (COM) metric, to assess controllability. Experiments show GEM excels at generating diverse, controllable scenarios and temporal consistency over long generations. Code, models, and datasets are fully open-sourced.

RODec 6, 2023
On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer

Elie Aljalbout, Felix Frank, Maximilian Karl et al.

We study the choice of action space in robot manipulation learning and sim-to-real transfer. We define metrics that assess the performance, and examine the emerging properties in the different action spaces. We train over 250 reinforcement learning~(RL) agents in simulated reaching and pushing tasks, using 13 different control spaces. The choice of spaces spans combinations of common action space design characteristics. We evaluate the training performance in simulation and the transfer to a real-world environment. We identify good and bad characteristics of robotic action spaces and make recommendations for future designs. Our findings have important implications for the design of RL algorithms for robot manipulation tasks, and highlight the need for careful consideration of action spaces when training and transferring RL agents for real-world robotics.

RODec 17, 2024
Multi-Task Reinforcement Learning for Quadrotors

Jiaxu Xing, Ismail Geles, Yunlong Song et al.

Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. To address this limitation, this paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control, leveraging the shared physical dynamics of the platform to enhance sample efficiency and task performance. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers, including high-speed stabilization, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance.

ROFeb 27, 2025
Accelerating Model-Based Reinforcement Learning with State-Space World Models

Maria Krinner, Elie Aljalbout, Angel Romero et al.

Reinforcement learning (RL) is a powerful approach for robot learning. However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies. This is due to the noisy RL training updates and the complexity of robotic systems, which typically involve highly non-linear dynamics and noisy sensor signals. In contrast, model-based RL (MBRL) not only trains a policy but simultaneously learns a world model that captures the environment's dynamics and rewards. The world model can either be used for planning, for data collection, or to provide first-order policy gradients for training. Leveraging a world model significantly improves sample efficiency compared to model-free RL. However, training a world model alongside the policy increases the computational complexity, leading to longer training times that are often intractable for complex real-world scenarios. In this work, we propose a new method for accelerating model-based RL using state-space world models. Our approach leverages state-space models (SSMs) to parallelize the training of the dynamics model, which is typically the main computational bottleneck. Additionally, we propose an architecture that provides privileged information to the world model during training, which is particularly relevant for partially observable environments. We evaluate our method in several real-world agile quadrotor flight tasks, involving complex dynamics, for both fully and partially observable environments. We demonstrate a significant speedup, reducing the world model training time by up to 10 times, and the overall MBRL training time by up to 4 times. This benefit comes without compromising performance, as our method achieves similar sample efficiency and task rewards to state-of-the-art MBRL methods.

RODec 12, 2024
Student-Informed Teacher Training

Nico Messikommer, Jiaxu Xing, Elie Aljalbout et al.

Imitation learning with a privileged teacher has proven effective for learning complex control behaviors from high-dimensional inputs, such as images. In this framework, a teacher is trained with privileged task information, while a student tries to predict the actions of the teacher with more limited observations, e.g., in a robot navigation task, the teacher might have access to distances to nearby obstacles, while the student only receives visual observations of the scene. However, privileged imitation learning faces a key challenge: the student might be unable to imitate the teacher's behavior due to partial observability. This problem arises because the teacher is trained without considering if the student is capable of imitating the learned behavior. To address this teacher-student asymmetry, we propose a framework for joint training of the teacher and student policies, encouraging the teacher to learn behaviors that can be imitated by the student despite the latters' limited access to information and its partial observability. Based on the performance bound in imitation learning, we add (i) the approximated action difference between teacher and student as a penalty term to the reward function of the teacher, and (ii) a supervised teacher-student alignment step. We motivate our method with a maze navigation task and demonstrate its effectiveness on complex vision-based quadrotor flight and manipulation tasks.

ROMar 8
Approximate Imitation Learning for Event-based Quadrotor Flight in Cluttered Environments

Nico Messikommer, Jiaxu Xing, Leonard Bauersfeld et al.

Event cameras offer high temporal resolution and low latency, making them ideal sensors for high-speed robotic applications where conventional cameras suffer from image degradations such as motion blur. In addition, their low power consumption can enhance endurance, which is critical for resource-constrained platforms. Motivated by these properties, we present a novel approach that enables a quadrotor to fly through cluttered environments at high speed by perceiving the environment with a single event camera. Our proposed method employs an end-to-end neural network trained to map event data directly to control commands, eliminating the reliance on standard cameras. To enable efficient training in simulation, where rendering synthetic event data is computationally expensive, we propose Approximate Imitation Learning, a novel imitation learning framework. Our approach leverages a large-scale offline dataset to learn a task-specific representation space. Subsequently, the policy is trained through online interactions that rely solely on lightweight, simulated state information, eliminating the need to render events during training. This enables the efficient training of event-based control policies for fast quadrotor flight, highlighting the potential of our framework for other modalities where data simulation is costly or impractical. Our approach outperforms standard imitation learning baselines in simulation and demonstrates robust performance in real-world flight tests, achieving speeds up to 9.8 ms-1 in cluttered environments.

ROOct 23, 2025
The Reality Gap in Robotics: Challenges, Solutions, and Best Practices

Elie Aljalbout, Jiaxu Xing, Angel Romero et al. · mit, nvidia

Machine learning has facilitated significant advancements across various robotics domains, including navigation, locomotion, and manipulation. Many such achievements have been driven by the extensive use of simulation as a critical tool for training and testing robotic systems prior to their deployment in real-world environments. However, simulations consist of abstractions and approximations that inevitably introduce discrepancies between simulated and real environments, known as the reality gap. These discrepancies significantly hinder the successful transfer of systems from simulation to the real world. Closing this gap remains one of the most pressing challenges in robotics. Recent advances in sim-to-real transfer have demonstrated promising results across various platforms, including locomotion, navigation, and manipulation. By leveraging techniques such as domain randomization, real-to-sim transfer, state and action abstractions, and sim-real co-training, many works have overcome the reality gap. However, challenges persist, and a deeper understanding of the reality gap's root causes and solutions is necessary. In this survey, we present a comprehensive overview of the sim-to-real landscape, highlighting the causes, solutions, and evaluation metrics for the reality gap and sim-to-real transfer.

ROMar 22, 2024
Guided Decoding for Robot On-line Motion Generation and Adaption

Nutan Chen, Botond Cseke, Elie Aljalbout et al.

We present a novel motion generation approach for robot arms, with high degrees of freedom, in complex settings that can adapt online to obstacles or new via points. Learning from Demonstration facilitates rapid adaptation to new tasks and optimizes the utilization of accumulated expertise by allowing robots to learn and generalize from demonstrated trajectories. We train a transformer architecture, based on conditional variational autoencoder, on a large dataset of simulated trajectories used as demonstrations. Our architecture learns essential motion generation skills from these demonstrations and is able to adapt them to meet auxiliary tasks. Additionally, our approach implements auto-regressive motion generation to enable real-time adaptations, as, for example, introducing or changing via-points, and velocity and acceleration constraints. Using beam search, we present a method for further adaption of our motion generator to avoid obstacles. We show that our model successfully generates motion from different initial and target points and that is capable of generating trajectories that navigate complex tasks across different robotic platforms.

ROOct 19, 2021
Learning Robotic Manipulation Skills Using an Adaptive Force-Impedance Action Space

Maximilian Ulmer, Elie Aljalbout, Sascha Schwarz et al.

Intelligent agents must be able to think fast and slow to perform elaborate manipulation tasks. Reinforcement Learning (RL) has led to many promising results on a range of challenging decision-making tasks. However, in real-world robotics, these methods still struggle, as they require large amounts of expensive interactions and have slow feedback loops. On the other hand, fast human-like adaptive control methods can optimize complex robotic interactions, yet fail to integrate multimodal feedback needed for unstructured tasks. In this work, we propose to factor the learning problem in a hierarchical learning and adaption architecture to get the best of both worlds. The framework consists of two components, a slow reinforcement learning policy optimizing the task strategy given multimodal observations, and a fast, real-time adaptive control policy continuously optimizing the motion, stability, and effort of the manipulator. We combine these components through a bio-inspired action space that we call AFORCE. We demonstrate the new action space on a contact-rich manipulation task on real hardware and evaluate its performance on three simulated manipulation tasks. Our experiments show that AFORCE drastically improves sample efficiency while reducing energy consumption and improving safety.

ROOct 15, 2021
Dual-Arm Adversarial Robot Learning

Elie Aljalbout

Robot learning is a very promising topic for the future of automation and machine intelligence. Future robots should be able to autonomously acquire skills, learn to represent their environment, and interact with it. While these topics have been explored in simulation, real-world robot learning research seems to be still limited. This is due to the additional challenges encountered in the real-world, such as noisy sensors and actuators, safe exploration, non-stationary dynamics, autonomous environment resetting as well as the cost of running experiments for long periods of time. Unless we develop scalable solutions to these problems, learning complex tasks involving hand-eye coordination and rich contacts will remain an untouched vision that is only feasible in controlled lab environments. We propose dual-arm settings as platforms for robot learning. Such settings enable safe data collection for acquiring manipulation skills as well as training perception modules in a robot-supervised manner. They also ease the processes of resetting the environment. Furthermore, adversarial learning could potentially boost the generalization capability of robot learning methods by maximizing the exploration based on game-theoretic objectives while ensuring safety based on collaborative task spaces. In this paper, we will discuss the potential benefits of this setup as well as the challenges and research directions that can be pursued.

ROOct 8, 2021
Learning to Centralize Dual-Arm Assembly

Marvin Alles, Elie Aljalbout

Robotic manipulators are widely used in modern manufacturing processes. However, their deployment in unstructured environments remains an open problem. To deal with the variety, complexity, and uncertainty of real-world manipulation tasks, it is essential to develop a flexible framework with reduced assumptions on the environment characteristics. In recent years, reinforcement learning (RL) has shown great results for single-arm robotic manipulation. However, research focusing on dual-arm manipulation is still rare. From a classical control perspective, solving such tasks often involves complex modeling of interactions between two manipulators and the objects encountered in the tasks, as well as the two robots coupling at a control level. Instead, in this work, we explore the applicability of model-free RL to dual-arm assembly. As we aim to contribute towards an approach that is not limited to dual-arm assembly, but dual-arm manipulation in general, we keep modeling efforts at a minimum. Hence, to avoid modeling the interaction between the two robots and the used assembly tools, we present a modular approach with two decentralized single-arm controllers which are coupled using a single centralized learned policy. We reduce modeling effort to a minimum by using sparse rewards only. Our architecture enables successful assembly and simple transfer from simulation to the real world. We demonstrate the effectiveness of the framework on dual-arm peg-in-hole and analyze sample efficiency and success rates for different action spaces. Moreover, we compare results on different clearances and showcase disturbance recovery and robustness, when dealing with position uncertainties. Finally we zero-shot transfer policies trained in simulation to the real world and evaluate their performance.

LGOct 2, 2021
Seeking Visual Discomfort: Curiosity-driven Representations for Reinforcement Learning

Elie Aljalbout, Maximilian Ulmer, Rudolph Triebel

Vision-based reinforcement learning (RL) is a promising approach to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency among other benefits. However, to take full advantage of this paradigm, the quality of samples used for training plays a crucial role. More importantly, the diversity of these samples could affect the sample efficiency of vision-based RL, but also its generalization capability. In this work, we present an approach to improve sample diversity for state representation learning. Our method enhances the exploration capability of RL algorithms, by taking advantage of the SRL setup. Our experiments show that our proposed approach boosts the visitation of problematic states, improves the learned state representation, and outperforms the baselines for all tested environments. These results are most apparent for environments where the baseline methods struggle. Even in simple environments, our method stabilizes the training, reduces the reward variance, and promotes sample efficiency.

LGSep 28, 2021
Making Curiosity Explicit in Vision-based RL

Elie Aljalbout, Maximilian Ulmer, Rudolph Triebel

Vision-based reinforcement learning (RL) is a promising technique to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to an increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency among other benefits. However, to take full advantage of this paradigm, the quality of samples used for training plays a crucial role. More importantly, the diversity of these samples could affect the sample efficiency of vision-based RL, but also its generalization capability. In this work, we present an approach to improve the sample diversity. Our method enhances the exploration capability of the RL algorithms by taking advantage of the SRL setup. Our experiments show that the presented approach outperforms the baseline for all tested environments. These results are most apparent for environments where the baseline method struggles. Even in simple environments, our method stabilizes the training, reduces the reward variance and boosts sample efficiency.

ROOct 30, 2020
Learning Vision-based Reactive Policies for Obstacle Avoidance

Elie Aljalbout, Ji Chen, Konstantin Ritt et al.

In this paper, we address the problem of vision-based obstacle avoidance for robotic manipulators. This topic poses challenges for both perception and motion generation. While most work in the field aims at improving one of those aspects, we provide a unified framework for approaching this problem. The main goal of this framework is to connect perception and motion by identifying the relationship between the visual input and the corresponding motion representation. To this end, we propose a method for learning reactive obstacle avoidance policies. We evaluate our method on goal-reaching tasks for single and multiple obstacles scenarios. We show the ability of the proposed method to efficiently learn stable obstacle avoidance strategies at a high success rate, while maintaining closed-loop responsiveness required for critical applications like human-robot interaction.

LGOct 25, 2020
How to Make Deep RL Work in Practice

Nirnai Rao, Elie Aljalbout, Axel Sauer et al.

In recent years, challenging control problems became solvable with deep reinforcement learning (RL). To be able to use RL for large-scale real-world applications, a certain degree of reliability in their performance is necessary. Reported results of state-of-the-art algorithms are often difficult to reproduce. One reason for this is that certain implementation details influence the performance significantly. Commonly, these details are not highlighted as important techniques to achieve state-of-the-art performance. Additionally, techniques from supervised learning are often used by default but influence the algorithms in a reinforcement learning setting in different and not well-understood ways. In this paper, we investigate the influence of certain initialization, input normalization, and adaptive learning techniques on the performance of state-of-the-art RL algorithms. We make suggestions which of those techniques to use by default and highlight areas that could benefit from a solution specifically tailored to RL.

NEMar 17, 2020
Task-Independent Spiking Central Pattern Generator: A Learning-Based Approach

Elie Aljalbout, Florian Walter, Florian Röhrbein et al.

Legged locomotion is a challenging task in the field of robotics but a rather simple one in nature. This motivates the use of biological methodologies as solutions to this problem. Central pattern generators are neural networks that are thought to be responsible for locomotion in humans and some animal species. As for robotics, many attempts were made to reproduce such systems and use them for a similar goal. One interesting design model is based on spiking neural networks. This model is the main focus of this work, as its contribution is not limited to engineering but also applicable to neuroscience. This paper introduces a new general framework for building central pattern generators that are task-independent, biologically plausible, and rely on learning methods. The abilities and properties of the presented approach are not only evaluated in simulation but also in a robotic experiment. The results are very promising as the used robot was able to perform stable walking at different speeds and to change speed within the same gait cycle.

CVJul 21, 2019
Tracking Holistic Object Representations

Axel Sauer, Elie Aljalbout, Sami Haddadin

Recent advances in visual tracking are based on siamese feature extractors and template matching. For this category of trackers, latest research focuses on better feature embeddings and similarity measures. In this work, we focus on building holistic object representations for tracking. We propose a framework that is designed to be used on top of previous trackers without any need for further training of the siamese network. The framework leverages the idea of obtaining additional object templates during the tracking process. Since the number of stored templates is limited, our method only keeps the most diverse ones. We achieve this by providing a new diversity measure in the space of siamese features. The obtained representation contains information beyond the ground truth object location provided to the system. It is then useful for tracking itself but also for further tasks which require a visual understanding of objects. Strong empirical results on tracking benchmarks indicate that our method can improve the performance and robustness of the underlying trackers while barely reducing their speed. In addition, our method is able to match current state-of-the-art results, while using a simpler and older network architecture and running three times faster.

LGJan 23, 2018
Clustering with Deep Learning: Taxonomy and New Methods

Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui et al.

Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper, we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxonomy enables researchers and practitioners to systematically create new clustering methods by selectively recombining and replacing distinct aspects of previous methods with the goal of overcoming their individual limitations. The experimental evaluation confirms this and shows that the method created for the case study achieves state-of-the-art clustering quality and surpasses it in some cases.