6.7ROMay 26
SteelDS: A High-Resolution Video Dataset of E40 Steel Scrap for Object Detection and Instance SegmentationMelanie Neubauer, Christian Rauch, Gerald Koinig et al.
This dataset provides high-resolution, annotated video sequences of shredded E40-grade steel and copper scrap on a conveyor belt. Captured in a controlled laboratory environment, the data reflects the industrial post-magnetic sorting stage, where manual intervention is typically required to remove copper contaminants. The dataset comprises 24,297 labeled frames across five subsets, featuring 396 steel and 101 copper objects categorized by size. It supports the development of machine learning models for material classification, object detection, and instance segmentation. Variations in object spacing and density are included to simulate realistic industrial sorting conditions. Ground truth annotations include pixel-wise segmentation masks and material classes. This dataset serves as a benchmark for evaluating automated sorting algorithms aiming to identify copper impurities within complex, heterogeneous steel scrap streams.
LGSep 6, 2023
CR-VAE: Contrastive Regularization on Variational Autoencoders for Preventing Posterior CollapseFotios Lygerakis, Elmar Rueckert
The Variational Autoencoder (VAE) is known to suffer from the phenomenon of \textit{posterior collapse}, where the latent representations generated by the model become independent of the inputs. This leads to degenerated representations of the input, which is attributed to the limitations of the VAE's objective function. In this work, we propose a novel solution to this issue, the Contrastive Regularization for Variational Autoencoders (CR-VAE). The core of our approach is to augment the original VAE with a contrastive objective that maximizes the mutual information between the representations of similar visual inputs. This strategy ensures that the information flow between the input and its latent representation is maximized, effectively avoiding posterior collapse. We evaluate our method on a series of visual datasets and demonstrate, that CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
LGJul 9, 2024
ED-VAE: Entropy Decomposition of ELBO in Variational AutoencodersFotios Lygerakis, Elmar Rueckert
Traditional Variational Autoencoders (VAEs) are constrained by the limitations of the Evidence Lower Bound (ELBO) formulation, particularly when utilizing simplistic, non-analytic, or unknown prior distributions. These limitations inhibit the VAE's ability to generate high-quality samples and provide clear, interpretable latent representations. This work introduces the Entropy Decomposed Variational Autoencoder (ED-VAE), a novel re-formulation of the ELBO that explicitly includes entropy and cross-entropy components. This reformulation significantly enhances model flexibility, allowing for the integration of complex and non-standard priors. By providing more detailed control over the encoding and regularization of latent spaces, ED-VAE not only improves interpretability but also effectively captures the complex interactions between latent variables and observed data, thus leading to better generative performance.
12.6CVApr 7
PASTA: Vision Transformer Patch Aggregation for Weakly Supervised Target and Anomaly SegmentationMelanie Neubauer, Elmar Rueckert, Christian Rauch
Detecting unseen anomalies in unstructured environments presents a critical challenge for industrial and agricultural applications such as material recycling and weeding. Existing perception systems frequently fail to satisfy the strict operational requirements of these domains, specifically real-time processing, pixel-level segmentation precision, and robust accuracy, due to their reliance on exhaustively annotated datasets. To address these limitations, we propose a weakly supervised pipeline for object segmentation and classification using weak image-level supervision called 'Patch Aggregation for Segmentation of Targets and Anomalies' (PASTA). By comparing an observed scene with a nominal reference, PASTA identifies Target and Anomaly objects through distribution analysis in self-supervised Vision Transformer (ViT) feature spaces. Our pipeline utilizes semantic text-prompts via the Segment Anything Model 3 to guide zero-shot object segmentation. Evaluations on a custom steel scrap recycling dataset and a plant dataset demonstrate a 75.8% training time reduction of our approach to domain-specific baselines. While being domain-agnostic, our method achieves superior Target (up to 88.3% IoU) and Anomaly (up to 63.5% IoU) segmentation performance in the industrial and agricultural domain.
30.5ROMar 15
SIL: Symbiotic Interactive Learning for Language-Conditioned Human-Agent Co-AdaptationLinus Nwankwo, Bjoern Ellensohn, Christian Rauch et al.
Today's autonomous agents, largely driven by foundation models (FMs), can understand natural language instructions and solve long-horizon tasks with human-like reasoning. However, current human-robot interaction largely follows a one-way master-apprentice technique where the agent passively executes commands without reciprocal learning. This neglects the co-adaptive, multi-turn nature of everyday human interactions. We introduce symbiotic interactive learning (SIL), a bidirectional co-adaptation framework in a shared latent task space, where human and agent maintain joint belief states that evolve with interaction history. This enables proactive clarification, adaptive suggestions, and shared plan refinement. SIL leverages FMs for spatial perception and reasoning, together with a triplet-loss-trained neural encoder that grounds FMs' outputs into task-specific latent representations. To support long-term stability as tasks evolve, SIL uses episodic and semantic memory architectures, regularised via elastic weight consolidation to mitigate catastrophic forgetting. We evaluate SIL on simulated and real-world embodied tasks, including instruction following, information retrieval, query-oriented reasoning, and interactive dialogue, achieving a $90.4\%$ task completion rate and a belief alignment score of $Ï\approx 0.83$, an absolute improvement of about $20$ percentage points over the best ablations. Demos and resources: https://linusnep.github.io/SIL/.
RONov 5, 2020Code
ROS-Mobile: An Android application for the Robot Operating SystemNils Rottmann, Nico Studt, Floris Ernst et al.
Controlling and monitoring complex autonomous and semi autonomous robotic systems is a challenging task. The Robot Operating System (ROS) was developed to act as a robotic middleware system running on Ubuntu Linux which allows, amongst others, hardware abstraction, message-passing between individual processes and package management. However, active support of ROS applications for mobile devices, such as smarthphones or tablets, are missing. We developed a ROS application for Android, which comes with an intuitive user interface for controlling and monitoring robotic systems. Our open source contribution can be used in a large variety of tasks and with many different kinds of robots. Moreover, it can easily be customized and new features added. In this paper, we give an outline over the software architecture, the main functionalities and show some possible use-cases on different mobile robotic systems.
ROOct 12, 2015Code
Low-cost Sensor Glove with Force Feedback for Learning from Demonstrations using Probabilistic Trajectory RepresentationsElmar Rueckert, Rudolf Lioutikov, Roberto Calandra et al.
Sensor gloves are popular input devices for a large variety of applications including health monitoring, control of music instruments, learning sign language, dexterous computer interfaces, and tele-operating robot hands. Many commercial products as well as low-cost open source projects have been developed. We discuss here how low-cost (approx. 250 EUROs) sensor gloves with force feedback can be build, provide an open source software interface for Matlab and present first results in learning object manipulation skills through imitation learning on the humanoid robot iCub.
ROJan 22, 2024
Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-TrainingVedant Dave, Fotios Lygerakis, Elmar Rueckert
The rapidly evolving field of robotics necessitates methods that can facilitate the fusion of multiple modalities. Specifically, when it comes to interacting with tangible objects, effectively combining visual and tactile sensory data is key to understanding and navigating the complex dynamics of the physical world, enabling a more nuanced and adaptable response to changing environments. Nevertheless, much of the earlier work in merging these two sensory modalities has relied on supervised methods utilizing datasets labeled by humans.This paper introduces MViTac, a novel methodology that leverages contrastive learning to integrate vision and touch sensations in a self-supervised fashion. By availing both sensory inputs, MViTac leverages intra and inter-modality losses for learning representations, resulting in enhanced material property classification and more adept grasping prediction. Through a series of experiments, we showcase the effectiveness of our method and its superiority over existing state-of-the-art self-supervised and supervised techniques. In evaluating our methodology, we focus on two distinct tasks: material classification and grasping success prediction. Our results indicate that MViTac facilitates the development of improved modality encoders, yielding more robust representations as evidenced by linear probing assessments.
ROJan 30, 2024
M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic ManipulationFotios Lygerakis, Vedant Dave, Elmar Rueckert
One of the most critical aspects of multimodal Reinforcement Learning (RL) is the effective integration of different observation modalities. Having robust and accurate representations derived from these modalities is key to enhancing the robustness and sample efficiency of RL algorithms. However, learning representations in RL settings for visuotactile data poses significant challenges, particularly due to the high dimensionality of the data and the complexity involved in correlating visual and tactile inputs with the dynamic environment and task objectives. To address these challenges, we propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL). Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms. Our method is agnostic to the RL algorithm, thus enabling its integration with any available RL algorithm. We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks. This is evidenced by faster convergence rates and higher cumulative rewards per episode, compared to standard RL algorithms without our representation learning approach.
ROJan 23, 2024
Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected ImprovementNikolaus Feith, Elmar Rueckert
Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes. However, most existing algorithms cannot be applied to Realworld Scenarios because their state spaces and/or action spaces are limited to discrete values. Furthermore, the interaction of all existing methods is restricted to deciding between multiple proposals. We therefore propose a novel framework based on Bayesian Optimization (BO). Interactive Bayesian Optimization (IBO) enables collaboration between machine learning algorithms and humans. This framework captures user preferences and provides an interface for users to shape the strategy by hand. Additionally, we've incorporated a new acquisition function, Preference Expected Improvement (PEI), to refine the system's efficiency using a probabilistic model of the user preferences. Our approach is geared towards ensuring that machines can benefit from human expertise, aiming for a more aligned and effective learning process. In the course of this work, we applied our method to simulations and in a real world task using a Franka Panda robot to show human-robot collaboration.
ROFeb 23, 2022
Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in IntralogisticsHonghu Xue, Benedikt Hein, Mohamed Bakr et al.
We propose a deep reinforcement learning approach for solving a mapless navigation problem in warehouse scenarios. In our approach, an automation guided vehicle is equipped with LiDAR and frontal RGB sensors and learns to perform a targeted navigation task. The challenges reside in the sparseness of positive samples for learning, multi-modal sensor perception with partial observability, the demand for accurate steering maneuvers together with long training cycles. To address these points, we propose NavACL-Q as a method for automatic curriculum learning in combination with a distributed version of the soft actor-critic algorithm. The performance of the learning algorithm is evaluated exhaustively in an unseen warehouse environment to validate both robustness and generalizability of the learned policy. Results in NVIDIA Isaac Sim demonstrates that our trained agent significantly outperforms a map-based navigation pipeline provided by NVIDIA Isaac Sim with an increased agent-goal distance of 3m and wider initial relative agent-goal rotations of 45 degree. The ablation studies also suggests that NavACL-Q greatly facilitates the learning process with a performance gain of roughly 40% compared to training with random starts and that the utilization of a pre-trained feature extractor manifestly boosts the performance by approximately 60%.
ROJul 5, 2021
Using Probabilistic Movement Primitives in Analyzing Human Motion Difference under Transcranial Current StimulationHonghu Xue, Rebecca Herzog, Till M Berger et al.
In medical tasks such as human motion analysis, computer-aided auxiliary systems have become preferred choice for human experts for its high efficiency. However, conventional approaches are typically based on user-defined features such as movement onset times, peak velocities, motion vectors or frequency domain analyses. Such approaches entail careful data post-processing or specific domain knowledge to achieve a meaningful feature extraction. Besides, they are prone to noise and the manual-defined features could hardly be re-used for other analyses. In this paper, we proposed probabilistic movement primitives (ProMPs), a widely-used approach in robot skill learning, to model human motions. The benefit of ProMPs is that the features are directly learned from the data and ProMPs can capture important features describing the trajectory shape, which can easily be extended to other tasks. Distinct from previous research, where classification tasks are mostly investigated, we applied ProMPs together with a variant of Kullback-Leibler (KL) divergence to quantify the effect of different transcranial current stimulation methods on human motions. We presented an initial result with 10 participants. The results validate ProMPs as a robust and effective feature extractor for human motions.
NEMay 17, 2021
Evolutionary Training and Abstraction Yields Algorithmic Generalization of Neural ComputersDaniel Tanneberg, Elmar Rueckert, Jan Peters
A key feature of intelligent behaviour is the ability to learn abstract strategies that scale and transfer to unfamiliar problems. An abstract strategy solves every sample from a problem class, no matter its representation or complexity -- like algorithms in computer science. Neural networks are powerful models for processing sensory data, discovering hidden patterns, and learning complex functions, but they struggle to learn such iterative, sequential or hierarchical algorithmic strategies. Extending neural networks with external memories has increased their capacities in learning such strategies, but they are still prone to data variations, struggle to learn scalable and transferable solutions, and require massive training data. We present the Neural Harvard Computer (NHC), a memory-augmented network based architecture, that employs abstraction by decoupling algorithmic operations from data manipulations, realized by splitting the information flow and separated modules. This abstraction mechanism and evolutionary training enable the learning of robust and scalable algorithmic solutions. On a diverse set of 11 algorithms with varying complexities, we show that the NHC reliably learns algorithmic solutions with strong generalization and abstraction: perfect generalization and scaling to arbitrary task configurations and complexities far beyond seen during training, and being independent of the data representation and the task domain.
LGMar 26, 2021
SKID RAW: Skill Discovery from Raw TrajectoriesDaniel Tanneberg, Kai Ploeger, Elmar Rueckert et al.
Integrating robots in complex everyday environments requires a multitude of problems to be solved. One crucial feature among those is to equip robots with a mechanism for teaching them a new task in an easy and natural way. When teaching tasks that involve sequences of different skills, with varying order and number of these skills, it is desirable to only demonstrate full task executions instead of all individual skills. For this purpose, we propose a novel approach that simultaneously learns to segment trajectories into reoccurring patterns and the skills to reconstruct these patterns from unlabelled demonstrations without further supervision. Moreover, the approach learns a skill conditioning that can be used to understand possible sequences of skills, a practical mechanism to be used in, for example, human-robot-interactions for a more intelligent and adaptive robot behaviour. The Bayesian and variational inference based approach is evaluated on synthetic and real human demonstrations with varying complexities and dimensionality, showing the successful learning of segmentations and skill libraries from unlabelled data.
RONov 12, 2020
Parameter Optimization for Loop Closure Detection in Closed EnvironmentsNils Rottmann, Ralf Bruder, Honghu Xue et al.
Tuning parameters is crucial for the performance of localization and mapping algorithms. In general, the tuning of the parameters requires expert knowledge and is sensitive to information about the structure of the environment. In order to design truly autonomous systems the robot has to learn the parameters automatically. Therefore, we propose a parameter optimization approach for loop closure detection in closed environments which requires neither any prior information, e.g. robot model parameters, nor expert knowledge. It relies on several path traversals along the boundary line of the closed environment. We demonstrate the performance of our method in challenging real world scenarios with limited sensing capabilities. These scenarios are exemplary for a wide range of practical applications including lawn mowers and household robots.
NEOct 30, 2019
Learning Algorithmic Solutions to Symbolic Planning Tasks with a Neural Computer ArchitectureDaniel Tanneberg, Elmar Rueckert, Jan Peters
A key feature of intelligent behavior is the ability to learn abstract strategies that transfer to unfamiliar problems. Therefore, we present a novel architecture, based on memory-augmented networks, that is inspired by the von Neumann and Harvard architectures of modern computers. This architecture enables the learning of abstract algorithmic solutions via Evolution Strategies in a reinforcement learning setting. Applied to Sokoban, sliding block puzzle and robotic manipulation tasks, we show that the architecture can learn algorithmic solutions with strong generalization and abstraction: scaling to arbitrary task configurations and complexities, and being independent of both the data representation and the task domain.
ROAug 13, 2019
Cataglyphis ant navigation strategies solve the global localization problem in robots with binary sensorsNils Rottmann, Ralf Bruder, Achim Schweikard et al.
Low cost robots, such as vacuum cleaners or lawn mowers, employ simplistic and often random navigation policies. Although a large number of sophisticated localization and planning approaches exist, they require additional sensors like LIDAR sensors, cameras or time of flight sensors. In this work, we propose a global localization method biologically inspired by simple insects, such as the ant Cataglyphis that is able to return from distant locations to its nest in the desert without any or with limited perceptual cues. Like in Cataglyphis, the underlying idea of our localization approach is to first compute a pose estimate from pro-prioceptual sensors only, using land navigation, and thereafter refine the estimate through a systematic search in a particle filter that integrates the rare visual feedback. In simulation experiments in multiple environments, we demonstrated that this bioinspired principle can be used to compute accurate pose estimates from binary visual cues only. Such intelligent localization strategies can improve the performance of any robot with limited sensing capabilities such as household robots or toys.
ROAug 13, 2019
Loop Closure Detection in Closed EnvironmentsNils Rottmann, Ralf Bruder, Achim Schweikard et al.
Low cost robots, such as vacuum cleaners or lawn mowers employ simplistic and often random navigation policies. Although a large number of sophisticated mapping and planning approaches exist, they require additional sensors like LIDAR sensors, cameras or time of flight sensors. In this work, we propose a loop closure detection method based only on odometry data which can be generated using low-range or binary signal sensors together with simple wall following techniques. We show how to include the detected loop closing constraints into a pose graph formulation such that standard pose graph optimization techniques can be used for map estimation. We evaluate our map estimate and loop closure approach using both, simulation and a real lawn mower in complex and realistic environments. Our results demonstrate that our approach generates accurate map estimates on the basis of odometry data only. We further show that our assumption about the discriminative nature of neighboring poses in the pose graph is solid, even under large odometry noise. These improved map estimates provide the basis for smart navigation policies in low cost robots and extends their abilities to goal-directed behavior like pick and place or complete coverage path planning in realistic environments.
LGAug 11, 2019
Experience Reuse with Probabilistic Movement PrimitivesSvenja Stark, Jan Peters, Elmar Rueckert
Acquiring new robot motor skills is cumbersome, as learning a skill from scratch and without prior knowledge requires the exploration of a large space of motor configurations. Accordingly, for learning a new task, time could be saved by restricting the parameter search space by initializing it with the solution of a similar task. We present a framework which is able of such knowledge transfer from already learned movement skills to a new learning task. The framework combines probabilistic movement primitives with descriptions of their effects for skill representation. New skills are first initialized with parameters inferred from related movement primitives and thereafter adapted to the new task through relative entropy policy search. We compare two different transfer approaches to initialize the search space distribution with data of known skills with a similar effect. We show the different benefits of the two knowledge transfer approaches on an object pushing task for a simulated 3-DOF robot. We can show that the quality of the learned skills improves and the required iterations to learn a new task can be reduced by more than 60% when past experiences are utilized.
LGApr 28, 2019
Learning walk and trot from the same objective using different types of explorationZinan Liu, Kai Ploeger, Svenja Stark et al.
In quadruped gait learning, policy search methods that scale high dimensional continuous action spaces are commonly used. In most approaches, it is necessary to introduce prior knowledge on the gaits to limit the highly non-convex search space of the policies. In this work, we propose a new approach to encode the symmetry properties of the desired gaits, on the initial covariance of the Gaussian search distribution, allowing for strategic exploration. Using episode-based likelihood ratio policy gradient and relative entropy policy search, we learned the gaits walk and trot on a simulated quadruped. Comparing these gaits to random gaits learned by initialized diagonal covariance matrix, we show that the performance can be significantly enhanced.
LGMar 1, 2018
Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal ModelingAdrian Šošić, Elmar Rueckert, Jan Peters et al.
Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention. Instead of learning a global behavioral model, recent IRL methods divide the demonstration data into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework that significantly outperforms existing IRL solutions and provides smooth policy estimates consistent with the expert's plan. Most notably, our framework naturally handles situations where the intentions of the agent change over time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in active learning scenarios to guide the demonstration process of the expert.
AIFeb 22, 2018
Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent NetworksDaniel Tanneberg, Jan Peters, Elmar Rueckert
Autonomous robots need to interact with unknown, unstructured and changing environments, constantly facing novel challenges. Therefore, continuous online adaptation for lifelong-learning and the need of sample-efficient mechanisms to adapt to changes in the environment, the constraints, the tasks, or the robot itself are crucial. In this work, we propose a novel framework for probabilistic online motion planning with online adaptation based on a bio-inspired stochastic recurrent neural network. By using learning signals which mimic the intrinsic motivation signalcognitive dissonance in addition with a mental replay strategy to intensify experiences, the stochastic recurrent network can learn from few physical interactions and adapts to novel environments in seconds. We evaluate our online planning and adaptation framework on an anthropomorphic KUKA LWR arm. The rapid online adaptation is shown by learning unknown workspace constraints sample-efficiently from few physical interactions while following given way points.