Achim J. Lilienthal

RO
h-index52
23papers
286citations
Novelty43%
AI Score50

23 Papers

ROMay 26
Towards Drone-based Mapping of Volcanic Gases using Gas Tomography

Marius Schaab, Niklas Karbach, Antonia Rabe et al.

Volcanoes emit large amounts of CO2, directly influencing human lives. Mapping volcanic gas emissions helps to forecast eruptions and understand the impact of volcanoes on climate and the environment. Drone-based gas sensing significantly reduces risks in volcanic monitoring but faces technical limitations when measuring gas, as rotor downwash disperses the gas plume before detection. Gas Tomography using remote gas sensing addresses this challenge. At the Salinelle dei Cappuccini mud volcanoes, we demonstrate that while drone-mounted in-situ sensors failed to detect CO2 emissions due to aerodynamic disturbance, open-path sensing successfully enabled remote gas distribution mapping. We present a novel model-based gas tomographic reconstruction approach that incorporates a Lagrangian model to compensate for wind-induced advection. The resulting gas distribution maps align with manually collected in-situ measurements, confirming that model-based gas tomography effectively overcomes downwash limitations and enables accurate mapping of volcanic emissions.

HCMay 21
Perceived Safety of Workers in Encounters with Large Industrial AGVs

Ansgar Howey, Tim Schreiter, Andrey Rudenko et al.

Automated Guided Vehicles (AGV) in factory automation are increasingly capable of moving autonomously in close proximity to human workers. While their physical safety is regulated by standards and directives, perceived safety and workers comfort in close-proximity interactions are being actively investigated in studies. There are three limitations in the prior art research to that end. Firstly, AGVs with larger payloads are understudied. Secondly, the test participants are usually students and not working professionals. Thirdly, while conducting in-person experiments with heavy machinery can be dangerous, the transfer of safety perception results from simulated experiments remains open. In this paper, we investigate industrial workers perceived safety in shared spaces with large AGVs in a real-world encounter and in virtual reality. We vary the passing distance and the shape of the collision avoidance maneuver, and evaluate perceived threat level using a handheld pressure-sensitive trigger interface and a post-experiment questionnaire. Additionally, we ask participants to set their own collision avoidance parameters based on their experience with the demonstrated trajectory profiles. In a within-subject study, we found that, while the threat levels are perceived overall slightly higher in VR, the passing distance of 1.5 to 2 meters is preferred among the demonstrated profiles, as well as in the self-defined trajectories.

ROOct 4, 2025
Trajectory prediction for heterogeneous agents: A performance analysis on small and imbalanced datasets

Tiago Rodrigues de Almeida, Yufei Zhu, Andrey Rudenko et al.

Robots and other intelligent systems navigating in complex dynamic environments should predict future actions and intentions of surrounding agents to reach their goals efficiently and avoid collisions. The dynamics of those agents strongly depends on their tasks, roles, or observable labels. Class-conditioned motion prediction is thus an appealing way to reduce forecast uncertainty and get more accurate predictions for heterogeneous agents. However, this is hardly explored in the prior art, especially for mobile robots and in limited data applications. In this paper, we analyse different class-conditioned trajectory prediction methods on two datasets. We propose a set of conditional pattern-based and efficient deep learning-based baselines, and evaluate their performance on robotics and outdoors datasets (THÖR-MAGNI and Stanford Drone Dataset). Our experiments show that all methods improve accuracy in most of the settings when considering class labels. More importantly, we observe that there are significant differences when learning from imbalanced datasets, or in new environments where sufficient data is not available. In particular, we find that deep learning methods perform better on balanced datasets, but in applications with limited data, e.g., cold start of a robot in a new environment, or imbalanced classes, pattern-based methods may be preferable.

RODec 18, 2024
THÖR-MAGNI Act: Actions for Human Motion Modeling in Robot-Shared Industrial Spaces

Tiago Rodrigues de Almeida, Tim Schreiter, Andrey Rudenko et al.

Accurate human activity and trajectory prediction are crucial for ensuring safe and reliable human-robot interactions in dynamic environments, such as industrial settings, with mobile robots. Datasets with fine-grained action labels for moving people in industrial environments with mobile robots are scarce, as most existing datasets focus on social navigation in public spaces. This paper introduces the THÖR-MAGNI Act dataset, a substantial extension of the THÖR-MAGNI dataset, which captures participant movements alongside robots in diverse semantic and spatial contexts. THÖR-MAGNI Act provides 8.3 hours of manually labeled participant actions derived from egocentric videos recorded via eye-tracking glasses. These actions, aligned with the provided THÖR-MAGNI motion cues, follow a long-tailed distribution with diversified acceleration, velocity, and navigation distance profiles. We demonstrate the utility of THÖR-MAGNI Act for two tasks: action-conditioned trajectory prediction and joint action and trajectory prediction. We propose two efficient transformer-based models that outperform the baselines to address these tasks. These results underscore the potential of THÖR-MAGNI Act to develop predictive models for enhanced human-robot interaction in complex environments.

ROMar 6
CFEAR-Teach-and-Repeat: Fast and Accurate Radar-only Localization

Maximilian Hilger, Daniel Adolfsson, Ralf Becker et al.

Reliable localization in prior maps is essential for autonomous navigation, particularly under adverse weather, where optical sensors may fail. We present CFEAR-TR, a teach-and-repeat localization pipeline using a single spinning radar, which is designed for easily deployable, lightweight, and robust navigation in adverse conditions. Our method localizes by jointly aligning live scans to both stored scans from the teach mapping pass, and to a sliding window of recent live keyframes. This ensures accurate and robust pose estimation across different seasons and weather phenomena. Radar scans are represented using a sparse set of oriented surface points, computed from Doppler-compensated measurements. The map is stored in a pose graph that is traversed during localization. Experiments on the held-out test sequences from the Boreas dataset show that CFEAR-TR can localize with an accuracy as low as 0.117 m and 0.096°, corresponding to improvements of up to 63% over the previous state of the art, while running efficiently at 29 Hz. These results substantially narrow the gap to lidar-level localization, particularly in heading estimation. We make the C++ implementation of our work available to the community.

ROMay 9, 2025
Collecting Human Motion Data in Large and Occlusion-Prone Environments using Ultra-Wideband Localization

Janik Kaden, Maximilian Hilger, Tim Schreiter et al.

With robots increasingly integrating into human environments, understanding and predicting human motion is essential for safe and efficient interactions. Modern human motion and activity prediction approaches require high quality and quantity of data for training and evaluation, usually collected from motion capture systems, onboard or stationary sensors. Setting up these systems is challenging due to the intricate setup of hardware components, extensive calibration procedures, occlusions, and substantial costs. These constraints make deploying such systems in new and large environments difficult and limit their usability for in-the-wild measurements. In this paper we investigate the possibility to apply the novel Ultra-Wideband (UWB) localization technology as a scalable alternative for human motion capture in crowded and occlusion-prone environments. We include additional sensing modalities such as eye-tracking, onboard robot LiDAR and radar sensors, and record motion capture data as ground truth for evaluation and comparison. The environment imitates a museum setup, with up to four active participants navigating toward random goals in a natural way, and offers more than 130 minutes of multi-modal data. Our investigation provides a step toward scalable and accurate motion data collection beyond vision-based systems, laying a foundation for evaluating sensing modalities like UWB in larger and complex environments like warehouses, airports, or convention centers.

CVApr 29, 2025
Large-scale visual SLAM for in-the-wild videos

Shuo Sun, Torsten Sattler, Malcolm Mielle et al.

Accurate and robust 3D scene reconstruction from casual, in-the-wild videos can significantly simplify robot deployment to new environments. However, reliable camera pose estimation and scene reconstruction from such unconstrained videos remains an open challenge. Existing visual-only SLAM methods perform well on benchmark datasets but struggle with real-world footage which often exhibits uncontrolled motion including rapid rotations and pure forward movements, textureless regions, and dynamic objects. We analyze the limitations of current methods and introduce a robust pipeline designed to improve 3D reconstruction from casual videos. We build upon recent deep visual odometry methods but increase robustness in several ways. Camera intrinsics are automatically recovered from the first few frames using structure-from-motion. Dynamic objects and less-constrained areas are masked with a predictive model. Additionally, we leverage monocular depth estimates to regularize bundle adjustment, mitigating errors in low-parallax situations. Finally, we integrate place recognition and loop closure to reduce long-term drift and refine both intrinsics and pose estimates through global bundle adjustment. We demonstrate large-scale contiguous 3D models from several online videos in various environments. In contrast, baseline methods typically produce locally inconsistent results at several points, producing separate segments or distorted maps. In lieu of ground-truth pose data, we evaluate map consistency, execution time and visual accuracy of re-rendered NeRF models. Our proposed system establishes a new baseline for visual reconstruction from casual uncontrolled videos found online, demonstrating more consistent reconstructions over longer sequences of in-the-wild videos than previously achieved.

ROMar 19, 2024
High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization

Shuo Sun, Malcolm Mielle, Achim J. Lilienthal et al.

We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.

ROSep 21, 2021
Oriented surface points for efficient and accurate radar odometry

Daniel Adolfsson, Martin Magnusson, Anas Alhashimi et al.

This paper presents an efficient and accurate radar odometry pipeline for large-scale localization. We propose a radar filter that keeps only the strongest reflections per-azimuth that exceeds the expected noise level. The filtered radar data is used to incrementally estimate odometry by registering the current scan with a nearby keyframe. By modeling local surfaces, we were able to register scans by minimizing a point-to-line metric and accurately estimate odometry from sparse point sets, hence improving efficiency. Specifically, we found that a point-to-line metric yields significant improvements compared to a point-to-point metric when matching sparse sets of surface points. Preliminary results from an urban odometry benchmark show that our odometry pipeline is accurate and efficient compared to existing methods with an overall translation error of 2.05%, down from 2.78% from the previously best published method, running at 12.5ms per frame without need of environmental specific training.

ROSep 20, 2021
CorAl -- Are the point clouds Correctly Aligned?

Daniel Adolfsson, Martin Magnusson, Qianfang Liao et al.

In robotics perception, numerous tasks rely on point cloud registration. However, currently there is no method that can automatically detect misaligned point clouds reliably and without environment-specific parameters. We propose "CorAl", an alignment quality measure and alignment classifier for point cloud pairs, which facilitates the ability to introspectively assess the performance of registration. CorAl compares the joint and the separate entropy of the two point clouds. The separate entropy provides a measure of the entropy that can be expected to be inherent to the environment. The joint entropy should therefore not be substantially higher if the point clouds are properly aligned. Computing the expected entropy makes the method sensitive also to small alignment errors, which are particularly hard to detect, and applicable in a range of different environments. We found that CorAl is able to detect small alignment errors in previously unseen environments with an accuracy of 95% and achieve a substantial improvement to previous methods.

ROSep 20, 2021
BFAR-Bounded False Alarm Rate detector for improved radar odometry estimation

Anas Alhashimi, Daniel Adolfsson, Martin Magnusson et al.

This paper presents a new detector for filtering noise from true detections in radar data, which improves the state of the art in radar odometry. Scanning Frequency-Modulated Continuous Wave (FMCW) radars can be useful for localization and mapping in low visibility, but return a lot of noise compared to (more commonly used) lidar, which makes the detection task more challenging. Our Bounded False-Alarm Rate (BFAR) detector is different from the classical Constant False-Alarm Rate (CFAR) detector in that it applies an affine transformation on the estimated noise level after which the parameters that minimize the estimation error can be learned. BFAR is an optimized combination between CFAR and fixed-level thresholding. Only a single parameter needs to be learned from a training dataset. We apply BFAR to the use case of radar odometry, and adapt a state-of-the-art odometry pipeline (CFEAR), replacing its original conservative filtering with BFAR. In this way we reduce the state-of-the-art translation/rotation odometry errors from 1.76%/0.5deg/100 m to 1.55%/0.46deg/100 m; an improvement of 12.5%.

ROMay 4, 2021
CFEAR Radarodometry -- Conservative Filtering for Efficient and Accurate Radar Odometry

Daniel Adolfsson, Martin Magnusson, Anas Alhashimi et al.

This paper presents the accurate, highly efficient, and learning-free method CFEAR Radarodometry for large-scale radar odometry estimation. By using a filtering technique that keeps the k strongest returns per azimuth and by additionally filtering the radar data in Cartesian space, we are able to compute a sparse set of oriented surface points for efficient and accurate scan matching. Registration is carried out by minimizing a point-to-line metric and robustness to outliers is achieved using a Huber loss. We were able to additionally reduce drift by jointly registering the latest scan to a history of keyframes and found that our odometry method generalizes to different sensor models and datasets without changing a single parameter. We evaluate our method in three widely different environments and demonstrate an improvement over spatially cross-validated state-of-the-art with an overall translation error of 1.76% in a public urban radar odometry benchmark, running at 55Hz merely on a single laptop CPU thread.

ROFeb 17, 2021
Learning Occupancy Priors of Human Motion from Semantic Maps of Urban Environments

Andrey Rudenko, Luigi Palmieri, Johannes Doellinger et al.

Understanding and anticipating human activity is an important capability for intelligent systems in mobile robotics, autonomous driving, and video surveillance. While learning from demonstrations with on-site collected trajectory data is a powerful approach to discover recurrent motion patterns, generalization to new environments, where sufficient motion data are not readily available, remains a challenge. In many cases, however, semantic information about the environment is a highly informative cue for the prediction of pedestrian motion or the estimation of collision risks. In this work, we infer occupancy priors of human motion using only semantic environment information as input. To this end we apply and discuss a traditional Inverse Optimal Control approach, and propose a novel one based on Convolutional Neural Networks (CNN) to predict future occupancy maps. Our CNN method produces flexible context-aware occupancy estimations for semantically uniform map regions and generalizes well already with small amounts of training data. Evaluated on synthetic and real-world data, it shows superior results compared to several baselines, marking a qualitative step-up in semantic environment assessment.

ROApr 19, 2020
Robust Frequency-Based Structure Extraction

Tomasz Piotr Kucner, Matteo Luperto, Stephanie Lowry et al.

State of the art mapping algorithms can produce high-quality maps. However, they are still vulnerable to clutter and outliers which can affect map quality and in consequence hinder the performance of a robot, and further map processing for semantic understanding of the environment. This paper presents ROSE, a method for building-level structure detection in robotic maps. ROSE exploits the fact that indoor environments usually contain walls and straight-line elements along a limited set of orientations. Therefore metric maps often have a set of dominant directions. ROSE extracts these directions and uses this information to segment the map into structure and clutter through filtering the map in the frequency domain (an approach substantially underutilised in the mapping applications). Removing the clutter in this way makes wall detection (e.g. using the Hough transform) more robust. Our experiments demonstrate that (1) the application of ROSE for decluttering can substantially improve structural feature retrieval (e.g., walls) in cluttered environments, (2) ROSE can successfully distinguish between clutter and structure in the map even with substantial amount of noise and (3) ROSE can numerically assess the amount of structure in the map.

ROSep 10, 2019
THÖR: Human-Robot Navigation Data Collection and Accurate Motion Trajectories Dataset

Andrey Rudenko, Tomasz P. Kucner, Chittaranjan S. Swaminathan et al.

Understanding human behavior is key for robots and intelligent systems that share a space with people. Accordingly, research that enables such systems to perceive, track, learn and predict human behavior as well as to plan and interact with humans has received increasing attention over the last years. The availability of large human motion datasets that contain relevant levels of difficulty is fundamental to this research. Existing datasets are often limited in terms of information content, annotation quality or variability of human behavior. In this paper, we present THÖR, a new dataset with human motion trajectory and eye gaze data collected in an indoor environment with accurate ground truth for position, head orientation, gaze direction, social grouping, obstacles map and goal coordinates. THÖR also contains sensor data collected by a 3D lidar and involves a mobile robot navigating the space. We propose a set of metrics to quantitatively analyze motion trajectory datasets such as the average tracking duration, ground truth noise, curvature and speed variation of the trajectories. In comparison to prior art, our dataset has a larger variety in human motion behavior, is less noisy, and contains annotations at higher frequencies.

ROAug 22, 2019
Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks for Warehouse Robots

Dinh-Cuong Hoang, Todor Stoyanov, Achim J. Lilienthal

We present an approach for recognizing all objects in a scene and estimating their full pose from an accurate 3D instance-aware semantic reconstruction using an RGB-D camera. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, to achieve both high-quality semantic reconstruction as well as robust 6D pose estimation for relevant objects. While the main trend in CNN-based 6D pose estimation has been to infer object's position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality object-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The developed method has been verified through experimental validation on the YCB-Video dataset and a newly collected warehouse object dataset. Experimental results confirmed that the proposed system achieves improvements over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.

HCApr 26, 2019
Current Trends in the Use of Eye Tracking in Mathematics Education Research: A PME Survey

Achim J. Lilienthal, Maike Schindler

Eye tracking (ET) is a research method that receives growing interest in mathematics education research (MER). This paper aims to give a literature overview, specifically focusing on the evolution of interest in this technology, ET equipment, and analysis methods used in mathematics education. To capture the current state, we focus on papers published in the proceedings of PME, one of the primary conferences dedicated to MER, of the last ten years. We identify trends in interest, methodology, and methods of analysis that are used in the community, and discuss possible future developments.

ROMar 26, 2019
High-quality Instance-aware Semantic 3D Map Using RGB-D Camera

Dinh-Cuong Hoang, Todor Stoyanov, Achim J. Lilienthal

We present a mapping system capable of constructing detailed instance-level semantic models of room-sized indoor environments by means of an RGB-D camera. In this work, we integrate deep-learning-based instance segmentation and classification into a state of the art RGB-D SLAM system. We leverage the pipeline of ElasticFusion [1] as a backbone and propose modifications of the registration cost function. The proposed objective function features a tunable weight for the appearance channel, which can be learned from data. The resulting system is capable of producing accurate semantic maps of room-sized environments, as well as reconstructing highly detailed object-level models. The developed method has been verified through experimental validation on the TUMRGB-D SLAM benchmark and the YCB video dataset. Our results confirmed that the proposed system performs favorably in terms of trajectory estimation, surface reconstruction, and segmentation quality in comparison to other state-of-the-art systems.

ROJan 21, 2018
A Next-Best-Smell Approach for Remote Gas Detection with a Mobile Robot

Riccardo Polvara, Marco Trabattoni, Tomasz Piotr Kucner et al.

The problem of gas detection is relevant to many real-world applications, such as leak detection in industrial settings and landfill monitoring. Using mobile robots for gas detection has several advantages and can reduce danger for humans. In our work, we address the problem of planning a path for a mobile robotic platform equipped with a remote gas sensor, which minimizes the time to detect all gas sources in a given environment. We cast this problem as a coverage planning problem by defining a basic sensing operation -- a scan with the remote gas sensor -- as the field of "view" of the sensor. Given the computing effort required by previously proposed offline approaches, in this paper we suggest a online coverage algorithm, called Next-Best-Smell, adapted from the Next-Best-View class of exploration algorithms. Our algorithm evaluates candidate locations with a global utility function, which combines utility values for travel distance, information gain, and sensing time, using Multi-Criteria Decision Making. In our experiments, conducted both in simulation and with a real robot, we found the performance of the Next-Best-Smell approach to be comparable with that of the state-of-the-art offline algorithm, at much lower computational cost.

ROOct 4, 2017
Exploring home robot capabilities by medium fidelity prototyping

Martin Cooney, Sepideh Pashami, Yuantao Fan et al.

In order for autonomous robots to be able to support people's well-being in homes and everyday environments, new interactive capabilities will be required, as exemplified by the soft design used for Disney's recent robot character Baymax in popular fiction. Home robots will be required to be easy to interact with and intelligent--adaptive, fun, unobtrusive and involving little effort to power and maintain--and capable of carrying out useful tasks both on an everyday level and during emergencies. The current article adopts an exploratory medium fidelity prototyping approach for testing some new robotic capabilities in regard to recognizing people's activities and intentions and behaving in a way which is transparent to people. Results are discussed with the aim of informing next designs.

ROSep 28, 2017
A method to segment maps from different modalities using free space layout -- MAORIS : MAp Of RIpples Segmentation

Malcolm Mielle, Martin Magnusson, Achim J. Lilienthal

How to divide floor plans or navigation maps into semantic representations, such as rooms and corridors, is an important research question in fields such as human-robot interaction, place categorization, or semantic mapping. While most works focus on segmenting robot built maps, those are not the only types of map a robot, or its user, can use. We present a method for segmenting maps from different modalities, focusing on robot built maps and hand-drawn sketch maps, and show better results than state of the art for both types. Our method segments the map by doing a convolution between the distance image of the map and a circular kernel, and grouping pixels of the same value. Segmentation is done by detecting ripple-like patterns where pixel values varies quickly, and merging neighboring regions with similar values. We identify a flaw in the segmentation evaluation metric used in recent works and propose a metric based on Matthews correlation coefficient (MCC). We compare our results to ground-truth segmentations of maps from a publicly available dataset, on which we obtain a better MCC than the state of the art with 0.98 compared to 0.65 for a recent Voronoi-based segmentation method and 0.70 for the DuDe segmentation method. We also provide a dataset of sketches of an indoor environment, with two possible sets of ground truth segmentations, on which our method obtains an MCC of 0.56 against 0.28 for the Voronoi-based segmentation method and 0.30 for DuDe.

ROFeb 16, 2017
SLAM auto-complete: completing a robot map using an emergency map

Malcolm Mielle, Martin Magnusson, Henrik Andreasson et al.

In search and rescue missions, time is an important factor; fast navigation and quickly acquiring situation awareness might be matters of life and death. Hence, the use of robots in such scenarios has been restricted by the time needed to explore and build a map. One way to speed up exploration and mapping is to reason about unknown parts of the environment using prior information. While previous research on using external priors for robot mapping mainly focused on accurate maps or aerial images, such data are not always possible to get, especially indoor. We focus on emergency maps as priors for robot mapping since they are easy to get and already extensively used by firemen in rescue missions. However, those maps can be outdated, information might be missing, and the scales of rooms are typically not consistent. We have developed a formulation of graph-based SLAM that incorporates information from an emergency map. The graph-SLAM is optimized using a combination of robust kernels, fusing the emergency map and the robot map into one map, even when faced with scale inaccuracies and inexact start poses. We typically have more than 50% of wrong correspondences in the settings studied in this paper, and the method we propose correctly handles them. Experiments in an office environment show that we can handle up to 70% of wrong correspondences and still get the expected result. The robot can navigate and explore while taking into account places it has not yet seen. We demonstrate this in a test scenario and also show that the emergency map is enhanced by adding information not represented such as closed doors or new walls.

ROSep 8, 2016
An Eigenshapes Approach to Compressed Signed Distance Fields and Their Utility in Robot Mapping

Daniel R. Canelhas, Erik Schaffernicht, Todor Stoyanov et al.

In order to deal with the scaling problem of volumetric map representations we propose spatially local methods for high-ratio compression of 3D maps, represented as truncated signed distance fields. We show that these compressed maps can be used as meaningful descriptors for selective decompression in scenarios relevant to robotic applications. As compression methods, we compare using PCA-derived low-dimensional bases to non-linear auto-encoder networks and novel mixed architectures that combine both. Selecting two application-oriented performance metrics, we evaluate the impact of different compression rates on reconstruction fidelity as well as to the task of map-aided ego-motion estimation. It is demonstrated that lossily compressed distance fields used as cost functions for ego-motion estimation, can outperform their uncompressed counterparts in challenging scenarios from standard RGB-D data-sets.