Martin Magnusson

RO
h-index52
18papers
231citations
Novelty53%
AI Score51

18 Papers

ROOct 26, 2023
Learning Extrinsic Dexterity with Parameterized Manipulation Primitives

Shih-Min Yang, Martin Magnusson, Johannes A. Stork et al.

Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that utilize the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. Designing such a complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and is able to successfully complete the object picking task in 98\% of experimental trials. Supplementary information and videos can be found at https://shihminyang.github.io/ED-PMP/.

36.9ROMar 26
Accurate Surface and Reflectance Modelling from 3D Radar Data with Neural Radiance Fields

Judith Treffler, Vladimír Kubelka, Henrik Andreasson et al.

Robust scene representation is essential for autonomous systems to safely operate in challenging low-visibility environments. Radar has a clear advantage over cameras and lidars in these conditions due to its resilience to environmental factors such as fog, smoke, or dust. However, radar data is inherently sparse and noisy, making reliable 3D surface reconstruction challenging. To address these challenges, we propose a neural implicit approach for 3D mapping from radar point clouds, which jointly models scene geometry and view-dependent radar intensities. Our method leverages a memory-efficient hybrid feature encoding to learn a continuous Signed Distance Field (SDF) for surface reconstruction, while also capturing radar-specific reflective properties. We show that our approach produces smoother, more accurate 3D surface reconstructions compared to existing lidar-based reconstruction methods applied to radar data, and can reconstruct view-dependent radar intensities. We also show that in general, as input point clouds get sparser, neural implicit representations render more faithful surfaces, compared to traditional explicit SDFs and meshing techniques.

79.9CVMar 12
Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

Shuo Sun, Unal Artan, Malcolm Mielle et al.

We address the challenging problem of dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras -- a setting that arises naturally when multiple observers capture a shared event. Prior approaches either handle only single-camera input or require rigidly mounted, pre-calibrated camera rigs, limiting their practical applicability. We propose a two-stage optimization framework that decouples the task into robust camera tracking and dense depth refinement. In the first stage, we extend single-camera visual SLAM to the multi-camera setting by constructing a spatiotemporal connection graph that exploits both intra-camera temporal continuity and inter-camera spatial overlap, enabling consistent scale and robust tracking. To ensure robustness under limited overlap, we introduce a wide-baseline initialization strategy using feed-forward reconstruction models. In the second stage, we refine depth and camera poses by optimizing dense inter- and intra-camera consistency using wide-baseline optical flow. Additionally, we introduce MultiCamRobolab, a new real-world dataset with ground-truth poses from a motion capture system. Finally, we demonstrate that our method significantly outperforms state-of-the-art feed-forward models on both synthetic and real-world benchmarks, while requiring less memory.

CVApr 8, 2024
Human Detection from 4D Radar Data in Low-Visibility Field Conditions

Mikael Skog, Oleksandr Kotlyar, Vladimír Kubelka et al.

Autonomous driving technology is increasingly being used on public roads and in industrial settings such as mines. While it is essential to detect pedestrians, vehicles, or other obstacles, adverse field conditions negatively affect the performance of classical sensors such as cameras or lidars. Radar, on the other hand, is a promising modality that is less affected by, e.g., dust, smoke, water mist or fog. In particular, modern 4D imaging radars provide target responses across the range, vertical angle, horizontal angle and Doppler velocity dimensions. We propose TMVA4D, a CNN architecture that leverages this 4D radar modality for semantic segmentation. The CNN is trained to distinguish between the background and person classes based on a series of 2D projections of the 4D radar data that include the elevation, azimuth, range, and Doppler velocity dimensions. We also outline the process of compiling a novel dataset consisting of data collected in industrial settings with a car-mounted 4D radar and describe how the ground-truth labels were generated from reference thermal images. Using TMVA4D on this dataset, we achieve an mIoU score of 78.2% and an mDice score of 86.1%, evaluated on the two classes background and person

ROOct 4, 2025
Trajectory prediction for heterogeneous agents: A performance analysis on small and imbalanced datasets

Tiago Rodrigues de Almeida, Yufei Zhu, Andrey Rudenko et al.

Robots and other intelligent systems navigating in complex dynamic environments should predict future actions and intentions of surrounding agents to reach their goals efficiently and avoid collisions. The dynamics of those agents strongly depends on their tasks, roles, or observable labels. Class-conditioned motion prediction is thus an appealing way to reduce forecast uncertainty and get more accurate predictions for heterogeneous agents. However, this is hardly explored in the prior art, especially for mobile robots and in limited data applications. In this paper, we analyse different class-conditioned trajectory prediction methods on two datasets. We propose a set of conditional pattern-based and efficient deep learning-based baselines, and evaluate their performance on robotics and outdoors datasets (THÖR-MAGNI and Stanford Drone Dataset). Our experiments show that all methods improve accuracy in most of the settings when considering class labels. More importantly, we observe that there are significant differences when learning from imbalanced datasets, or in new environments where sufficient data is not available. In particular, we find that deep learning methods perform better on balanced datasets, but in applications with limited data, e.g., cold start of a robot in a new environment, or imbalanced classes, pattern-based methods may be preferable.

31.5ROMar 13
Conflict Mitigation in Shared Environments using Flow-Aware Multi-Agent Path Finding

Lukas Heuer, Yufei Zhu, Luigi Palmieri et al.

Deploying multi-robot systems in environments shared with dynamic and uncontrollable agents presents significant challenges, especially for large robot fleets. In such environments, individual robot operations can be delayed due to unforeseen conflicts with uncontrollable agents. While existing research primarily focuses on preserving the completeness of Multi-Agent Path Finding (MAPF) solutions considering delays, there is limited emphasis on utilizing additional environmental information to enhance solution quality in the presence of other dynamic agents. To this end, we propose Flow-Aware Multi-Agent Path Finding (FA-MAPF), a novel framework that integrates learned motion patterns of uncontrollable agents into centralized MAPF algorithms. Our evaluation, conducted on a diverse set of benchmark maps with simulated uncontrollable agents and on a real-world map with recorded human trajectories, demonstrates the effectiveness of FA-MAPF compared to state-of-the-art baselines. The experimental results show that FA-MAPF can consistently reduce conflicts with uncontrollable agents, up to 55%, without compromising task efficiency.

CVApr 29, 2025
Large-scale visual SLAM for in-the-wild videos

Shuo Sun, Torsten Sattler, Malcolm Mielle et al.

Accurate and robust 3D scene reconstruction from casual, in-the-wild videos can significantly simplify robot deployment to new environments. However, reliable camera pose estimation and scene reconstruction from such unconstrained videos remains an open challenge. Existing visual-only SLAM methods perform well on benchmark datasets but struggle with real-world footage which often exhibits uncontrolled motion including rapid rotations and pure forward movements, textureless regions, and dynamic objects. We analyze the limitations of current methods and introduce a robust pipeline designed to improve 3D reconstruction from casual videos. We build upon recent deep visual odometry methods but increase robustness in several ways. Camera intrinsics are automatically recovered from the first few frames using structure-from-motion. Dynamic objects and less-constrained areas are masked with a predictive model. Additionally, we leverage monocular depth estimates to regularize bundle adjustment, mitigating errors in low-parallax situations. Finally, we integrate place recognition and loop closure to reduce long-term drift and refine both intrinsics and pose estimates through global bundle adjustment. We demonstrate large-scale contiguous 3D models from several online videos in various environments. In contrast, baseline methods typically produce locally inconsistent results at several points, producing separate segments or distorted maps. In lieu of ground-truth pose data, we evaluate map consistency, execution time and visual accuracy of re-rendered NeRF models. Our proposed system establishes a new baseline for visual reconstruction from casual uncontrolled videos found online, demonstrating more consistent reconstructions over longer sequences of in-the-wild videos than previously achieved.

LGMar 23, 2025
KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies

Shih-Min Yang, Martin Magnusson, Johannes A. Stork et al.

Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC's stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive) which tackles the inefficiencies in balancing exploration strategies when combining SAC with novelty-based exploration. KEA integrates a novelty-augmented SAC with a standard SAC agent, proactively coordinated via a switching mechanism. This coordination allows the agent to maintain stochasticity in high-novelty regions, enhancing exploration efficiency and reducing repeated sample collection. We first analyze this potential issue in a 2D navigation task, and then evaluate KEA on the DeepSea hard-exploration benchmark as well as sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-based exploration baselines, our experiments show that KEA significantly improves learning efficiency and robustness in sparse reward setups.

ROMar 19, 2024
High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization

Shuo Sun, Malcolm Mielle, Achim J. Lilienthal et al.

We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.

ROSep 21, 2021
Oriented surface points for efficient and accurate radar odometry

Daniel Adolfsson, Martin Magnusson, Anas Alhashimi et al.

This paper presents an efficient and accurate radar odometry pipeline for large-scale localization. We propose a radar filter that keeps only the strongest reflections per-azimuth that exceeds the expected noise level. The filtered radar data is used to incrementally estimate odometry by registering the current scan with a nearby keyframe. By modeling local surfaces, we were able to register scans by minimizing a point-to-line metric and accurately estimate odometry from sparse point sets, hence improving efficiency. Specifically, we found that a point-to-line metric yields significant improvements compared to a point-to-point metric when matching sparse sets of surface points. Preliminary results from an urban odometry benchmark show that our odometry pipeline is accurate and efficient compared to existing methods with an overall translation error of 2.05%, down from 2.78% from the previously best published method, running at 12.5ms per frame without need of environmental specific training.

ROSep 20, 2021
CorAl -- Are the point clouds Correctly Aligned?

Daniel Adolfsson, Martin Magnusson, Qianfang Liao et al.

In robotics perception, numerous tasks rely on point cloud registration. However, currently there is no method that can automatically detect misaligned point clouds reliably and without environment-specific parameters. We propose "CorAl", an alignment quality measure and alignment classifier for point cloud pairs, which facilitates the ability to introspectively assess the performance of registration. CorAl compares the joint and the separate entropy of the two point clouds. The separate entropy provides a measure of the entropy that can be expected to be inherent to the environment. The joint entropy should therefore not be substantially higher if the point clouds are properly aligned. Computing the expected entropy makes the method sensitive also to small alignment errors, which are particularly hard to detect, and applicable in a range of different environments. We found that CorAl is able to detect small alignment errors in previously unseen environments with an accuracy of 95% and achieve a substantial improvement to previous methods.

ROSep 20, 2021
BFAR-Bounded False Alarm Rate detector for improved radar odometry estimation

Anas Alhashimi, Daniel Adolfsson, Martin Magnusson et al.

This paper presents a new detector for filtering noise from true detections in radar data, which improves the state of the art in radar odometry. Scanning Frequency-Modulated Continuous Wave (FMCW) radars can be useful for localization and mapping in low visibility, but return a lot of noise compared to (more commonly used) lidar, which makes the detection task more challenging. Our Bounded False-Alarm Rate (BFAR) detector is different from the classical Constant False-Alarm Rate (CFAR) detector in that it applies an affine transformation on the estimated noise level after which the parameters that minimize the estimation error can be learned. BFAR is an optimized combination between CFAR and fixed-level thresholding. Only a single parameter needs to be learned from a training dataset. We apply BFAR to the use case of radar odometry, and adapt a state-of-the-art odometry pipeline (CFEAR), replacing its original conservative filtering with BFAR. In this way we reduce the state-of-the-art translation/rotation odometry errors from 1.76%/0.5deg/100 m to 1.55%/0.46deg/100 m; an improvement of 12.5%.

ROMay 4, 2021
CFEAR Radarodometry -- Conservative Filtering for Efficient and Accurate Radar Odometry

Daniel Adolfsson, Martin Magnusson, Anas Alhashimi et al.

This paper presents the accurate, highly efficient, and learning-free method CFEAR Radarodometry for large-scale radar odometry estimation. By using a filtering technique that keeps the k strongest returns per azimuth and by additionally filtering the radar data in Cartesian space, we are able to compute a sparse set of oriented surface points for efficient and accurate scan matching. Registration is carried out by minimizing a point-to-line metric and robustness to outliers is achieved using a Huber loss. We were able to additionally reduce drift by jointly registering the latest scan to a history of keyframes and found that our odometry method generalizes to different sensor models and datasets without changing a single parameter. We evaluate our method in three widely different environments and demonstrate an improvement over spatially cross-validated state-of-the-art with an overall translation error of 1.76% in a public urban radar odometry benchmark, running at 55Hz merely on a single laptop CPU thread.

ROApr 19, 2020
Robust Frequency-Based Structure Extraction

Tomasz Piotr Kucner, Matteo Luperto, Stephanie Lowry et al.

State of the art mapping algorithms can produce high-quality maps. However, they are still vulnerable to clutter and outliers which can affect map quality and in consequence hinder the performance of a robot, and further map processing for semantic understanding of the environment. This paper presents ROSE, a method for building-level structure detection in robotic maps. ROSE exploits the fact that indoor environments usually contain walls and straight-line elements along a limited set of orientations. Therefore metric maps often have a set of dominant directions. ROSE extracts these directions and uses this information to segment the map into structure and clutter through filtering the map in the frequency domain (an approach substantially underutilised in the mapping applications). Removing the clutter in this way makes wall detection (e.g. using the Hough transform) more robust. Our experiments demonstrate that (1) the application of ROSE for decluttering can substantially improve structural feature retrieval (e.g., walls) in cluttered environments, (2) ROSE can successfully distinguish between clutter and structure in the map even with substantial amount of noise and (3) ROSE can numerically assess the amount of structure in the map.

ROMar 4, 2020
Localising Faster: Efficient and precise lidar-based robot localisation in large-scale environments

Li Sun, Daniel Adolfsson, Martin Magnusson et al.

This paper proposes a novel approach for global localisation of mobile robots in large-scale environments. Our method leverages learning-based localisation and filtering-based localisation, to localise the robot efficiently and precisely through seeding Monte Carlo Localisation (MCL) with a deep-learned distribution. In particular, a fast localisation system rapidly estimates the 6-DOF pose through a deep-probabilistic model (Gaussian Process Regression with a deep kernel), then a precise recursive estimator refines the estimated robot pose according to the geometric alignment. More importantly, the Gaussian method (i.e. deep probabilistic localisation) and non-Gaussian method (i.e. MCL) can be integrated naturally via importance sampling. Consequently, the two systems can be integrated seamlessly and mutually benefit from each other. To verify the proposed framework, we provide a case study in large-scale localisation with a 3D lidar sensor. Our experiments on the Michigan NCLT long-term dataset show that the proposed method is able to localise the robot in 1.94 s on average (median of 0.8 s) with precision 0.75~m in a large-scale environment of approximately 0.5 km2.

ROSep 28, 2017
A method to segment maps from different modalities using free space layout -- MAORIS : MAp Of RIpples Segmentation

Malcolm Mielle, Martin Magnusson, Achim J. Lilienthal

How to divide floor plans or navigation maps into semantic representations, such as rooms and corridors, is an important research question in fields such as human-robot interaction, place categorization, or semantic mapping. While most works focus on segmenting robot built maps, those are not the only types of map a robot, or its user, can use. We present a method for segmenting maps from different modalities, focusing on robot built maps and hand-drawn sketch maps, and show better results than state of the art for both types. Our method segments the map by doing a convolution between the distance image of the map and a circular kernel, and grouping pixels of the same value. Segmentation is done by detecting ripple-like patterns where pixel values varies quickly, and merging neighboring regions with similar values. We identify a flaw in the segmentation evaluation metric used in recent works and propose a metric based on Matthews correlation coefficient (MCC). We compare our results to ground-truth segmentations of maps from a publicly available dataset, on which we obtain a better MCC than the state of the art with 0.98 compared to 0.65 for a recent Voronoi-based segmentation method and 0.70 for the DuDe segmentation method. We also provide a dataset of sketches of an indoor environment, with two possible sets of ground truth segmentations, on which our method obtains an MCC of 0.56 against 0.28 for the Voronoi-based segmentation method and 0.30 for DuDe.

ROSep 1, 2017
2D Map Alignment With Region Decomposition

Saeed Gholami Shahbandi, Martin Magnusson

In many applications of autonomous mobile robots the following problem is encountered. Two maps of the same environment are available, one a prior map and the other a sensor map built by the robot. To benefit from all available information in both maps, the robot must find the correct alignment between the two maps. There exist many approaches to address this challenge, however, most of the previous methods rely on assumptions such as similar modalities of the maps, same scale, or existence of an initial guess for the alignment. In this work we propose a decomposition-based method for 2D spatial map alignment which does not rely on those assumptions. Our proposed method is validated and compared with other approaches, including generic data association approaches and map alignment algorithms. Real world examples of four different environments with thirty six sensor maps and four layout maps are used for this analysis. The maps, along with an implementation of the method, are made publicly available online.

ROFeb 16, 2017
SLAM auto-complete: completing a robot map using an emergency map

Malcolm Mielle, Martin Magnusson, Henrik Andreasson et al.

In search and rescue missions, time is an important factor; fast navigation and quickly acquiring situation awareness might be matters of life and death. Hence, the use of robots in such scenarios has been restricted by the time needed to explore and build a map. One way to speed up exploration and mapping is to reason about unknown parts of the environment using prior information. While previous research on using external priors for robot mapping mainly focused on accurate maps or aerial images, such data are not always possible to get, especially indoor. We focus on emergency maps as priors for robot mapping since they are easy to get and already extensively used by firemen in rescue missions. However, those maps can be outdated, information might be missing, and the scales of rooms are typically not consistent. We have developed a formulation of graph-based SLAM that incorporates information from an emergency map. The graph-SLAM is optimized using a combination of robust kernels, fusing the emergency map and the robot map into one map, even when faced with scale inaccuracies and inexact start poses. We typically have more than 50% of wrong correspondences in the settings studied in this paper, and the method we propose correctly handles them. Experiments in an office environment show that we can handle up to 70% of wrong correspondences and still get the expected result. The robot can navigate and explore while taking into account places it has not yet seen. We demonstrate this in a test scenario and also show that the emergency map is enhanced by adding information not represented such as closed doors or new walls.