Jose Andres Millan-Romera

h-index10

8papers

46citations

Novelty49%

AI Score50

Ranked #42,259 of 205,806 authors (top 21%)#1,137 in RO (top 15%)

8 Papers

50.7ROApr 23Code

Situationally-aware Path Planning Exploiting 3D Scene Graphs

Saad Ejaz, Marco Giberna, Muhammad Shaheer et al.

3D Scene Graphs integrate both metric and semantic information, yet their structure remains underutilized for improving path planning efficiency and interpretability. In this work, we present S-Path, a situationally-aware path planner that leverages the metric-semantic structure of indoor 3D Scene Graphs to significantly enhance planning efficiency. S-Path follows a two-stage process: it first performs a search over a semantic graph derived from the scene graph to yield a human-understandable high-level path. This also identifies relevant regions for planning, which later allows the decomposition of the problem into smaller, independent subproblems that can be solved in parallel. We also introduce a replanning mechanism that, in the event of an infeasible path, reuses information from previously solved subproblems to update semantic heuristics and prioritize reuse to further improve the efficiency of future planning attempts. Extensive experiments on both real-world and simulated environments show that S-Path achieves average reductions of 6x in planning time while maintaining comparable path optimality to classical sampling-based planners and surpassing them in complex scenarios, making it an efficient and interpretable path planner for environments represented by indoor 3D Scene Graphs. Code available at: https://github.com/snt-arg/spath_ros

ROMar 3, 2023

Graph-based Global Robot Localization Informing Situational Graphs with Architectural Graphs

Muhammad Shaheer, Jose Andres Millan-Romera, Hriday Bavle et al.

In this paper, we propose a solution for legged robot localization using architectural plans. Our specific contributions towards this goal are several. Firstly, we develop a method for converting the plan of a building into what we denote as an architectural graph (A-Graph). When the robot starts moving in an environment, we assume it has no knowledge about it, and it estimates an online situational graph representation (S-Graph) of its surroundings. We develop a novel graph-to-graph matching method, in order to relate the S-Graph estimated online from the robot sensors and the A-Graph extracted from the building plans. Note the challenge in this, as the S-Graph may show a partial view of the full A-Graph, their nodes are heterogeneous and their reference frames are different. After the matching, both graphs are aligned and merged, resulting in what we denote as an informed Situational Graph (iS-Graph), with which we achieve global robot localization and exploitation of prior knowledge from the building plans. Our experiments show that our pipeline shows a higher robustness and a significantly lower pose error than several LiDAR localization baselines.

37.6ROApr 28

Robust Graph Matching through Semantic Relationship Generation for SLAM

David Perez-Saura, Jose Andres Millan-Romera, Miguel Fernandez-Cortizas et al.

Graph-based representations such as Scene Graphs enable localization in structured indoor environments by matching a locally observed graph, constructed from sensor data, to a prior map. This process is particularly challenging in environments with repetitive or symmetric layouts, where structural cues alone are often insufficient to resolve ambiguities. We propose a semantic-enhanced graph matching approach that explicitly models relations between detected objects and structural elements, such as rooms and wall planes. Objects are detected from RGB-D data and integrated into the graph, and their relations to structural elements are exploited to filter candidate correspondences prior to geometric verification, significantly reducing ambiguity and search complexity. The proposed method is integrated within the iS-Graphs framework and evaluated in synthetic and simulated environments. Results show that semantic relations significantly reduce the number of candidate matches, improve computational efficiency, and enable faster convergence, particularly in symmetric scenarios where purely geometric approaches fail.

LGSep 30, 2023

Learning High-level Semantic-Relational Concepts for SLAM

Jose Andres Millan-Romera, Hriday Bavle, Muhammad Shaheer et al.

Recent works on SLAM extend their pose graphs with higher-level semantic concepts like Rooms exploiting relationships between them, to provide, not only a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs+), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as Planes and Rooms, whose relationship is mathematically defined. Nevertheless, there is no unique approach to finding all the hidden patterns in lower-level factor-graphs that correspond to high-level concepts of different natures. It is currently tackled with ad-hoc algorithms, which limits its graph expressiveness. To overcome this limitation, in this work, we propose an algorithm based on Graph Neural Networks for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. Given a set of mapped Planes our algorithm is capable of inferring Room entities relating to the Planes. Additionally, to demonstrate the versatility of our method, our algorithm can infer an additional semantic-relational concept, i.e. Wall, and its relationship with its Planes. We validate our method in both simulated and real datasets demonstrating improved performance over two baseline approaches. Furthermore, we integrate our method into the S-Graphs+ algorithm providing improved pose and map accuracy compared to the baseline while further enhancing the scene representation.

ROSep 18, 2024

Generation of Uncertainty-Aware High-Level Spatial Concepts in Factorized 3D Scene Graphs via Graph Neural Networks

Jose Andres Millan-Romera, Muhammad Shaheer, Miguel Fernandez-Cortizas et al.

Enabling robots to autonomously discover high-level spatial concepts (e.g., rooms and walls) from primitive geometric observations (e.g., planar surfaces) within 3D Scene Graphs is essential for robust indoor navigation and mapping. These graphs provide a hierarchical metric-semantic representation in which such concepts are organized. To further enhance graph-SLAM performance, Factorized 3D Scene Graphs incorporate these concepts as optimization factors that constrain relative geometry and enforce global consistency. However, both stages of this process remain largely manual: concepts are typically derived using hand-crafted, concept-specific heuristics, while factors and their covariances are likewise manually designed. This reliance on manual specification limits generalization across diverse environments and scalability to new concept classes. This paper presents a novel learning-based method that infers spatial concepts online from observed vertical planes and introduces them as optimizable factors within a SLAM backend, eliminating the need to handcraft concept generation, factor design, and covariance specification. We evaluate our approach in simulated environments with complex layouts, improving room detection by 20.7% and trajectory estimation by 19.2%, and further validate it on real construction sites, where room detection improves by 5.3% and map matching accuracy by 3.8%. Results confirm that learned factors can improve their handcrafted counterparts in SLAM systems and serve as a foundation for extending this approach to new spatial concepts.

ROAug 13, 2025Code

Interpretable Robot Control via Structured Behavior Trees and Large Language Models

Ingrid Maéva Chekam, Ines Pastor-Martinez, Ali Tourani et al.

As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combining Large Language Models (LLMs) with Behavior Trees. This integration enables robots to interpret natural language instructions given by users and translate them into executable actions by activating domain-specific plugins. The system supports scalable and modular integration, with a primary focus on perception-based functionalities, such as person tracking and hand gesture recognition. To evaluate the system, a series of real-world experiments was conducted across diverse environments. Experimental results demonstrate that the proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%, making a significant contribution to HRI systems and robots. The complete source code of the framework is publicly available at https://github.com/snt-arg/robot_suite.

23.3ROApr 30

Learning-Based Hierarchical Scene Graph Matching for Robot Localization Leveraging Prior Maps

Nimrod Millenium Ndulue, Jose Andres Millan-Romera, Matteo Giorgi et al.

Accurate localization is a fundamental requirement for autonomous robots operating in indoor environments. Scene graphs encode the spatial structure of an environment as a hierarchy of semantic entities and their relationships, and can be constructed both online from robot sensor data and offline from architectural priors such as Building Information Models (BIM). Matching these two complementary representations enables drift correction in SLAM by grounding robot observations against a known structural prior. However, establishing reliable node-to-node correspondences between them remains an open challenge: existing combinatorial methods are prohibitively expensive at scale, and prior learned approaches address only flat graph matching, ignoring the multi-level semantic structure present in both representations. Here we present a learned, end-to-end differentiable pipeline that augments both graphs with semantically motivated edge types encoding intra- and inter- level relationships, explicitly exploiting this hierarchy to enable simultaneous matching from high-level room concepts down to low-level wall surfaces. Trained exclusively on floor plans, the proposed method outperforms the combinatorial baseline in F1 on real LiDAR environments while running an order of magnitude faster, demonstrating viable zero-shot generalization for BIM-assisted robot localization.

CVApr 11, 2025

Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review

Claudio Cimarelli, Jose Andres Millan-Romera, Holger Voos et al.

Neuromorphic, or event, cameras represent a transformation in the classical approach to visual sensing encodes detected instantaneous per-pixel illumination changes into an asynchronous stream of event packets. Their novelty compared to standard cameras lies in the transition from capturing full picture frames at fixed time intervals to a sparse data format which, with its distinctive qualities, offers potential improvements in various applications. However, these advantages come at the cost of reinventing algorithmic procedures or adapting them to effectively process the new data format. In this survey, we systematically examine neuromorphic vision along three main dimensions. First, we highlight the technological evolution and distinctive hardware features of neuromorphic cameras from their inception to recent models. Second, we review image processing algorithms developed explicitly for event-based data, covering key works on feature detection, tracking, and optical flow -which form the basis for analyzing image elements and transformations -as well as depth and pose estimation or object recognition, which interpret more complex scene structures and components. These techniques, drawn from classical computer vision and modern data-driven approaches, are examined to illustrate the breadth of applications for event-based cameras. Third, we present practical application case studies demonstrating how event cameras have been successfully used across various industries and scenarios. Finally, we analyze the challenges limiting widespread adoption, identify significant research gaps compared to standard imaging techniques, and outline promising future directions and opportunities that neuromorphic vision offers.