Nathan Hughes

RO
h-index: 64
8 papers
853 citations
Novelty: 50%
AI Score: 44

8 Papers

AI | Oct 31, 2025 | Code
Advancing AI Challenges for the United States Department of the Air Force

Christian Prothmann, Vijay Gadepally, Jeremy Kepner et al.

The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundamental advances in artificial intelligence (AI) to expand the competitive advantage of the United States in the defense and civilian sectors. In recent years, AI Accelerator projects have developed and launched public challenge problems aimed at advancing AI research in priority areas. Hallmarks of AI Accelerator challenges include large, publicly available, and AI-ready datasets to stimulate open-source solutions and engage the wider academic and private sector AI ecosystem. This article supplements our previous publication, which introduced AI Accelerator challenges. We provide an update on how ongoing and new challenges have successfully contributed to AI research and applications of AI technologies.

RO | Jan 18, 2021 | Code
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

Antoni Rosinol, Andrew Violette, Marcus Abate et al.

Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera on real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source.
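A minimal sketch of the layered-graph idea behind a DSG, assuming a hypothetical layer order (the abstract names objects, rooms, and buildings; the "mesh" and "places" layers and all class and method names are illustrative assumptions, not Kimera's API). Dynamic entities such as humans are stored here as object-layer nodes carrying a timestamp attribute, which is also an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical layer order, lowest to highest. The abstract names objects,
# rooms and buildings; the "mesh" and "places" layers are assumptions.
LAYERS = ("mesh", "objects", "places", "rooms", "buildings")


@dataclass
class Node:
    node_id: str
    layer: str
    attributes: dict = field(default_factory=dict)  # e.g. label, position, timestamp


@dataclass
class DynamicSceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)  # undirected edges stored as sorted id pairs

    def add_node(self, node_id, layer, **attributes):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.nodes[node_id] = Node(node_id, layer, attributes)

    def add_edge(self, a, b):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must already be in the graph")
        self.edges.add(tuple(sorted((a, b))))

    def neighbors(self, node_id):
        """All nodes sharing an edge with node_id, intra- or inter-layer."""
        return sorted(b if a == node_id else a for a, b in self.edges if node_id in (a, b))

    def children(self, parent):
        """Neighbors of `parent` that live exactly one layer below it."""
        level = LAYERS.index(self.nodes[parent].layer)
        if level == 0:
            return []  # the lowest layer has no children
        below = LAYERS[level - 1]
        return [n for n in self.neighbors(parent) if self.nodes[n].layer == below]
```

For example, a building containing one room, a place in that room, a chair, and a person observed at time 12.5 s: `g.children("P1")` returns `["H1", "O1"]`, i.e. both the static and the dynamic entity attached to that place.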

RO | Dec 18, 2023
Indoor and Outdoor 3D Scene Graph Generation via Language-Enabled Spatial Ontologies

Jared Strader, Nathan Hughes, William Chen et al.

This paper proposes an approach to build 3D scene graphs in arbitrary indoor and outdoor environments. Such an extension is challenging: the hierarchy of concepts that describes an outdoor environment is more complex than the indoor one, and manually defining such a hierarchy is time-consuming and does not scale. Furthermore, the lack of training data prevents the straightforward application of learning-based tools used in indoor settings. To address these challenges, we propose two novel extensions. First, we develop methods to build a spatial ontology defining concepts and relations relevant for indoor and outdoor robot operation. In particular, we use a Large Language Model (LLM) to build such an ontology, thus largely reducing the amount of manual effort required. Second, we leverage the spatial ontology for 3D scene graph construction using Logic Tensor Networks (LTN) to add logical rules, or axioms (e.g., "a beach contains sand"), which provide additional supervisory signals at training time. This reduces the need for labelled data, yields better predictions, and even allows predicting concepts unseen at training time. We test our approach on a variety of datasets, including indoor, rural, and coastal environments, and show that it leads to a significant increase in the quality of the 3D scene graph generation with sparsely annotated data.
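A minimal sketch of how an axiom such as "a beach contains sand" can act as a soft training signal, assuming LTN-style fuzzy semantics: the Reichenbach implication 1 - a + a*b and a p-mean-error aggregator for the universal quantifier. The abstract does not say which operators the paper uses; the function names and the per-node truth values are hypothetical.

```python
def implies(a, b):
    """Reichenbach fuzzy implication: truth of a -> b for a, b in [0, 1]."""
    return 1.0 - a + a * b


def forall(truths, p=2.0):
    """Smooth 'for all': 1 minus the p-mean of the per-element errors (1 - truth)."""
    err = sum((1.0 - t) ** p for t in truths) / len(truths)
    return 1.0 - err ** (1.0 / p)


def axiom_loss(is_beach, contains_sand, p=2.0):
    """1 - satisfaction of: for all x, Beach(x) -> ContainsSand(x).

    is_beach[i] and contains_sand[i] are predicted truth values for node i,
    e.g. classifier probabilities (hypothetical inputs).
    """
    truths = [implies(a, b) for a, b in zip(is_beach, contains_sand)]
    return 1.0 - forall(truths, p)
```

Added (with some weight) to the supervised loss, this term penalizes nodes predicted to be a beach but not predicted to contain sand, and it applies to unlabelled nodes as well, which is why such axioms can reduce the need for labelled data. Nodes that are not beaches (a = 0) satisfy the rule trivially and contribute no penalty.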

RO | Jun 9, 2025
Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs

Jared Strader, Aaron Ray, Jacob Arkin et al.

In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, view-invariant relocalization (via the object-based map) and planning (via the 3D scene graph), allowing a team of robots to reason about their surroundings and execute complex tasks. Additionally, we introduce a planning approach that translates operator intent into Planning Domain Definition Language (PDDL) goals using a Large Language Model (LLM) by leveraging context from the shared 3D scene graph and robot capabilities. We provide an experimental assessment of the performance of our system on real-world tasks in large-scale, outdoor environments. A supplementary video is available at https://youtu.be/8xbGGOLfLAY.
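The step that turns operator intent into PDDL goals benefits from a grounding check: every object an LLM mentions should exist in the shared scene graph before the goal reaches a planner. The sketch below assumes the LLM call has already produced a goal string and only validates it and assembles a PDDL problem; `build_problem`, `goal_atoms`, the domain name, and the predicates are hypothetical and are not the authors' pipeline.

```python
import re
from collections import defaultdict

# Matches innermost atoms such as "(at robot1 place3)"; connectives like
# "(and ...)" or "(not ...)" do not match because their argument starts with "(".
ATOM = re.compile(r"\(\s*([\w-]+)((?:\s+[\w-]+)*)\s*\)")


def goal_atoms(goal):
    """Return (predicate, [args]) for every atom in a PDDL goal expression."""
    return [(pred, args.split()) for pred, args in ATOM.findall(goal)]


def build_problem(name, domain, objects, init_facts, goal, predicates):
    """Assemble a PDDL problem after checking the goal against known symbols.

    objects:    {object_name: pddl_type}, e.g. exported from scene graph nodes
    init_facts: ground atoms describing the current state
    goal:       goal expression, e.g. produced by an LLM from operator intent
    predicates: predicate names defined in the (hypothetical) domain file
    """
    for pred, args in goal_atoms(goal):
        if pred not in predicates:
            raise ValueError(f"unknown predicate in goal: {pred}")
        unknown = [a for a in args if a not in objects]
        if unknown:
            raise ValueError(f"goal mentions objects not in the scene graph: {unknown}")

    # Group objects by type for the ":objects" section.
    by_type = defaultdict(list)
    for obj, typ in sorted(objects.items()):
        by_type[typ].append(obj)
    decl = " ".join(f"{' '.join(objs)} - {typ}" for typ, objs in sorted(by_type.items()))

    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {decl})\n"
        f"  (:init {' '.join(init_facts)})\n"
        f"  (:goal {goal}))"
    )
```

Rejecting a goal that refers to, say, `place9` when no such node exists gives the system a chance to re-prompt the LLM instead of passing an unsolvable problem to the planner.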

RO | Jan 31, 2022
Hydra: A Real-time Spatial Perception System for 3D Scene Graph Construction and Optimization

Nathan Hughes, Yun Chang, Luca Carlone

3D scene graphs have recently emerged as a powerful high-level representation of 3D environments. A 3D scene graph describes the environment as a layered graph where nodes represent spatial concepts at multiple levels of abstraction and edges represent relations between concepts. While 3D scene graphs can serve as an advanced "mental model" for robots, how to build such a rich representation in real time is still uncharted territory. This paper describes a real-time Spatial Perception System, a suite of algorithms to build a 3D scene graph from sensor data in real time. Our first contribution is to develop real-time algorithms to incrementally construct the layers of a scene graph as the robot explores the environment; these algorithms build a local Euclidean Signed Distance Function (ESDF) around the current robot location, extract a topological map of places from the ESDF, and then segment the places into rooms using an approach inspired by community-detection techniques. Our second contribution is to investigate loop closure detection and optimization in 3D scene graphs. We show that 3D scene graphs allow defining hierarchical descriptors for loop closure detection; our descriptors capture statistics across layers in the scene graph, ranging from low-level visual appearance to summary statistics about objects and places. We then propose the first algorithm to optimize a 3D scene graph in response to loop closures; our approach relies on embedded deformation graphs to simultaneously correct all layers of the scene graph. We implement the proposed Spatial Perception System in an architecture named Hydra, which combines fast early and mid-level perception processes with slower high-level perception. We evaluate Hydra on simulated and real data and show that it is able to reconstruct 3D scene graphs with an accuracy comparable to batch offline methods despite running online.
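The abstract only summarizes Hydra's hierarchical loop-closure descriptors. As a rough illustration of the object-level part, a hypothetical descriptor could be a histogram of the semantic labels of objects near a place, compared with cosine similarity; candidates above a threshold would then be verified at lower layers (e.g., visual appearance), which this sketch omits. The function names, the histogram choice, and the threshold are assumptions, not Hydra's implementation.

```python
import math
from collections import Counter


def object_descriptor(labels):
    """Histogram of semantic labels of objects near a place (hypothetical summary statistic)."""
    return Counter(labels)


def cosine(a, b):
    """Cosine similarity between two sparse histograms; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def loop_closure_candidates(query_labels, past_places, threshold=0.8):
    """Ids of previously visited places whose descriptor matches the query, best first.

    past_places: {place_id: [object labels observed around that place]}
    """
    q = object_descriptor(query_labels)
    scored = [(cosine(q, object_descriptor(labels)), pid) for pid, labels in past_places.items()]
    return [pid for score, pid in sorted(scored, reverse=True) if score >= threshold]
```

Comparing coarse, higher-layer summaries first keeps the search cheap; only the surviving candidates would need the more expensive lower-layer checks.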

RO | Aug 2, 2021
Hierarchical Representations and Explicit Memory: Learning Effective Navigation Policies on 3D Scene Graphs using Graph Neural Networks

Zachary Ravichandran, Lisa Peng, Nathan Hughes et al.

Representations are crucial for a robot to learn effective navigation policies. Recent work has shown that mid-level perceptual abstractions, such as depth estimates or 2D semantic segmentation, lead to more effective policies when provided as observations in place of raw sensor data (e.g., RGB images). However, such policies must still learn latent three-dimensional scene properties from mid-level abstractions. In contrast, high-level, hierarchical representations such as 3D scene graphs explicitly provide a scene's geometry, topology, and semantics, making them compelling representations for navigation. In this work, we present a reinforcement learning framework that leverages high-level hierarchical representations to learn navigation policies. Towards this goal, we propose a graph neural network architecture and show how to embed a 3D scene graph into an agent-centric feature space, which enables the robot to learn policies for low-level action in an end-to-end manner. For each node in the scene graph, our method uses features that capture occupancy and semantic content, while explicitly retaining memory of the robot trajectory. We demonstrate the effectiveness of our method against commonly used visuomotor policies in a challenging object search task. These experiments and supporting ablation studies show that our method leads to more effective object search behaviors, exhibits improved long-term memory, and successfully leverages hierarchical information to guide its navigation objectives.
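The abstract does not specify the graph neural network layers, so the sketch below uses a generic mean-aggregation message-passing step (GraphSAGE-style) as a stand-in: each node combines its own features with the mean of its neighbors' features through two weight matrices and a ReLU. Per-node features such as occupancy, semantic content, and a "visited" flag (the explicit trajectory memory) are assumed to be precomputed in an agent-centric frame, and the weights are fixed here rather than learned; all names are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x):
    return [max(0.0, v) for v in x]


def message_passing(features, edges, w_self, w_neigh, rounds=2):
    """Mean-aggregation message passing over an undirected scene graph.

    features: {node: [float, ...]} initial node features (same length for all nodes)
    edges:    iterable of (u, v) pairs, e.g. intra- and inter-layer scene graph edges
    Each round computes h_v <- ReLU(W_self h_v + W_neigh * mean_{u in N(v)} h_u).
    """
    neighbors = {n: [] for n in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    h = dict(features)
    for _ in range(rounds):
        new_h = {}
        for node, hv in h.items():
            nbrs = neighbors[node]
            dim = len(hv)
            if nbrs:
                mean = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
            else:
                mean = [0.0] * dim  # isolated node: no incoming messages
            combined = [a + b for a, b in zip(matvec(w_self, hv), matvec(w_neigh, mean))]
            new_h[node] = relu(combined)
        h = new_h
    return h
```

After a few rounds, the embedding of the node closest to the agent summarizes its multi-hop surroundings and could be fed to a policy head; with hierarchical edges, information from rooms and buildings reaches low-level places within a small number of rounds.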

RO | Mar 11, 2021
Dynamic Grasping with a "Soft" Drone: From Theory to Practice

Joshua Fishman, Samuel Ubellacker, Nathan Hughes et al.

Rigid grippers used in existing aerial manipulators require precise positioning to achieve successful grasps and transmit large contact forces that may destabilize the drone. This limits the speed during grasping and prevents "dynamic grasping", where the drone attempts to grasp an object while moving. In contrast, biological systems (e.g., birds) rely on compliant and soft parts to dampen contact forces and compensate for grasping inaccuracy, enabling impressive feats. This paper presents the first prototype of a soft drone: a quadrotor whose traditional (i.e., rigid) landing gear is replaced with a soft tendon-actuated gripper to enable aggressive grasping. We provide three key contributions. First, we describe our soft drone prototype, including its electro-mechanical design, software infrastructure, and fabrication. Second, we review the set of algorithms we use for trajectory optimization and control of the drone and the soft gripper; the algorithms combine state-of-the-art techniques for quadrotor control (i.e., an adaptive geometric controller) with advanced soft robotics models (i.e., a quasi-static finite element model). Finally, we evaluate our soft drone in physics simulations (using SOFA and Unity) and in real tests in a motion-capture room. Our drone is able to dynamically grasp objects of unknown shape where baseline approaches fail. Our physical prototype achieves consistent performance, with 91.7% successful grasps across 23 trials. We showcase dynamic grasping results in the video attachment. Video Attachment: https://youtu.be/mqbj8mEyCdk

RO | Jul 12, 2019
Active Rendezvous for Multi-Robot Pose Graph Optimization using Sensing over Wi-Fi

Weiying Wang, Ninad Jadhav, Paul Vohs et al.

We present a novel framework for collaboration among a team of robots performing Pose Graph Optimization (PGO) that addresses two important challenges for multi-robot SLAM: i) enabling information exchange "on-demand" via Active Rendezvous without using a map or the robot's location, and ii) rejecting outlying measurements. Our key insight is to exploit relative position data present in the communication channel between robots to improve the accuracy of PGO with respect to ground truth. We develop an algorithmic and experimental framework for integrating Channel State Information (CSI) with multi-robot PGO; it is distributed and applicable in low-lighting or featureless environments where traditional sensors often fail. We present extensive experimental results on actual robots and observe that using Active Rendezvous results in a 64% reduction in ground truth pose error and that using CSI observations to aid outlier rejection reduces ground truth pose error by 32%. These results show the potential of integrating communication as a novel sensor for SLAM.
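One way CSI can aid outlier rejection is as an independent consistency check on inter-robot loop closures. The sketch below assumes the CSI pipeline yields a relative bearing (angle of arrival) between two robots, and keeps only loop closures whose implied bearing agrees with it within a tolerance; the function names, the 20-degree tolerance, and the bearing-only gating are assumptions for illustration, not the paper's method.

```python
import math


def bearing(dx, dy):
    """Bearing (radians) of a relative translation expressed in the observer's frame."""
    return math.atan2(dy, dx)


def angle_diff(a, b):
    """Smallest signed difference a - b, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def filter_loop_closures(closures, csi_bearings, max_error=math.radians(20)):
    """Keep inter-robot loop closures whose implied bearing agrees with CSI.

    closures:     {closure_id: (dx, dy)} translation of robot j relative to robot i,
                  expressed in robot i's frame (from visual/lidar matching)
    csi_bearings: {closure_id: angle} bearing of robot j seen by robot i,
                  assumed to come from a CSI angle-of-arrival estimate
    """
    kept = []
    for cid, (dx, dy) in closures.items():
        if cid not in csi_bearings:
            kept.append(cid)  # no CSI evidence: defer to other outlier checks
            continue
        if abs(angle_diff(bearing(dx, dy), csi_bearings[cid])) <= max_error:
            kept.append(cid)
    return kept
```

Wrapping the angular difference matters: a bearing just below 0 and a CSI estimate just below 2*pi describe nearly the same direction and should not be rejected.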