Roberto Furfaro

SY
h-index12
18papers
727citations
Novelty42%
AI Score37

18 Papers

SYOct 20, 2018
Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Powered Descent and Landing

Brian Gaudet, Richard Linares, Roberto Furfaro

Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse $<$ 5 m radius). The latter requires both a navigation system capable of estimating the lander's state in real-time and a guidance and control system that can map the estimated lander state to a commanded thrust for each lander engine. In this paper, we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. The latter is used to learn a policy mapping the lander's estimated state directly to a commanded thrust for each engine, with the policy resulting in accurate and fuel-efficient trajectories. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results demonstrating the guidance and control system's performance in a 6-DOF simulation environment and demonstrate robustness to noise and system parameter uncertainty.

AIOct 27, 2023
Deep Reinforcement Learning for Weapons to Targets Assignment in a Hypersonic strike

Brian Gaudet, Kris Drozd, Roberto Furfaro

We use deep reinforcement learning (RL) to optimize a weapons to target assignment (WTA) policy for multi-vehicle hypersonic strike against multiple targets. The objective is to maximize the total value of destroyed targets in each episode. Each randomly generated episode varies the number and initial conditions of the hypersonic strike weapons (HSW) and targets, the value distribution of the targets, and the probability of a HSW being intercepted. We compare the performance of this WTA policy to that of a benchmark WTA policy derived using non-linear integer programming (NLIP), and find that the RL WTA policy gives near optimal performance with a 1000X speedup in computation time, allowing real time operation that facilitates autonomous decision making in the mission end game.

LGNov 15, 2021Code
VisualEnv: visual Gym environments with Blender

Andrea Scorsoglio, Roberto Furfaro

In this paper VisualEnv, a new tool for creating visual environment for reinforcement learning is introduced. It is the product of an integration of an open-source modelling and rendering software, Blender, and a python module used to generate environment model for simulation, OpenAI Gym. VisualEnv allows the user to create custom environments with photorealistic rendering capabilities and full integration with python. The framework is described and tested on a series of example problems that showcase its features for training reinforcement learning agents.

ROSep 4, 2025
Action Chunking with Transformers for Image-Based Spacecraft Guidance and Control

Alejandro Posadas-Nava, Andrea Scorsoglio, Luca Ghilardi et al.

We present an imitation learning approach for spacecraft guidance, navigation, and control(GNC) that achieves high performance from limited data. Using only 100 expert demonstrations, equivalent to 6,300 environment interactions, our method, which implements Action Chunking with Transformers (ACT), learns a control policy that maps visual and state observations to thrust and torque commands. ACT generates smoother, more consistent trajectories than a meta-reinforcement learning (meta-RL) baseline trained with 40 million interactions. We evaluate ACT on a rendezvous task: in-orbit docking with the International Space Station (ISS). We show that our approach achieves greater accuracy, smoother control, and greater sample efficiency.

CLJan 15, 2022
Extracting Space Situational Awareness Events from News Text

Zhengnan Xie, Alice Saebom Kwak, Enfa George et al.

Space situational awareness typically makes use of physical measurements from radar, telescopes, and other assets to monitor satellites and other spacecraft for operational, navigational, and defense purposes. In this work we explore using textual input for the space situational awareness task. We construct a corpus of 48.5k news articles spanning all known active satellites between 2009 and 2020. Using a dependency-rule-based extraction system designed to target three high-impact events -- spacecraft launches, failures, and decommissionings, we identify 1,787 space-event sentences that are then annotated by humans with 15.9k labels for event slots. We empirically demonstrate a state-of-the-art neural extraction system achieves an overall F1 between 53 and 91 per slot for event extraction in this low-resource, high-impact domain.

SYDec 16, 2021
Integrated Guidance and Control for Lunar Landing using a Stabilized Seeker

Brian Gaudet, Roberto Furfaro

We develop an integrated guidance and control system that in conjunction with a stabilized seeker and landing site detection software can achieve precise and safe planetary landing. The seeker tracks the designated landing site by adjusting seeker elevation and azimuth angles to center the designated landing site in the sensor field of view. The seeker angles, closing speed, and range to the designated landing site are used to formulate a velocity field that is used by the guidance and control system to achieve a safe landing at the designated landing site. The guidance and control system maps this velocity field, attitude, and rotational velocity directly to a commanded thrust vector for the lander's four engines. The guidance and control system is implemented as a policy optimized using reinforcement meta learning. We demonstrate that the guidance and control system is compatible with multiple diverts during the powered descent phase, and is robust to seeker lag, actuator lag and degradation, and center of mass variation induced by fuel consumption. We outline several concepts of operations, including an approach using a preplaced landing beacon.

SYOct 1, 2021
Terminal Adaptive Guidance for Autonomous Hypersonic Strike Weapons via Reinforcement Learning

Brian Gaudet, Roberto Furfaro

An adaptive guidance system suitable for the terminal phase trajectory of a hypersonic strike weapon is optimized using reinforcement meta learning. The guidance system maps observations directly to commanded bank angle, angle of attack, and sideslip angle rates. Importantly, the observations are directly measurable from radar seeker outputs with minimal processing. The optimization framework implements a shaping reward that minimizes the line of sight rotation rate, with a terminal reward given if the agent satisfies path constraints and meets terminal accuracy and speed criteria. We show that the guidance system can adapt to off-nominal flight conditions including perturbation of aerodynamic coefficient parameters, actuator failure scenarios, sensor scale factor errors, and actuator lag, while satisfying heating rate, dynamic pressure, and load path constraints, as well as a minimum impact speed constraint. We demonstrate precision strike capability against a maneuvering ground target and the ability to divert to a new target, the latter being important to maximize strike effectiveness for a group of hypersonic strike weapons. Moreover, we demonstrate a threat evasion strategy against interceptors with limited midcourse correction capability, where the hypersonic strike weapon implements multiple diverts to alternate targets, with the last divert to the actual target. Finally, we include preliminary results for an integrated guidance and control system in a six degrees-of-freedom environment.

SYSep 8, 2021
Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning

Brian Gaudet, Roberto Furfaro

We apply a reinforcement meta-learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile. The system is implemented as a policy that maps navigation system outputs directly to commanded rates of change for the missile's control surface deflections. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model, and demonstrate that the system adapts to both a large flight envelope and off-nominal flight conditions including perturbation of aerodynamic coefficient parameters and center of pressure locations, and flexible body dynamics. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction and imperfect seeker stabilization. We compare our system's performance to a longitudinal model of proportional navigation coupled with a three loop autopilot, and find that our system outperforms this benchmark by a large margin. Additional experiments investigate the impact of removing the recurrent layer from the policy and value function networks, performance with an infrared seeker, and flexible body dynamics.

ROJul 30, 2021
Adaptive Approach Phase Guidance for a Hypersonic Glider via Reinforcement Meta Learning

Brian Gaudet, Kris Drozd, Ryan Meltzer et al.

We use Reinforcement Meta Learning to optimize an adaptive guidance system suitable for the approach phase of a gliding hypersonic vehicle. Adaptability is achieved by optimizing over a range of off-nominal flight conditions including perturbation of aerodynamic coefficient parameters, actuator failure scenarios, and sensor noise. The system maps observations directly to commanded bank angle and angle of attack rates. These observations include a velocity field tracking error formulated using parallel navigation, but adapted to work over long trajectories where the Earth's curvature must be taken into account. Minimizing the tracking error keeps the curved space line of sight to the target location aligned with the vehicle's velocity vector. The optimized guidance system will then induce trajectories that bring the vehicle to the target location with a high degree of accuracy at the designated terminal speed, while satisfying heating rate, load, and dynamic pressure constraints. We demonstrate the adaptability of the guidance system by testing over flight conditions that were not experienced during optimization. The guidance system's performance is then compared to that of a linear quadratic regulator tracking an optimal trajectory.

LGMay 15, 2020
Extreme Theory of Functional Connections: A Physics-Informed Neural Network Method for Solving Parametric Differential Equations

Enrico Schiassi, Carl Leake, Mario De Florio et al.

In this work we present a novel, accurate, and robust physics-informed method for solving problems involving parametric differential equations (DEs) called the Extreme Theory of Functional Connections, or X-TFC. The proposed method is a synergy of two recently developed frameworks for solving problems involving parametric DEs, 1) the Theory of Functional Connections, TFC, and the Physics-Informed Neural Networks, PINN. Although this paper focuses on the solution of exact problems involving parametric DEs (i.e. problems where the modeling error is negligible) with known parameters, X-TFC can also be used for data-driven solutions and data-driven discovery of parametric DEs. In the proposed method, the latent solution of the parametric DEs is approximated by a TFC constrained expression that uses a Neural Network (NN) as the free-function. This approximate solution form always analytically satisfies the constraints of the DE, while maintaining a NN with unconstrained parameters, like the Deep-TFC method. X-TFC differs from PINN and Deep-TFC; whereas PINN and Deep-TFC use a deep-NN, X-TFC uses a single-layer NN, or more precisely, an Extreme Learning Machine, ELM. This choice is based on the properties of the ELM algorithm. In order to numerically validate the method, it was tested over a range of problems including the approximation of solutions to linear and non-linear ordinary DEs (ODEs), systems of ODEs (SODEs), and partial DEs (PDEs). Furthermore, a few of these problems are of interest in physics and engineering such as the Classic Emden-Fowler equation, the Radiative Transfer (RT) equation, and the Heat-Transfer (HT) equation. The results show that X-TFC achieves high accuracy with low computational time and thus it is comparable with the other state-of-the-art methods.

SYApr 18, 2020
Reinforcement Meta-Learning for Interception of Maneuvering Exoatmospheric Targets with Parasitic Attitude Loop

Brian Gaudet, Roberto Furfaro, Richard Linares et al.

We use Reinforcement Meta-Learning to optimize an adaptive integrated guidance, navigation, and control system suitable for exoatmospheric interception of a maneuvering target. The system maps observations consisting of strapdown seeker angles and rate gyro measurements directly to thruster on-off commands. Using a high fidelity six degree-of-freedom simulator, we demonstrate that the optimized policy can adapt to parasitic effects including seeker angle measurement lag, thruster control lag, the parasitic attitude loop resulting from scale factor errors and Gaussian noise on angle and rotational velocity measurements, and a time varying center of mass caused by fuel consumption and slosh. Importantly, the optimized policy gives good performance over a wide range of challenging target maneuvers. Unlike previous work that enhances range observability by inducing line of sight oscillations, our system is optimized to use only measurements available from the seeker and rate gyros. Through extensive Monte Carlo simulation of randomized exoatmospheric interception scenarios, we demonstrate that the optimized policy gives performance close to that of augmented proportional navigation with perfect knowledge of the full engagement state. The optimized system is computationally efficient and requires minimal memory, and should be compatible with today's flight processors.

SYNov 16, 2019
Six Degree-of-Freedom Body-Fixed Hovering over Unmapped Asteroids via LIDAR Altimetry and Reinforcement Meta-Learning

Brian Gaudet, Richard Linares, Roberto Furfaro

We optimize a six degrees of freedom hovering policy using reinforcement meta-learning. The policy maps flash LIDAR measurements directly to on/off spacecraft body-frame thrust commands, allowing hovering at a fixed position and attitude in the asteroid body-fixed reference frame. Importantly, the policy does not require position and velocity estimates, and can operate in environments with unknown dynamics, and without an asteroid shape model or navigation aids. Indeed, during optimization the agent is confronted with a new randomly generated asteroid for each episode, insuring that it does not learn an asteroid's shape, texture, or environmental dynamics. This allows the deployed policy to generalize well to novel asteroid characteristics, which we demonstrate in our experiments. Moreover, our experiments show that the optimized policy adapts to actuator failure and sensor noise. Although the policy is optimized using randomly generated synthetic asteroids, it is tested on two shape models from actual asteroids: Bennu and Itokawa. We find that the policy generalizes well to these shape models. The hovering controller has the potential to simplify mission planning by allowing asteroid body-fixed hovering immediately upon the spacecraft's arrival to an asteroid. This in turn simplifies shape model generation and allows resource mapping via remote sensing immediately upon arrival at the target asteroid.

SYJul 13, 2019
Seeker based Adaptive Guidance via Reinforcement Meta-Learning Applied to Asteroid Close Proximity Operations

Brian Gaudet, Richard Linares, Roberto Furfaro

Current practice for asteroid close proximity maneuvers requires extremely accurate characterization of the environmental dynamics and precise spacecraft positioning prior to the maneuver. This creates a delay of several months between the spacecraft's arrival and the ability to safely complete close proximity maneuvers. In this work we develop an adaptive integrated guidance, navigation, and control system that can complete these maneuvers in environments with unknown dynamics, with initial conditions spanning a large deployment region, and without a shape model of the asteroid. The system is implemented as a policy optimized using reinforcement meta-learning. The spacecraft is equipped with an optical seeker that locks to either a terrain feature, back-scattered light from a targeting laser, or an active beacon, and the policy maps observations consisting of seeker angles and LIDAR range readings directly to engine thrust commands. The policy implements a recurrent network layer that allows the deployed policy to adapt real time to both environmental forces acting on the agent and internal disturbances such as actuator failure and center of mass variation. We validate the guidance system through simulated landing maneuvers in a six degrees-of-freedom simulator. The simulator randomizes the asteroid's characteristics such as solar radiation pressure, density, spin rate, and nutation angle, requiring the guidance and control system to adapt to the environment. We also demonstrate robustness to actuator failure, sensor bias, and changes in the spacecraft's center of mass and inertia tensor. Finally, we suggest a concept of operations for asteroid close proximity maneuvers that is compatible with the guidance system.

SPACE-PHJan 25, 2019
End to End Satellite Servicing and Space Debris Management

Aman Chandra, Himangshu Kalita, Roberto Furfaro et al.

There is growing demand for satellite swarms and constellations for global positioning, remote sensing and relay communication in higher LEO orbits. This will result in many obsolete, damaged and abandoned satellites that will remain on-orbit beyond 25 years. These abandoned satellites and space debris maybe economically valuable orbital real-estate and resources that can be reused, repaired or upgraded for future use. Space traffic management is critical to repair damaged satellites, divert satellites into warehouse orbits and effectively de-orbit satellites and space debris that are beyond repair and salvage. Current methods for on-orbit capture, servicing and repair require a large service satellite. However, by accessing abandoned satellites and space debris, there is an inherent heightened risk of damage to a servicing spacecraft. Sending multiple small-robots with each robot specialized in a specific task is a credible alternative, as the system is simple and cost-effective and where loss of one or more robots does not end the mission. In this work, we outline an end to end multirobot system to capture damaged and abandoned spacecraft for salvaging, repair and for de-orbiting. We analyze the feasibility of sending multiple, decentralized robots that can work cooperatively to perform capture of the target satellite as a first step, followed by crawling onto damage satellites to perform detailed mapping. After obtaining a detailed map of the satellite, the robots will proceed to either repair and replace or dismantle components for salvage operations. Finally, the remaining components will be packaged with a de-orbit device for accelerated de-orbit.

SYApr 18, 2019
Adaptive Guidance and Integrated Navigation with Reinforcement Meta-Learning

Brian Gaudet, Richard Linares, Roberto Furfaro

This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four challenging environments with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of a RL meta-learning optimized policy to implement a guidance law using observations consisting of only Doppler radar altimeter readings in a Mars landing environment, and LIDAR altimeter readings in an asteroid landing environment, thus integrating guidance and navigation.

LGJan 12, 2019
Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories

Brian Gaudet, Richard Linares, Roberto Furfaro

We present a novel predictive model architecture based on the principles of predictive coding that enables open loop prediction of future observations over extended horizons. There are two key innovations. First, whereas current methods typically learn to make long-horizon open-loop predictions using a multi-step cost function, we instead run the model open loop in the forward pass during training. Second, current predictive coding models initialize the representation layer's hidden state to a constant value at the start of an episode, and consequently typically require multiple steps of interaction with the environment before the model begins to produce accurate predictions. Instead, we learn a mapping from the first observation in an episode to the hidden state, allowing the trained model to immediately produce accurate predictions. We compare the performance of our architecture to a standard predictive coding model and demonstrate the ability of the model to make accurate long horizon open-loop predictions of simulated Doppler radar altimeter readings during a six degree of freedom Mars landing. Finally, we demonstrate a 2X reduction in sample complexity by using the model to implement a Dyna style algorithm to accelerate policy learning with proximal policy optimization.

CVSep 6, 2018
On-Orbit Smart Camera System to Observe Illuminated and Unilluminated Space Objects

Steve Morad, Ravi Teja Nallapu, Himangshu Kalita et al.

The wide availability of Commercial Off-The-Shelf (COTS) electronics that can withstand Low Earth Orbit conditions has opened avenue for wide deployment of CubeSats and small-satellites. CubeSats thanks to their low developmental and launch costs offer new opportunities for rapidly demonstrating on-orbit surveillance capabilities. In our earlier work, we proposed development of SWIMSat (Space based Wide-angle Imaging of Meteors) a 3U CubeSat demonstrator that is designed to observe illuminated objects entering the Earth's atmosphere. The spacecraft would operate autonomously using a smart camera with vision algorithms to detect, track and report of objects. Several CubeSats can track an object in a coordinated fashion to pinpoint an object's trajectory. An extension of this smart camera capability is to track unilluminated objects utilizing capabilities we have been developing to track and navigate to Near Earth Objects (NEOs). This extension enables detecting and tracking objects that can't readily be detected by humans. The system maintains a dense star map of the night sky and performs round the clock observations. Standard optical flow algorithms are used to obtain trajectories of all moving objects in the camera field of view. Through a process of elimination, certain stars maybe occluded by a transiting unilluminated object which is then used to first detect and obtain a trajectory of the object. Using multiple cameras observing the event from different points of view, it may be possible then to triangulate the position of the object in space and obtain its orbital trajectory. In this work, the performance of our space object detection algorithm coupled with a spacecraft guidance, navigation, and control system is demonstrated.

ROSep 6, 2018
Satellite Capture and Servicing Using Networks of Tethered Robots Supported by Ground Surveillance

Himangshu Kalita, Roberto Furfaro, Jekan Thangavelautham

There is ever growing demand for satellite constellations that perform global positioning, remote sensing, earth-imaging and relay communication. In these highly prized orbits, there are many obsolete and abandoned satellites and components strewn posing ever-growing logistical challenges. This increased demand for satellite constellations pose challenges for space traffic management, where there is growing need to identify the risks probabilities and if possible mitigate them. These abandoned satellites and space debris maybe economically valuable orbital real-estate and resources that can be reused, repaired or upgraded for future use. On-orbit capture and servicing of a satellite requires satellite rendezvous, docking and repair, removal and replacement of components. Launching a big spacecraft that perform satellites servicing is one credible approach for servicing and maintaining next-generation constellations. By accessing abandoned satellites and space debris, there is an inherent heightened risk of damage to a servicing spacecraft. Under these scenarios, sending multiple, small-robots with each robot specialized in a specific task is a credible alternative, as the system is simple and cost-effective and where loss of one or more of robot does not end the mission. Eliminating the need for a large spacecraft or positioning the large spacecraft at safe distance to provide position, navigation and tracking support simplifies the system and enable the approach to be extensible with the latest ground-based sensing technology. In this work, we analyze the feasibility of sending multiple, decentralized robots that can work cooperatively to perform capture of the target satellite as first steps to on-orbit satellite servicing.