Richard Linares

RO
h-index: 64
31 papers
152 citations
Novelty: 46%
AI Score: 52

31 Papers

AI · Oct 31, 2025 · Code
Advancing AI Challenges for the United States Department of the Air Force

Christian Prothmann, Vijay Gadepally, Jeremy Kepner et al.

The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundamental advances in artificial intelligence (AI) to expand the competitive advantage of the United States in the defense and civilian sectors. In recent years, AI Accelerator projects have developed and launched public challenge problems aimed at advancing AI research in priority areas. Hallmarks of AI Accelerator challenges include large, publicly available, and AI-ready datasets to stimulate open-source solutions and engage the wider academic and private sector AI ecosystem. This article supplements our previous publication, which introduced AI Accelerator challenges. We provide an update on how ongoing and new challenges have successfully contributed to AI research and applications of AI technologies.

AI · Aug 16, 2024 · Code
Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program

Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares

Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompt. This study explores the use of fine-tuned LLMs for autonomous spacecraft control, using the Kerbal Space Program Differential Games suite (KSPDG) as a testing environment. Traditional Reinforcement Learning (RL) approaches face limitations in this domain due to insufficient simulation capabilities and data. By leveraging LLMs, specifically fine-tuning models like GPT-3.5 and LLaMA, we demonstrate how these models can effectively control spacecraft using language-based inputs and outputs. Our approach integrates real-time mission telemetry into textual prompts processed by the LLM, which then generates control actions via an agent. The results open a discussion about the potential of LLMs for space operations beyond their nominal use for text-related tasks. Future work aims to expand this methodology to other space control tasks and evaluate the performance of different LLM families. The code is available at https://github.com/ARCLab-MIT/kspdg.
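The telemetry-to-prompt-to-action loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the telemetry field names, the prompt wording, the JSON reply format, and the function names are hypothetical and are not taken from the paper or the KSPDG code base. No model is actually called; only the text formatting and reply parsing are shown.

```python
import json

def telemetry_to_prompt(telemetry: dict) -> str:
    """Render a telemetry snapshot as a text prompt (format is hypothetical)."""
    lines = [f"{key}: {value:.2f}" for key, value in sorted(telemetry.items())]
    return (
        "You control a spacecraft. Current telemetry:\n"
        + "\n".join(lines)
        + '\nReply with JSON: {"throttle": [forward, right, down]}, each in [-1, 1].'
    )

def parse_action(reply: str) -> list[float]:
    """Extract a clamped 3-axis throttle vector from the model's text reply."""
    try:
        values = [float(v) for v in json.loads(reply)["throttle"]]
    except (ValueError, KeyError, TypeError):
        return [0.0, 0.0, 0.0]  # fall back to coasting on malformed output
    if len(values) != 3:
        return [0.0, 0.0, 0.0]
    # Clamp each axis so a bad reply can never command more than full throttle.
    return [max(-1.0, min(1.0, v)) for v in values]
```

Parsing defensively matters in a closed loop: a language model can return malformed or out-of-range text, so this sketch falls back to a zero-thrust action rather than raising.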

SY · Oct 20, 2018
Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Powered Descent and Landing

Brian Gaudet, Richard Linares, Roberto Furfaro

Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse < 5 m radius). The latter requires both a navigation system capable of estimating the lander's state in real-time and a guidance and control system that can map the estimated lander state to a commanded thrust for each lander engine. In this paper, we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. The latter is used to learn a policy mapping the lander's estimated state directly to a commanded thrust for each engine, with the policy resulting in accurate and fuel-efficient trajectories. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results demonstrating the guidance and control system's performance in a 6-DOF simulation environment and demonstrate robustness to noise and system parameter uncertainty.
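One plausible reading of "different discount rates for terminal and shaping rewards" is to discount the two reward streams separately and sum the resulting returns. The sketch below shows that idea only; the function names and the default discount values are assumptions for illustration, and the paper's actual formulation may differ.

```python
def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

def dual_discount_returns(shaping, terminal,
                          gamma_shaping=0.95, gamma_terminal=0.995):
    """Sum of two return streams, each with its own discount rate.

    The default gammas are illustrative assumptions, not values from the paper.
    """
    s = discounted_returns(shaping, gamma_shaping)
    t = discounted_returns(terminal, gamma_terminal)
    return [a + b for a, b in zip(s, t)]
```

With a terminal discount close to 1, a sparse landing reward stays visible from early time steps, while a smaller shaping discount keeps dense shaping terms focused on the near term.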

SY · Apr 11, 2018
Control of Large Swarms via Random Finite Set Theory

Bryce Doerr, Richard Linares

Controlling large swarms of robotic agents has many challenges including, but not limited to, computational complexity due to the number of agents, uncertainty in the functionality of each agent in the swarm, and uncertainty in the swarm's configuration. This work generalizes the swarm state using Random Finite Set (RFS) theory and solves the control problem using model predictive control which naturally handles the challenges. This work uses information divergence to define the distance between swarm RFS and a desired distribution. A stochastic optimal control problem is formulated using a modified L2 distance. Simulation results show that swarm densities converge to a target destination, and the RFS control formulation can vary in the number of target destinations.

SPACE-PH · Aug 17, 2018
Data-driven framework for real-time thermospheric density estimation

Piyush M. Mehta, Richard Linares

In this paper, we demonstrate a new data-driven framework for real-time neutral density estimation via model-data fusion in quasi-physical ionosphere-thermosphere models. The framework has two main components: (i) the development of a quasi-physical dynamic reduced order model (ROM) that uses a linear approximation of the underlying dynamics and effect of the drivers, and (ii) dynamic calibration of the ROM through estimation of the ROM coefficients that represent the model parameters. We have previously demonstrated the development of a quasi-physical ROM using simulation output from a physical model and assimilation of non-operational density estimates derived from accelerometer measurements along a single orbit. In this paper, we demonstrate the potential of the framework for use with operational measurements. We use simulated GPS-derived orbit ephemerides with 5-minute resolution as measurements. The framework is a first-of-its-kind, simple yet robust and accurate method with high potential for providing real-time operational updates to the state of the upper atmosphere using quasi-physical models with inherent forecasting/predictive capabilities.

SY · Dec 3, 2020
Random Finite Set Theory and Centralized Control of Large Collaborative Swarms

Bryce Doerr, Richard Linares, Pingping Zhu et al.

Controlling large swarms of robotic agents presents many challenges including, but not limited to, computational complexity due to a large number of agents, uncertainty in the functionality of each agent in the swarm, and uncertainty in the swarm's configuration. This work generalizes the swarm state using Random Finite Set (RFS) theory and solves a centralized control problem with a Quasi-Newton optimization through the use of Model Predictive Control (MPC) to overcome the aforementioned challenges. This work uses the RFS formulation to control the distribution of agents assuming an unknown or unspecified number of agents. Computationally efficient solutions are also obtained via the MPC version of the Iterative Linear Quadratic Regulator (ILQR), a variant of Differential Dynamic Programming (DDP). Information divergence is used to define the distance between the swarm RFS and the desired swarm configuration through the use of the modified $L_2^2$ distance. Simulation results using MPC and ILQR show that the swarm intensity converges to the desired intensity. Additionally, the RFS control formulation is shown to be very flexible in terms of the number of agents in the swarm and configuration of the desired Gaussian mixtures. Lastly, the ILQR and the Gaussian Mixture Probability Hypothesis Density filter are used in conjunction to solve a spacecraft relative motion problem with imperfect information to show the viability of centralized RFS control for this real-world scenario.
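To make the distance between a swarm intensity and a desired Gaussian mixture concrete, the sketch below computes the plain (unmodified) squared L2 distance between two one-dimensional Gaussian mixtures in closed form. It relies on the standard identity that the integral of the product of two Gaussians N(x; m1, v1) and N(x; m2, v2) equals N(m1; m2, v1 + v2). The one-dimensional setting, the tuple layout, and the function names are assumptions for illustration; the modified distance used in the paper is not reproduced here.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def cross_term(a, b):
    """Integral of a(x) * b(x) for mixtures given as (weight, mean, var) tuples."""
    return sum(wa * wb * gauss(ma, mb, va + vb)
               for wa, ma, va in a
               for wb, mb, vb in b)

def l2_squared(f, g):
    """Squared L2 distance: integral of (f - g)^2, expanded into three terms."""
    return cross_term(f, f) - 2.0 * cross_term(f, g) + cross_term(g, g)

# Hypothetical example: one swarm component far from, then near, the target.
target = [(1.0, 0.0, 1.0)]
far = [(1.0, 5.0, 1.0)]
near = [(1.0, 1.0, 1.0)]
```

In this example `l2_squared(near, target)` is smaller than `l2_squared(far, target)`, which is the property a controller can exploit as a cost. Because the expression is closed-form in the mixture parameters, it needs no sampling or numerical integration.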

AO-PH · Oct 25, 2023
Transformer-based Atmospheric Density Forecasting

Julia Briden, Peng Mun Siew, Victor Rodriguez-Fernandez et al.

With the peak of the solar cycle approaching in 2025 and a single geomagnetic storm able to significantly alter the orbits of Resident Space Objects (RSOs), techniques for atmospheric density forecasting are vital for space situational awareness. While linear data-driven methods, such as dynamic mode decomposition with control (DMDc), have been used previously for forecasting atmospheric density, deep learning-based forecasting has the ability to capture nonlinearities in data. By learning multiple layer weights from historical atmospheric density data, long-term dependencies in the dataset are captured in the mapping between the current atmospheric density state and control input to the atmospheric density state at the next timestep. This work improves upon previous linear propagation methods for atmospheric density forecasting by developing a nonlinear transformer-based architecture. The empirical NRLMSISE-00 and JB2008 models, as well as the physics-based TIEGCM atmospheric density model, are compared for forecasting with DMDc and with the transformer-based propagator.
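For reference, the linear DMDc baseline named above fits a one-step model of the following standard form. The symbols are assumptions chosen here for illustration and are not notation taken from the paper.

```latex
x_{k+1} \approx A\,x_k + B\,u_k,
\qquad
\begin{bmatrix} A & B \end{bmatrix}
  = X' \begin{bmatrix} X \\ U \end{bmatrix}^{\dagger},
\qquad
X = [\,x_1 \;\cdots\; x_{m-1}\,],\quad
X' = [\,x_2 \;\cdots\; x_m\,],\quad
U = [\,u_1 \;\cdots\; u_{m-1}\,]
```

Here $x_k$ is the density state at step $k$, $u_k$ is the control input, and $\dagger$ denotes the Moore-Penrose pseudoinverse. A transformer-based propagator replaces the linear map $(A, B)$ with a learned nonlinear function of the same inputs.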

OC · Nov 9, 2023
Improving Computational Efficiency for Powered Descent Guidance via Transformer-based Tight Constraint Prediction

Julia Briden, Trey Gurga, Breanna Johnson et al.

In this work, we present Transformer-based Powered Descent Guidance (T-PDG), a scalable algorithm for reducing the computational complexity of the direct optimization formulation of the spacecraft powered descent guidance problem. T-PDG uses data from prior runs of trajectory optimization algorithms to train a transformer neural network, which accurately predicts the relationship between problem parameters and the globally optimal solution for the powered descent guidance problem. The solution is encoded as the set of tight constraints corresponding to the constrained minimum-cost trajectory and the optimal final time of landing. By leveraging the attention mechanism of transformer neural networks, large sequences of time series data can be accurately predicted when given only the spacecraft state and landing site parameters. When applied to the real problem of Mars powered descent guidance, T-PDG reduces the time for computing the 3 degree of freedom fuel-optimal trajectory, when compared to lossless convexification, from an order of 1-8 seconds to less than 500 milliseconds. A safe and optimal solution is guaranteed by including a feasibility check in T-PDG before returning the final trajectory.

MA · Mar 28
GUIDE: Guided Updates for In-context Decision Evolution in LLM-Driven Spacecraft Operations

Alejandro Carrasco, Mariko Storey-Matsutani, Victor Rodriguez-Fernandez et al.

Large language models (LLMs) have been proposed as supervisory agents for spacecraft operations, but existing approaches rely on static prompting and do not improve across repeated executions. We introduce GUIDE, a non-parametric policy improvement framework that enables cross-episode adaptation without weight updates by evolving a structured, state-conditioned playbook of natural-language decision rules. A lightweight acting model performs real-time control, while offline reflection updates the playbook from prior trajectories. Evaluated on an adversarial orbital interception task in the Kerbal Space Program Differential Games environment, GUIDE's evolution consistently outperforms static baselines. Results indicate that context evolution in LLM agents functions as policy search over structured decision rules in real-time closed-loop spacecraft interaction.

SPACE-PH · Mar 30, 2024 · Code
Language Models are Spacecraft Operators

Victor Rodriguez-Fernandez, Alejandro Carrasco, Jason Cheng et al.

Recent trends are emerging in the use of Large Language Models (LLMs) as autonomous agents that take actions based on the content of the user text prompts. We intend to apply these concepts to the field of Guidance, Navigation, and Control in space, enabling LLMs to have a significant role in the decision-making process for autonomous satellite operations. As a first step towards this goal, we have developed a pure LLM-based solution for the Kerbal Space Program Differential Games (KSPDG) challenge, a public software design competition where participants create autonomous agents for maneuvering satellites involved in non-cooperative space operations, running on the KSP game engine. Our approach leverages prompt engineering, few-shot prompting, and fine-tuning techniques to create an effective LLM-based agent that ranked 2nd in the competition. To the best of our knowledge, this work pioneers the integration of LLM agents into space research. Code is available at https://github.com/ARCLab-MIT/kspdg.

LG · Nov 14, 2025
Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning

Amit Jain, Victor Rodriguez-Fernandez, Richard Linares

Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multiphase waypoint navigation variants, and finally tackling a complex multiphase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes, establishing a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.

RO · Aug 13, 2025 · Code
BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots

Alejandro Posadas-Nava, Alejandro Carrasco, Richard Linares

BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face. Code, datasets, and VR app are available at https://github.com/ARCLab-MIT/BEAVR-Bot.

SPACE-PH · Jun 22, 2024 · Code
Enhancing Solar Driver Forecasting with Multivariate Transformers

Sergio Sanchez-Hurtado, Victor Rodriguez-Fernandez, Julia Briden et al.

In this work, we develop a comprehensive framework for F10.7, S10.7, M10.7, and Y10.7 solar driver forecasting with a time series Transformer (PatchTST). To ensure an equal representation of high and low levels of solar activity, we construct a custom loss function to weight samples based on the distance between the solar driver's historical distribution and the training set. The solar driver forecasting framework includes an 18-day lookback window and forecasts 6 days into the future. When benchmarked against the Space Environment Technologies (SET) dataset, our model consistently produces forecasts with a lower standard mean error in nearly all cases, with improved prediction accuracy during periods of high solar activity. All the code is available on GitHub at https://github.com/ARCLab-MIT/sw-driver-forecaster.
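The 18-day lookback and 6-day forecast horizon can be pictured as a sliding window over a daily series. The sketch below shows only that slicing; the function name and the single-variable input are assumptions for illustration, and it is not taken from the linked repository.

```python
def make_windows(series, lookback=18, horizon=6):
    """Split a daily series into (input, target) pairs.

    Each input holds `lookback` consecutive days and each target holds the
    `horizon` days that immediately follow it.
    """
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        split = start + lookback
        pairs.append((series[start:split], series[split:split + horizon]))
    return pairs
```

A 30-day series yields 7 training pairs under these defaults, and a series shorter than 24 days yields none.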

LG · Dec 18, 2025
Tiny Recursive Control: Iterative Reasoning for Efficient Optimal Control

Amit Jain, Richard Linares

Neural network controllers increasingly demand millions of parameters, and language model approaches push into the billions. For embedded aerospace systems with strict power and latency constraints, this scaling is prohibitive. We present Tiny Recursive Control (TRC), a neural architecture based on a counterintuitive principle: capacity can emerge from iteration depth rather than parameter count. TRC applies compact networks (approximately 1.5M parameters) repeatedly through a two-level hierarchical latent structure, refining control sequences by simulating trajectories and correcting based on tracking error. Because the same weights process every refinement step, adding iterations increases computation without increasing memory. We evaluate TRC on nonlinear control problems including oscillator stabilization and powered descent with fuel constraints. Across these domains, TRC achieves near-optimal control costs while requiring only millisecond-scale inference on GPU and under 10 MB memory, two orders of magnitude smaller than language model baselines. These results demonstrate that recursive reasoning, previously confined to discrete tasks, transfers effectively to continuous control synthesis.
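The simulate-then-correct loop with shared weights can be illustrated with a deliberately tiny stand-in. This is not the TRC architecture: a single fixed gain replaces the learned network, there is no hierarchical latent state, and the plant is a one-dimensional integrator. All names and values are assumptions for illustration.

```python
def simulate(x0, controls, dt=0.1):
    """Roll out x_{k+1} = x_k + u_k * dt and return the final state."""
    x = x0
    for u in controls:
        x += u * dt
    return x

def refine(controls, error, dt=0.1, gain=0.5):
    """Shared update rule: the same parameter (one gain) at every iteration."""
    correction = gain * error / (len(controls) * dt)
    return [u + correction for u in controls]

def recursive_control(x0, target, steps=10, iterations=8):
    """Refine a control sequence by repeated simulate-and-correct passes."""
    controls = [0.0] * steps
    errors = []
    for _ in range(iterations):
        error = target - simulate(x0, controls)  # tracking error of this pass
        errors.append(abs(error))
        controls = refine(controls, error)
    return controls, errors
```

In this toy setting each pass halves the terminal error, so more iterations buy accuracy with extra computation only. The parameter count (one gain) stays fixed, which mirrors the memory argument in the abstract.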

SPACE-PH · Jan 8, 2024
Towards a Machine Learning-Based Approach to Predict Space Object Density Distributions

Victor Rodriguez-Fernandez, Sumiyajav Sarangerel, Peng Mun Siew et al.

With the rapid increase in the number of Anthropogenic Space Objects (ASOs), Low Earth Orbit (LEO) is facing significant congestion, thereby posing challenges to space operators and risking the viability of the space environment for varied uses. Current models for examining this evolution, while detailed, are computationally demanding. To address these issues, we propose a novel machine learning-based model, as an extension of the MIT Orbital Capacity Tool (MOCAT). This model is designed to accelerate the propagation of ASO density distributions, and it is trained on hundreds of simulations generated by an established and accurate model of the space environment evolution. We study which deep learning-based solutions are good candidates for ASO propagation and how they manage the high dimensionality of the data. To assess the model's capabilities, we conduct experiments in long-term forecasting scenarios (around 100 years), analyze how and why the performance degrades over time, and discuss potential ways to improve the approach.

RO · Jan 1, 2025
Diffusion Policies for Generative Modeling of Spacecraft Trajectories

Julia Briden, Breanna Johnson, Richard Linares et al.

Machine learning has demonstrated remarkable promise for solving the trajectory generation problem and in paving the way for online use of trajectory optimization for resource-constrained spacecraft. However, a key shortcoming in current machine learning-based methods for trajectory generation is that they require large datasets, and even small changes to the original trajectory design requirements necessitate retraining new models to learn the parameter-to-solution mapping. In this work, we leverage compositional diffusion modeling to efficiently adapt to out-of-distribution data and problem variations in a few-shot framework for 6 degree-of-freedom (DoF) powered descent trajectory generation. Unlike traditional deep learning methods that can only learn the underlying structure of one specific trajectory optimization problem, diffusion models are a powerful generative modeling framework that represents the solution as a probability density function (PDF), which allows for the composition of PDFs encompassing a variety of trajectory design specifications and constraints. We demonstrate the capability of compositional diffusion models for inference-time 6 DoF minimum-fuel landing site selection and composable constraint representations. Using these samples as initial guesses for 6 DoF powered descent guidance enables dynamically feasible and computationally efficient trajectory generation.

AI · Jan 14, 2025
Visual Language Models as Operator Agents in the Space Domain

Alejandro Carrasco, Marco Nedungadi, Enrico M. Zucchelli et al.

This paper explores the application of Vision-Language Models (VLMs) as operator agents in the space domain, focusing on both software and hardware operational paradigms. Building on advances in Large Language Models (LLMs) and their multimodal extensions, we investigate how VLMs can enhance autonomous control and decision-making in space missions. In the software context, we employ VLMs within the Kerbal Space Program Differential Games (KSPDG) simulation environment, enabling the agent to interpret visual screenshots of the graphical user interface to perform complex orbital maneuvers. In the hardware context, we integrate VLMs with robotic systems equipped with cameras to inspect and diagnose physical space objects, such as satellites. Our results demonstrate that VLMs can effectively process visual and textual data to generate contextually appropriate actions, competing with traditional methods and non-multimodal LLMs in simulation tasks, and showing promise in real-world applications.

LG · Jan 28, 2025
Fine-Tuned Language Models as Space Systems Controllers

Enrico M. Zucchelli, Di Wu, Julia Briden et al.

Large language models (LLMs), or foundation models (FMs), are pretrained transformers that coherently complete sentences auto-regressively. In this paper, we show that LLMs can control simplified space systems after some additional training, called fine-tuning. We look at relatively small language models, ranging between 7 and 13 billion parameters. We focus on four problems: a three-dimensional spring toy problem, low-thrust orbit transfer, low-thrust cislunar control, and powered descent guidance. The fine-tuned LLMs are capable of controlling systems by generating sufficiently accurate outputs that are multi-dimensional vectors with up to 10 significant digits. We show that for several problems the amount of data required to perform fine-tuning is smaller than what is generally required of traditional deep neural networks (DNNs), and that fine-tuned LLMs are good at generalizing outside of the training dataset. Further, the same LLM can be fine-tuned with data from different problems, with only minor performance degradation with respect to LLMs trained for a single application. This work is intended as a first step towards the development of a general space systems controller.

OC · Jan 1, 2025
Tight Constraint Prediction of Six-Degree-of-Freedom Transformer-based Powered Descent Guidance

Julia Briden, Trey Gurga, Breanna Johnson et al.

This work introduces Transformer-based Successive Convexification (T-SCvx), an extension of Transformer-based Powered Descent Guidance (T-PDG), generalizable for efficient six-degree-of-freedom (DoF) fuel-optimal powered descent trajectory generation. Our approach significantly enhances the sample efficiency and solution quality for nonconvex powered descent guidance by employing a rotation invariant transformation of the sampled dataset. T-PDG was previously applied to the 3-DoF minimum fuel powered descent guidance problem, improving solution times by up to an order of magnitude compared to lossless convexification (LCvx). By learning to predict the set of tight or active constraints at the optimal control problem's solution, T-SCvx creates the minimal reduced-size problem initialized with only the tight constraints, then uses the solution of this reduced problem to warm-start the direct optimization solver. 6-DoF powered descent guidance is known to be challenging to solve quickly and reliably due to the nonlinear and non-convex nature of the problem, the discretization scheme heavily influencing solution validity, and reference trajectory initialization determining algorithm convergence or divergence. Our contributions in this work address these challenges by extending T-PDG to learn the set of tight constraints for the successive convexification (SCvx) formulation of the 6-DoF powered descent guidance problem. In addition to reducing the problem size, feasible and locally optimal reference trajectories are also learned to facilitate convergence from the initial guess. T-SCvx enables onboard computation of real-time guidance trajectories, demonstrated by a 6-DoF Mars powered landing application problem.

RO · Sep 4, 2025
Action Chunking with Transformers for Image-Based Spacecraft Guidance and Control

Alejandro Posadas-Nava, Andrea Scorsoglio, Luca Ghilardi et al.

We present an imitation learning approach for spacecraft guidance, navigation, and control (GNC) that achieves high performance from limited data. Using only 100 expert demonstrations, equivalent to 6,300 environment interactions, our method, which implements Action Chunking with Transformers (ACT), learns a control policy that maps visual and state observations to thrust and torque commands. ACT generates smoother, more consistent trajectories than a meta-reinforcement learning (meta-RL) baseline trained with 40 million interactions. We evaluate ACT on a rendezvous task: in-orbit docking with the International Space Station (ISS). We show that our approach achieves greater accuracy, smoother control, and greater sample efficiency.

CV · Aug 1, 2025
DreamSat-2.0: Towards a General Single-View Asteroid 3D Reconstruction

Santiago Diaz, Xinghui Hu, Josiane Uwumukiza et al.

To enhance asteroid exploration and autonomous spacecraft navigation, we introduce DreamSat-2.0, a pipeline that benchmarks three state-of-the-art 3D reconstruction models (Hunyuan-3D, Trellis-3D, and Ouroboros-3D) on custom spacecraft and asteroid datasets. Our systematic analysis, using 2D perceptual (image quality) and 3D geometric (shape accuracy) metrics, reveals that model performance is domain-dependent. While models produce higher-quality images of complex spacecraft, they achieve better geometric reconstructions for the simpler forms of asteroids. New benchmarks are established, with Hunyuan-3D achieving top perceptual scores on spacecraft but its best geometric accuracy on asteroids, marking a significant advance over our prior work.

RO · Dec 11, 2021
Online Information-Aware Motion Planning with Inertial Parameter Learning for Robotic Free-Flyers

Monica Ekal, Keenan Albee, Brian Coltin et al.

Space free-flyers like the Astrobee robots currently operating aboard the International Space Station must operate with inherent system uncertainties. Parametric uncertainties like mass and moment of inertia are especially important to quantify in these safety-critical space systems and can change in scenarios such as on-orbit cargo movement, where unknown grappled payloads significantly change the system dynamics. Cautiously learning these uncertainties en route can potentially avoid time- and fuel-consuming pure system identification maneuvers. Recognizing this, this work proposes RATTLE, an online information-aware motion planning algorithm that explicitly weights parametric model-learning coupled with real-time replanning capability that can take advantage of improved system models. The method consists of a two-tiered (global and local) planner, a low-level model predictive controller, and an online parameter estimator that produces estimates of the robot's inertial properties for more informed control and replanning on-the-fly; all levels of the planning and control feature online update-able models. Simulation results of RATTLE for the Astrobee free-flyer grappling an uncertain payload are presented alongside results of a hardware demonstration showcasing the ability to explicitly encourage model parametric learning while achieving otherwise useful motion.

RO · Feb 20, 2021
Safe and Uncertainty-Aware Robotic Motion Planning Techniques for Agile On-Orbit Assembly

Bryce Doerr, Keenan Albee, Monica Ekal et al.

As access to space and robotic autonomy capabilities move forward, there is simultaneously a growing interest in deploying large, complex space structures to provide new on-orbit capabilities. New space-borne observatories, large orbital outposts, and even futuristic on-orbit manufacturing will be enabled by robotic assembly of space structures using techniques like on-orbit additive manufacturing which can provide flexibility in constructing and even repairing complex hardware. However, the dynamics underlying the robotic assembler during manipulation may operate under inertial uncertainties. Thus, inertial estimation of the robot and the manipulated component system must be considered during structural assembly. The contribution of this work is to address both the motion planning and control for robotic assembly with consideration of the inertial estimation of the combined free-flying robotic assembler and additively manufactured component system. Specifically, the Linear Quadratic Regulator Rapidly-Exploring Randomized Trees (LQR-RRT*) and dynamically feasible path smoothing are used to obtain obstacle-free trajectories for the system. Further, model learning is incorporated explicitly into the planning stages via approximation of the continuous system and accompanying reward of performing safe, objective-oriented motion. Remaining uncertainty can then be dealt with using robust tube model predictive control. By obtaining controlled trajectories that consider both obstacle avoidance and learning of the inertial properties of the free-flyer and manipulated component system, the free-flyer rapidly considers and plans the construction of space structures with enhanced system knowledge. The approach naturally generalizes to repairing, refueling, and re-provisioning space structure components while providing optimal collision-free trajectories under e.g., inertial uncertainty.

RO · Aug 6, 2020
Motion Planning and Control for On-Orbit Assembly using LQR-RRT* and Nonlinear MPC

Bryce Doerr, Richard Linares

Deploying large, complex space structures is of great interest to the modern scientific world as it can provide new capabilities in obtaining scientific, communicative, and observational information. However, many theoretical mission designs contain complexities that must be constrained by the requirements of the launch vehicle, such as volume and mass. To mitigate such constraints, the use of on-orbit additive manufacturing and robotic assembly allows for the flexibility of building large complex structures including telescopes, space stations, and communication satellites. The contribution of this work is to develop motion planning and control algorithms using the linear quadratic regulator and rapidly-exploring randomized trees (LQR-RRT*), path smoothing, and tracking the trajectory using a closed-loop nonlinear receding horizon control optimizer for a robotic Astrobee free-flyer. By obtaining controlled trajectories that consider obstacle avoidance and dynamics of the vehicle and manipulator, the free-flyer rapidly considers and plans the construction of space structures. The approach is a natural generalization to repairing, refueling, and re-provisioning space structure components while providing optimal collision-free trajectories during operation.

SYApr 18, 2020
Reinforcement Meta-Learning for Interception of Maneuvering Exoatmospheric Targets with Parasitic Attitude Loop

Brian Gaudet, Roberto Furfaro, Richard Linares et al.

We use Reinforcement Meta-Learning to optimize an adaptive integrated guidance, navigation, and control system suitable for exoatmospheric interception of a maneuvering target. The system maps observations consisting of strapdown seeker angles and rate gyro measurements directly to thruster on-off commands. Using a high-fidelity six degree-of-freedom simulator, we demonstrate that the optimized policy can adapt to parasitic effects including seeker angle measurement lag, thruster control lag, the parasitic attitude loop resulting from scale factor errors and Gaussian noise on angle and rotational velocity measurements, and a time-varying center of mass caused by fuel consumption and slosh. Importantly, the optimized policy gives good performance over a wide range of challenging target maneuvers. Unlike previous work that enhances range observability by inducing line-of-sight oscillations, our system is optimized to use only measurements available from the seeker and rate gyros. Through extensive Monte Carlo simulation of randomized exoatmospheric interception scenarios, we demonstrate that the optimized policy gives performance close to that of augmented proportional navigation with perfect knowledge of the full engagement state. The optimized system is computationally efficient, requires minimal memory, and should be compatible with today's flight processors.
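
For reference, the augmented proportional navigation baseline mentioned above is commonly written in the following textbook form. The symbols are the usual conventions and are assumptions here, since the abstract does not define them: $N$ is the navigation gain, $V_c$ the closing velocity, $\dot{\lambda}$ the line-of-sight rate, and $a_T$ the target acceleration normal to the line of sight.

```latex
a_c = N \, V_c \, \dot{\lambda} + \frac{N}{2} \, a_T
```

The second term is what requires knowledge of the target's acceleration, which is why the comparison in the abstract grants this baseline perfect knowledge of the engagement state.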

SYNov 16, 2019
Six Degree-of-Freedom Body-Fixed Hovering over Unmapped Asteroids via LIDAR Altimetry and Reinforcement Meta-Learning

Brian Gaudet, Richard Linares, Roberto Furfaro

We optimize a six degrees of freedom hovering policy using reinforcement meta-learning. The policy maps flash LIDAR measurements directly to on/off spacecraft body-frame thrust commands, allowing hovering at a fixed position and attitude in the asteroid body-fixed reference frame. Importantly, the policy does not require position and velocity estimates, and can operate in environments with unknown dynamics, and without an asteroid shape model or navigation aids. Indeed, during optimization the agent is confronted with a new randomly generated asteroid for each episode, ensuring that it does not learn an asteroid's shape, texture, or environmental dynamics. This allows the deployed policy to generalize well to novel asteroid characteristics, which we demonstrate in our experiments. Moreover, our experiments show that the optimized policy adapts to actuator failure and sensor noise. Although the policy is optimized using randomly generated synthetic asteroids, it is tested on two shape models from actual asteroids: Bennu and Itokawa. We find that the policy generalizes well to these shape models. The hovering controller has the potential to simplify mission planning by allowing asteroid body-fixed hovering immediately upon the spacecraft's arrival at an asteroid. This in turn simplifies shape model generation and allows resource mapping via remote sensing immediately upon arrival at the target asteroid.

SYJul 13, 2019
Seeker based Adaptive Guidance via Reinforcement Meta-Learning Applied to Asteroid Close Proximity Operations

Brian Gaudet, Richard Linares, Roberto Furfaro

Current practice for asteroid close proximity maneuvers requires extremely accurate characterization of the environmental dynamics and precise spacecraft positioning prior to the maneuver. This creates a delay of several months between the spacecraft's arrival and the ability to safely complete close proximity maneuvers. In this work we develop an adaptive integrated guidance, navigation, and control system that can complete these maneuvers in environments with unknown dynamics, with initial conditions spanning a large deployment region, and without a shape model of the asteroid. The system is implemented as a policy optimized using reinforcement meta-learning. The spacecraft is equipped with an optical seeker that locks to either a terrain feature, back-scattered light from a targeting laser, or an active beacon, and the policy maps observations consisting of seeker angles and LIDAR range readings directly to engine thrust commands. The policy implements a recurrent network layer that allows the deployed policy to adapt in real time to both environmental forces acting on the agent and internal disturbances such as actuator failure and center of mass variation. We validate the guidance system through simulated landing maneuvers in a six degrees-of-freedom simulator. The simulator randomizes the asteroid's characteristics such as solar radiation pressure, density, spin rate, and nutation angle, requiring the guidance and control system to adapt to the environment. We also demonstrate robustness to actuator failure, sensor bias, and changes in the spacecraft's center of mass and inertia tensor. Finally, we suggest a concept of operations for asteroid close proximity maneuvers that is compatible with the guidance system.

ROJun 6, 2019
Combining Parameter Identification and Trajectory Optimization: Real-time Planning for Information Gain

Keenan Albee, Monica Ekal, Rodrigo Ventura et al.

Robotic systems often operate with uncertainties in their dynamics, for example, unknown inertial properties. Broadly, there are two approaches for controlling uncertain systems: design robust controllers in spite of uncertainty, or characterize a system before attempting to control it. This paper proposes a middle-ground approach that makes trajectory progress while also gaining information about the system. More specifically, it combines excitation trajectories, which are usually intended to optimize information gain for an estimator, with goal-driven trajectory optimization metrics. For this purpose, a measure of information gain is incorporated (using the Fisher Information Matrix) in a real-time planning framework to produce trajectories favorable for estimation. At the same time, the planner receives stable parameter updates from the estimator, enhancing the system model. An implementation of this learn-as-you-go approach utilizing an Unscented Kalman Filter (UKF) and Nonlinear Model Predictive Controller (NMPC) is demonstrated in simulation. Results for cases with and without information gain and online parameter updates in the system model are presented.
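
To make the idea of an information-gain term concrete, the sketch below works through a deliberately simple case that is not the paper's model: a one-axis free-flyer with unknown mass, where the measured acceleration is `y_k = theta * F_k + noise` with `theta = 1/m` and Gaussian noise of standard deviation `sigma`. For this model the Fisher information about `theta` is the sum of squared forces divided by `sigma**2`. The function names, the force profiles, and the weighting `gamma` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fisher_information(forces, sigma):
    """Scalar Fisher information for theta = 1/m under y_k = theta * F_k + N(0, sigma^2)."""
    forces = np.asarray(forces, dtype=float)
    return float(np.sum(forces**2) / sigma**2)

def blended_cost(forces, goal_cost, sigma, gamma):
    """Goal-driven cost minus an information reward weighted by gamma (assumed form)."""
    return goal_cost - gamma * fisher_information(forces, sigma)

# Two candidate force profiles (assumed values, in newtons).
gentle = [0.1, 0.1, 0.1, 0.1]        # smooth push toward the goal
exciting = [0.4, -0.4, 0.4, -0.4]    # stronger excitation of the dynamics

for name, profile in [("gentle", gentle), ("exciting", exciting)]:
    info = fisher_information(profile, sigma=0.05)
    # The Cramer-Rao bound gives the best achievable variance of an unbiased estimate.
    print(name, "information:", info, "variance bound:", 1.0 / info)
```

A planner that minimizes something like `blended_cost` trades goal progress against excitation: a larger `gamma` favors trajectories that make the unknown parameter easier to estimate.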

SYApr 18, 2019
Adaptive Guidance and Integrated Navigation with Reinforcement Meta-Learning

Brian Gaudet, Richard Linares, Roberto Furfaro

This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt in real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four challenging environments with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of an RL meta-learning optimized policy to implement a guidance law using observations consisting of only Doppler radar altimeter readings in a Mars landing environment, and LIDAR altimeter readings in an asteroid landing environment, thus integrating guidance and navigation.

SYJan 12, 2019
Adaptive Guidance with Reinforcement Meta-Learning

Brian Gaudet, Richard Linares

This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt in real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four difficult tasks with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of a recurrent policy to navigate using only Doppler radar altimeter returns, thus integrating guidance and navigation.

LGJan 12, 2019
Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories

Brian Gaudet, Richard Linares, Roberto Furfaro

We present a novel predictive model architecture based on the principles of predictive coding that enables open-loop prediction of future observations over extended horizons. There are two key innovations. First, whereas current methods typically learn to make long-horizon open-loop predictions using a multi-step cost function, we instead run the model open loop in the forward pass during training. Second, current predictive coding models initialize the representation layer's hidden state to a constant value at the start of an episode, and consequently typically require multiple steps of interaction with the environment before the model begins to produce accurate predictions. Instead, we learn a mapping from the first observation in an episode to the hidden state, allowing the trained model to immediately produce accurate predictions. We compare the performance of our architecture to a standard predictive coding model and demonstrate the ability of the model to make accurate long-horizon open-loop predictions of simulated Doppler radar altimeter readings during a six degree-of-freedom Mars landing. Finally, we demonstrate a 2X reduction in sample complexity by using the model to implement a Dyna-style algorithm to accelerate policy learning with proximal policy optimization.