Steven Morad

LG
h-index: 30
17 papers
200 citations
Novelty: 48%
AI Score: 51

17 Papers

54.7 · LG · May 27 · Code
Investigating Memory in Model-Free RL with POPGym Arcade

Zekang Wang, Zhe He, Borong Zhang et al.

How should we analyze memory in deep RL? We introduce tools for analyzing policies under partial observability and revealing how agents use memory to make decisions. To utilize these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated environments sharing a single observation and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies on observability. We find that controlled studies are necessary for fair comparisons and identify a pathology where value functions smear credit over irrelevant history. Using this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory, perturbing the policy far into the future. Our code is available at https://github.com/bolt-research/popgym-arcade.

LG · Mar 3, 2023 · Code
POPGym: Benchmarking Partially Observable Reinforcement Learning

Steven Morad, Ryan Kortvelesy, Matteo Bettini et al. · cambridge

Real world applications of Reinforcement Learning (RL) are often partially observable, thus requiring memory. Despite this, partial observability is still largely ignored by contemporary RL benchmarks and libraries. We introduce Partially Observable Process Gym (POPGym), a two-part library containing (1) a diverse collection of 15 partially observable environments, each with multiple difficulties and (2) implementations of 13 memory model baselines -- the most in a single RL library. Existing partially observable benchmarks tend to fixate on 3D visual navigation, which is computationally expensive and only one type of POMDP. In contrast, POPGym environments are diverse, produce smaller observations, use less memory, and often converge within two hours of training on a consumer-grade GPU. We implement our high-level memory API and memory baselines on top of the popular RLlib framework, providing plug-and-play compatibility with various training algorithms, exploration strategies, and distributed training paradigms. Using POPGym, we execute the largest comparison across RL memory models to date. POPGym is available at https://github.com/proroklab/popgym.

LG · Oct 6, 2023 · Code
Reinforcement Learning with Fast and Forgetful Memory

Steven Morad, Ryan Kortvelesy, Stephan Liwicki et al. · cambridge

Nearly all real world tasks are inherently partially observable, necessitating the use of memory in Reinforcement Learning (RL). Most model-free approaches summarize the trajectory into a latent Markov state using memory models borrowed from Supervised Learning (SL), even though RL tends to exhibit different training and efficiency characteristics. Addressing this discrepancy, we introduce Fast and Forgetful Memory, an algorithm-agnostic memory model designed specifically for RL. Our approach constrains the model search space via strong structural priors inspired by computational psychology. It is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, achieving greater reward than RNNs across various recurrent benchmarks and algorithms without changing any hyperparameters. Moreover, Fast and Forgetful Memory exhibits training speeds two orders of magnitude faster than RNNs, attributed to its logarithmic time and linear space complexity. Our implementation is available at https://github.com/proroklab/ffm.

LG · Jun 24, 2023
Generalised f-Mean Aggregation for Graph Neural Networks

Ryan Kortvelesy, Steven Morad, Amanda Prorok · cambridge

Graph Neural Network (GNN) architectures are defined by their implementations of update and aggregation modules. While many works focus on new ways to parametrise the update modules, the aggregation modules receive comparatively little attention. Because it is difficult to parametrise aggregation functions, currently most methods select a "standard aggregator" such as $\mathrm{mean}$, $\mathrm{sum}$, or $\mathrm{max}$. While this selection is often made without any reasoning, it has been shown that the choice in aggregator has a significant impact on performance, and the best choice in aggregator is problem-dependent. Since aggregation is a lossy operation, it is crucial to select the most appropriate aggregator in order to minimise information loss. In this paper, we present GenAgg, a generalised aggregation operator, which parametrises a function space that includes all standard aggregators. In our experiments, we show that GenAgg is able to represent the standard aggregators with much higher accuracy than baseline methods. We also show that using GenAgg as a drop-in replacement for an existing aggregator in a GNN often leads to a significant boost in performance across various tasks.
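As a rough illustration of the generalised f-mean named in the title above, the classical power mean (f(x) = x^p) already recovers several standard aggregators. This is only a sketch: the abstract does not specify GenAgg's parametrisation, and the function name `power_mean`, the choice of f, and the restriction to positive inputs are assumptions made here.

```python
def power_mean(xs, p):
    """Generalised f-mean with f(x) = x**p (an assumed, illustrative choice of f).

    Computes f^{-1}(mean(f(x_i))) for positive inputs and non-zero p.
    """
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("expects a non-empty list of positive numbers")
    if p == 0:
        raise ValueError("p must be non-zero")
    n = len(xs)
    return (sum(x ** p for x in xs) / n) ** (1.0 / p)


xs = [1.0, 2.0, 4.0]
arithmetic = power_mean(xs, 1)        # p = 1 gives the arithmetic mean
total = len(xs) * power_mean(xs, 1)   # the sum is the mean rescaled by n
near_max = power_mean(xs, 50)         # large positive p approaches max(xs)
near_min = power_mean(xs, -50)        # large negative p approaches min(xs)
```

A single scalar `p` therefore moves between min-like, mean-like, and max-like behaviour, which is the kind of function space a learnable aggregator can search over.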

LG · Feb 24, 2023
Permutation-Invariant Set Autoencoders with Fixed-Size Embeddings for Multi-Agent Learning

Ryan Kortvelesy, Steven Morad, Amanda Prorok · cambridge

The problem of permutation-invariant learning over set representations is particularly relevant in the field of multi-agent systems -- a few potential applications include unsupervised training of aggregation functions in graph neural networks (GNNs), neural cellular automata on graphs, and prediction of scenes with multiple objects. Yet existing approaches to set encoding and decoding tasks present a host of issues, including non-permutation-invariance, fixed-length outputs, reliance on iterative methods, non-deterministic outputs, computationally expensive loss functions, and poor reconstruction accuracy. In this paper we introduce a Permutation-Invariant Set Autoencoder (PISA), which tackles these problems and produces encodings with significantly lower reconstruction error than existing baselines. PISA also provides other desirable properties, including a similarity-preserving latent space, and the ability to insert or remove elements from the encoding. After evaluating PISA against baseline methods, we demonstrate its usefulness in a multi-agent application. Using PISA as a subcomponent, we introduce a novel GNN architecture which serves as a generalised communication scheme, allowing agents to use communication to gain full observability of a system.

RO · Jul 29, 2024
Language-Conditioned Offline RL for Multi-Robot Navigation

Steven Morad, Ajay Shankar, Jan Blumenkamp et al.

We present a method for developing navigation policies for multi-robot teams that interpret and follow natural language instructions. We condition these policies on embeddings from pretrained Large Language Models (LLMs), and train them via offline reinforcement learning with as little as 20 minutes of randomly-collected data. Experiments on a team of five real robots show that these policies generalize well to unseen commands, indicating an understanding of the LLM latent space. Our method requires no simulators or environment models, and produces low-latency control policies that can be deployed directly to real robots without finetuning. We provide videos of our experiments at https://sites.google.com/view/llm-marl.

54.7 · RO · Mar 27
120 Minutes and a Laptop: Minimalist Image-goal Navigation via Unsupervised Exploration and Offline RL

Xiaoming Liu, Borong Zhang, Qingbiao Li et al.

The prevailing paradigm for image-goal visual navigation often assumes access to large-scale datasets, substantial pretraining, and significant computational resources. In this work, we challenge this assumption. We show that we can collect a dataset, train an in-domain policy, and deploy it to the real world (1) in less than 120 minutes, (2) on a consumer laptop, (3) without any human intervention. Our method, MINav, formulates image-goal navigation as an offline goal-conditioned reinforcement learning problem, combining unsupervised data collection with hindsight goal relabeling and offline policy learning. Experiments in simulation and the real world show that MINav improves exploration efficiency, outperforms zero-shot navigation baselines in target environments, and scales favorably with dataset size. These results suggest that effective real-world robotic learning can be achieved with high computational efficiency, lowering the barrier to rapid policy prototyping and deployment.
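Hindsight goal relabeling, mentioned in the abstract above, can be sketched in a few lines. This is not MINav's actual procedure: the "final observation as goal" strategy, the sparse 0/1 reward, the equality-based success check, and the name `relabel_with_final_goal` are all assumptions for illustration, with short strings standing in for images.

```python
def relabel_with_final_goal(observations, actions):
    """Relabel one trajectory so that its last observation becomes the goal.

    observations: list of length T + 1; actions: list of length T.
    Returns (obs, action, goal, reward, next_obs) tuples with a sparse reward
    of 1.0 on any transition whose next observation equals the goal.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    goal = observations[-1]
    transitions = []
    for t, action in enumerate(actions):
        next_obs = observations[t + 1]
        reward = 1.0 if next_obs == goal else 0.0
        transitions.append((observations[t], action, goal, reward, next_obs))
    return transitions


# Hypothetical trajectory: strings stand in for camera images.
trajectory_obs = ["A", "B", "C", "D"]
trajectory_actions = ["right", "right", "up"]
dataset = relabel_with_final_goal(trajectory_obs, trajectory_actions)
```

The point of the relabeling is that every collected trajectory becomes a successful demonstration of reaching some goal, so unsupervised exploration data can feed a goal-conditioned offline learner without hand-specified targets.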

MA · Mar 11, 2024 · Code
Generalising Multi-Agent Cooperation through Task-Agnostic Communication

Dulhan Jayalath, Steven Morad, Amanda Prorok

Existing communication methods for multi-agent reinforcement learning (MARL) in cooperative multi-robot problems are almost exclusively task-specific, training new communication strategies for each unique task. We address this inefficiency by introducing a communication strategy applicable to any task within a given environment. We pre-train the communication strategy without task-specific reward guidance in a self-supervised manner using a set autoencoder. Our objective is to learn a fixed-size latent Markov state from a variable number of agent observations. Under mild assumptions, we prove that policies using our latent representations are guaranteed to converge, and upper bound the value error introduced by our Markov state approximation. Our method enables seamless adaptation to novel tasks without fine-tuning the communication strategy, gracefully supports scaling to more agents than present during training, and detects out-of-distribution events in an environment. Empirical results on diverse MARL scenarios validate the effectiveness of our approach, surpassing task-specific communication strategies in unseen tasks. Our implementation of this work is available at https://github.com/proroklab/task-agnostic-comms.

RO · Nov 2, 2021 · Code
A Framework for Real-World Multi-Robot Systems Running Decentralized GNN-Based Policies

Jan Blumenkamp, Steven Morad, Jennifer Gielis et al.

GNNs are a paradigm-shifting neural architecture to facilitate the learning of complex multi-agent behaviors. Recent work has demonstrated remarkable performance in tasks such as flocking, multi-agent path planning and cooperative coverage. However, the policies derived through GNN-based learning schemes have not yet been deployed in the real world on physical multi-robot systems. In this work, we present the design of a system that allows for fully decentralized execution of GNN-based policies. We create a framework based on ROS2 and elaborate on its details in this paper. We demonstrate our framework on a case study that requires tight coordination between robots, and present first-of-a-kind results that show successful real-world deployment of GNN-based policies on a decentralized multi-robot system relying on ad hoc communication. A video demonstration of this case study, as well as the accompanying source code repositories, can be found online: https://www.youtube.com/watch?v=COh-WLn4iO4, https://github.com/proroklab/ros2_multi_agent_passage, and https://github.com/proroklab/rl_multi_agent_passage.

LG · Feb 11
Learning Mixture Density via Natural Gradient Expectation Maximization

Yutao Chen, Jasmine Bayrooti, Steven Morad

Mixture density networks are neural networks that produce Gaussian mixtures to represent continuous multimodal conditional densities. Standard training procedures involve maximum likelihood estimation using the negative log-likelihood (NLL) objective, which suffers from slow convergence and mode collapse. In this work, we improve the optimization of mixture density networks by integrating their information geometry. Specifically, we interpret mixture density networks as deep latent-variable models and analyze them through an expectation maximization framework, which reveals surprising theoretical connections to natural gradient descent. We then exploit such connections to derive the natural gradient expectation maximization (nGEM) objective. We show empirically that nGEM achieves up to 10$\times$ faster convergence while adding almost zero computational overhead, and scales well to high-dimensional data where NLL otherwise fails.

LG · Feb 15, 2024
Recurrent Reinforcement Learning with Memoroids

Steven Morad, Chris Lu, Ryan Kortvelesy et al.

Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states. Neither model scales particularly well to long sequences, especially compared to an emerging class of memory models called Linear Recurrent Models. We discover that the recurrent update of these models resembles a monoid, leading us to reformulate existing models using a novel monoid-based framework that we call memoroids. We revisit the traditional approach to batching in recurrent reinforcement learning, highlighting theoretical and empirical deficiencies. We leverage memoroids to propose a batching method that improves sample efficiency, increases the return, and simplifies the implementation of recurrent loss functions in reinforcement learning.
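To make the monoid view in the abstract above concrete, here is a minimal sketch using a scalar linear recurrence h_t = a_t * h_{t-1} + b_t. The recurrence, the pair encoding, and the names `combine` and `IDENTITY` are assumptions chosen for illustration; they are not taken from the paper's definitions.

```python
from functools import reduce

# Each element (a, b) represents the affine map h -> a * h + b.
# IDENTITY is the neutral element: h -> 1 * h + 0.
IDENTITY = (1.0, 0.0)


def combine(left, right):
    """Compose two affine maps: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_recurrence(steps, h0=0.0):
    """Reference implementation: step through the recurrence one item at a time."""
    h = h0
    for a, b in steps:
        h = a * h + b
    return h


def monoid_recurrence(steps, h0=0.0):
    """Same result, obtained by folding the steps with the monoid operator."""
    a, b = reduce(combine, steps, IDENTITY)
    return a * h0 + b


steps = [(0.5, 1.0), (0.5, 2.0), (0.25, 4.0)]
h_seq = sequential_recurrence(steps)
h_mon = monoid_recurrence(steps)
```

Because `combine` is associative, the fold can be regrouped arbitrarily, which is what allows such recurrences to be evaluated with a parallel scan rather than strictly left to right.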

LG · Mar 3, 2025
Investigating Memory in RL with POPGym Arcade

Zekang Wang, Zhe He, Borong Zhang et al.

How should we analyze memory in deep RL? We introduce mathematical tools for fairly analyzing policies under partial observability and revealing how agents use memory to make decisions. To utilize these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated, pixel-based environments sharing a single observation and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies on observability. We find that controlled studies are necessary for fair comparisons, and identify a pathology where value functions smear credit over irrelevant history. With this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory, perturbing the policy far into the future, with implications for sim-to-real transfer and offline RL.

IM · Sep 2, 2019
Building Small-Satellites to Live Through the Kessler Effect

Steven Morad, Himangshu Kalita, Ravi teja Nallapu et al.

The rapid advancement and miniaturization of spacecraft electronics, sensors, actuators, and power systems have resulted in a growing proliferation of small spacecraft. Coupled with this is the growing number of rocket launches, with left-over debris marking their trail. The space debris problem has also been compounded by tests of several satellite-killer missiles that have left large remnant debris fields. In this paper, we assume a future in which the Kessler Effect has taken hold and analyze the implications for the design of small satellites and CubeSats. We use a multipronged approach of surveying the latest technologies, including the ability to sense space debris in orbit, perform obstacle avoidance, carry sufficient shielding to take on small impacts, and apply other techniques to mitigate the problem. Detecting and tracking space debris threats on-orbit is expected to be an important approach, and we will analyze the latest vision algorithms to perform the detection, followed by quick reaction control systems to perform the avoidance. Alternatively, there may be scenarios where the debris is too small to track and avoid. In this case, the spacecraft will need passive mitigation measures to survive the impact. Based on these conditions, we develop a strawman design of a small spacecraft to mitigate these challenges. Based upon this study, we identify whether there is sufficient present-day COTS technology to mitigate or shield satellites from the problem. We conclude by outlining technology pathways that need to be advanced now to best prepare ourselves for the worst-case eventuality of the Kessler Effect taking hold in the upper altitudes of Low Earth Orbit.

RO · Mar 19, 2019
A Spring Propelled Extreme Environment Robot for Off-World Cave Exploration

Steven Morad, Thomas Dailey, Leonard Vance et al.

Pits on the Moon and Mars are intriguing geological formations that have yet to be explored. These geological formations can provide protection from harsh diurnal temperature variations, ionizing radiation, and meteorite impacts. Some have proposed that these underground formations are well-suited as human outposts. Some theorize that the Martian pits may harbor remnants of past life. Unfortunately, these geological formations have been off-limits to conventional wheeled rovers and lander systems due to their collapsed ceiling or 'skylight' entrances. In this paper, a new low-cost method to explore these pits is presented using the Spring Propelled Extreme Environment Robot (SPEER). The SPEER consists of a launch system that flings disposable spherical microbots through skylights into the pits. The microbots are low-cost and composed of aluminium Al-6061 disposable spheres with an array of adapted COTS sensors and a solid rocket motor for soft landing. By moving most control authority to the launcher, the microbots become very simple, lightweight, and low-cost. We present a preliminary design of the microbots that can be built today using commercial components for under 500 USD. The microbots have a total mass of 1 kg, with more than 750 g available for a science instrument. In this paper, we present the design, dynamics and control, and operation of these microbots. This is followed by initial feasibility studies of the SPEER system by simulating exploration of a known Lunar pit in Mare Tranquillitatis.

RO · Dec 30, 2018
Coordination and Control of Multiple Climbing Robots in Transport of Heavy Loads through Extreme Terrain

Himangshu Kalita, Steven Morad, Jekan Thangavelautham

The discovery of ice deposits in the permanently shadowed craters of the lunar North and South Poles presents an important opportunity for In-Situ Resource Utilization. These ice deposits may be the source for sustaining a lunar base or for enabling an interplanetary refueling station. These ice deposits also preserve a unique record of the geology and environment of their hosts, both in terms of impact history and the supply of volatile compounds, and so are of immense scientific interest. To date, these ice deposits have been studied indirectly and by remote active radar, but they need to be analyzed in-situ by robotic systems that can study the depths of the deposits, their purity and composition. However, these shadowed craters never see sunlight and are among the coldest places in the solar system. NASA JPL proposed the use of solar reflectors mounted on crater rims to project sunlight into the crater depths for use by ground robots. The solar reflectors would heat the crater base and vehicles positioned at the base sufficiently to survive the cold temperatures. Our approach analyzes part of the logistics of this proposal, with teams of robots climbing up and down the crater to access the ice deposits. The mission will require robots to climb down extreme environments and carry large structures, including instruments and communication devices.

RO · Mar 15, 2018
Planning and Navigation of Climbing Robots in Low-Gravity Environments

Steven Morad, Himangshu Kalita, Jekan Thangavelautham

Advances in planetary robotics have led to wheeled robots that have beamed back invaluable science data from the surface of the Moon and Mars. However, these large wheeled robots are unable to access rugged environments such as cliffs, canyons and crater walls that contain exposed rock-faces and are geological time-capsules into the early Moon and Mars. We have proposed the SphereX robot, with a mass of 3 kg and a diameter of 30 cm, that can hop, roll and fly short distances. A single robot may slip and fall; however, a multirobot system can work cooperatively by being interlinked using spring-tethers and work much like a team of mountaineers to systematically climb a slope. We consider a team of four or more robots that are interlinked with tethers in an 'x' configuration. Each robot secures itself to a slope using spiny gripping actuators, and one by one each robot moves upwards by crawling, rolling or hopping up the slope. In this paper, we present a human-devised autonomous climbing algorithm and evaluate it using a high-fidelity dynamics simulator. The climbing surfaces contain impassable obstacles and some loosely held rocks that can dislodge. Under these conditions, the robots need to autonomously map, plan and navigate up or down these steep environments. Autonomous mapping and navigation capability is evaluated using simulated lasers and vision sensors. The human-devised planning algorithm uses a new algorithm called bounded-leg A*. Our early simulation results show much promise in these techniques, and our future plans include demonstration on real robots in a controlled laboratory environment and outdoors in the canyons of Arizona.

RO · Mar 7, 2018
Path Planning and Navigation Inside Off-World Lava Tubes and Caves

Himangshu Kalita, Steven Morad, Jekan Thangavelautham

Detailed surface images of the Moon and Mars reveal hundreds of cave-like openings. These cave-like openings are theorized to be remnants of lava-tubes, and their interiors may be in pristine condition. These locations may have well-preserved geological records of the Moon and Mars, including evidence of past water flow and habitability. Exploration of these caves using wheeled rovers remains a daunting challenge. These caves are likely to have entrances with caved-in ceilings much like the lava-tubes of Arizona and New Mexico. Thus, the entrances are nearly impossible to traverse even for experienced human hikers. Our approach is to utilize the SphereX robot, a 3 kg, 30 cm diameter robot with the computer hardware and sensors of a smartphone attached to rocket thrusters. Each SphereX robot can hop, roll or fly short distances in low gravity, airless or low-pressure environments. Several SphereX robots may be deployed to minimize single-point failure and exploit cooperative behaviors to traverse the cave. There are some important challenges for navigation and path planning in these cave environments. Localization systems such as GPS are not available, nor are they easy to install due to the signal blockage from the rocks. These caves are too dark and too large for conventional sensors such as cameras and miniature laser sensors to perform detailed mapping and navigation. In this paper, we identify new techniques to map these caves by performing localized, cooperative mapping and navigation.