Hamsa Balakrishnan

h-index33

8papers

3,886citations

Novelty57%

AI Score47

Ranked #31,405 of 194,257 authors (top 16%)#10 in MA (top 5%)

8 Papers

17.6MANov 3, 2022Code

Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation

Siddharth Nayak, Kenneth Choi, Wenqi Ding et al. · mit

We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm. We show that (1) in training, InforMARL has better sample efficiency and performance than baseline approaches, despite using less information, and (2) in testing, it scales well to environments with arbitrary numbers of agents and obstacles. We illustrate these results using four task environments, including one with predetermined goals for each agent, and one in which the agents collectively try to cover all goals. Code available at https://github.com/nsidn98/InforMARL.

2.3MAFeb 13, 2025

Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

Sydney Dolan, Siddharth Nayak, Jasmine Jerry Aloor et al. · mit

We consider the problem setting in which multiple autonomous agents must cooperatively navigate and perform tasks in an unknown, communication-constrained environment. Traditional multi-agent reinforcement learning (MARL) approaches assume synchronous communications and perform poorly in such environments. We propose AsynCoMARL, an asynchronous MARL approach that uses graph transformers to learn communication protocols from dynamic graphs. AsynCoMARL can accommodate infrequent and asynchronous communications between agents, with edges of the graph only forming when agents communicate with each other. We show that AsynCoMARL achieves similar success and collision rates as leading baselines, despite 26\% fewer messages being passed between agents.

7.0DCMay 14

Designing Dense Satellite Clusters for Distributed Space-based Datacenters

Jules Pénot, Hamsa Balakrishnan

Recent proposals for datacenters in sun-synchronous Low Earth Orbit rely on a large number of compute satellites formation-flying in dense clusters. Designing such satellite clusters requires optimizing the satellites' orbital geometry under several safety and operational constraints applied throughout the cluster's entire orbit. These constraints include guaranteeing a minimum inter-satellite spacing, obstruction-less solar power for every satellite, and that each satellite have a stable set of nearest neighbors with which it can maintain inter-satellite links (ISLs). In this work, we propose two main cluster orbital designs, parametrized by the minimum inter-satellite spacing $R_{min}$ and the cluster radius $R_{max}$: a planar cluster, and a 3D cluster. We show by construction and numerical analysis that both cluster orbital designs are consistent with the inter-satellite spacing, unobstructed sun-vector, and inter-satellite line of sight constraints. The proposed planar architecture is the most efficient packing of satellites in a plane for given $R_{min}$ and $R_{max}$ values, and our 3D architecture allows for the number of datacenter satellites to scale proportional to $(R_{max}/R_{min})^3$, an improvement over all previous LEO datacenter cluster designs. Finally, for a given satellite cluster, we formulate and solve an integer optimization problem that maps a VL2-like Clos network datacenter switching fabric onto the satellites and their corresponding set of feasible ISLs. We confirm that for both the planar and 3D architectures, there are sufficiently many permanently unobstructed ISLs within the cluster to replicate the switching fabric of terrestrial datacenters. We also examine the tradeoff between the number of ISLs each satellite can simultaneously sustain, and the corresponding number of cluster satellites that must be dedicated as aggregation and intermediate switches.

3.2MAJun 22

Decentralized Coordination of Autonomous Traffic Through Advanced Air Mobility Corridors

Jasmine Jerry Aloor, Hamsa Balakrishnan

The use of dedicated corridors for Advanced Air Mobility (AAM) traffic is one of the most commonly proposed pathways to integrating them into existing airspace operations. Most prior research has focused on the design of networks of AAM corridors and conflict resolution for aircraft within corridors. It is also generally believed that while attractive from an implementation perspective, corridor-based operations may be inefficient, especially in the absence of centralized traffic management. In this paper, we show that contrary to this belief, it is possible for autonomous aircraft to learn to self-organize into corridor flows in decentralized settings. We illustrate our approach using scenarios in which fixed-wing aircraft need to safely and efficiently traverse (1) a single corridor with metering after the exit, (2) a sequence of two consecutive corridors, and (3) a corridor that splits into two. We find that in decentralized settings with only local information, the aircraft are able to conform to the corridor boundaries more than 94% of the time and reach their goal in a relatively efficient manner. Furthermore, tactical interventions to handle violations of the separation minimum are needed only infrequently in low- and medium-density settings. However, such tactical interventions become more frequently necessary only when traffic density is high.

3.5MAJun 22

Decentralized Autonomous Traffic Management through Corridor Networks

Jasmine Jerry Aloor, Aadarsh Govada, Hamsa Balakrishnan

As autonomous aircraft are introduced at scale and traffic density increases, centralized management becomes insufficient to coordinate the large numbers of crewed and uncrewed aircraft. Dedicated Advanced Air Mobility (AAM) corridors have therefore been proposed for organizing high-density autonomous traffic flows. The desire to scalably provide autonomous aircraft flexibility in trajectory planning motivates the development of decentralized approaches to traffic management in AAM corridors. In this work, we extend a multi-agent reinforcement learning (MARL) approach to address the challenge of decentralized traffic flow management in air corridor networks. We test policies trained in a single-corridor setting on increasingly complex multi-corridor networks with combinations of merges and splits in a zero-shot manner. Experimental results demonstrate that learned behaviors transfer well to scenarios with varying traffic density, network geometry, and heterogeneous vehicle performance, without needing centralized coordination or model retraining. We evaluate system-level performance in terms of conformance to corridor boundaries, completion rates, average speeds, distance traveled, and maintenance of inter-aircraft separation. We find that although our policies require only locally coordinated entry, traversal, and exit behaviors, they collectively produce desirable traffic flows through the corridor network.

5.1MAOct 19, 2024Code

Cooperation and Fairness in Multi-Agent Reinforcement Learning

Jasmine Jerry Aloor, Siddharth Nayak, Sydney Dolan et al. · mit

Multi-agent systems are trained to maximize shared cost objectives, which typically reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness -- certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably, leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is important to consider the tradeoffs between efficiency and fairness. We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We consider the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, our model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline trained using random assignments. Furthermore, an average of 21% improvement in fairness can be achieved compared to a model trained on optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do so without tailoring the models to specific formation shapes.

3.1LGSep 24, 2021Code

NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming

Luke Kenworthy, Siddharth Nayak, Christopher Chin et al.

Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.

2.3CYMay 25, 2021

Throughput-Fairness Tradeoffs in Mobility Platforms

Arjun Balasingam, Karthik Gopalakrishnan, Radhika Mittal et al.

This paper studies the problem of allocating tasks from different customers to vehicles in mobility platforms, which are used for applications like food and package delivery, ridesharing, and mobile sensing. A mobility platform should allocate tasks to vehicles and schedule them in order to optimize both throughput and fairness across customers. However, existing approaches to scheduling tasks in mobility platforms ignore fairness. We introduce Mobius, a system that uses guided optimization to achieve both high throughput and fairness across customers. Mobius supports spatiotemporally diverse and dynamic customer demands. It provides a principled method to navigate inherent tradeoffs between fairness and throughput caused by shared mobility. Our evaluation demonstrates these properties, along with the versatility and scalability of Mobius, using traces gathered from ridesharing and aerial sensing applications. Our ridesharing case study shows that Mobius can schedule more than 16,000 tasks across 40 customers and 200 vehicles in an online manner.