Hamsa Balakrishnan

MA
h-index7
6papers
105citations
Novelty57%
AI Score44

6 Papers

MANov 3, 2022Code
Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation

Siddharth Nayak, Kenneth Choi, Wenqi Ding et al. · mit

We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm. We show that (1) in training, InforMARL has better sample efficiency and performance than baseline approaches, despite using less information, and (2) in testing, it scales well to environments with arbitrary numbers of agents and obstacles. We illustrate these results using four task environments, including one with predetermined goals for each agent, and one in which the agents collectively try to cover all goals. Code available at https://github.com/nsidn98/InforMARL.

MAFeb 13, 2025
Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

Sydney Dolan, Siddharth Nayak, Jasmine Jerry Aloor et al. · mit

We consider the problem setting in which multiple autonomous agents must cooperatively navigate and perform tasks in an unknown, communication-constrained environment. Traditional multi-agent reinforcement learning (MARL) approaches assume synchronous communications and perform poorly in such environments. We propose AsynCoMARL, an asynchronous MARL approach that uses graph transformers to learn communication protocols from dynamic graphs. AsynCoMARL can accommodate infrequent and asynchronous communications between agents, with edges of the graph only forming when agents communicate with each other. We show that AsynCoMARL achieves similar success and collision rates as leading baselines, despite 26\% fewer messages being passed between agents.

DCMay 14
Designing Dense Satellite Clusters for Distributed Space-based Datacenters

Jules Pénot, Hamsa Balakrishnan

Recent proposals for datacenters in sun-synchronous Low Earth Orbit rely on a large number of compute satellites formation-flying in dense clusters. Designing such satellite clusters requires optimizing the satellites' orbital geometry under several safety and operational constraints applied throughout the cluster's entire orbit. These constraints include guaranteeing a minimum inter-satellite spacing, obstruction-less solar power for every satellite, and that each satellite have a stable set of nearest neighbors with which it can maintain inter-satellite links (ISLs). In this work, we propose two main cluster orbital designs, parametrized by the minimum inter-satellite spacing $R_{min}$ and the cluster radius $R_{max}$: a planar cluster, and a 3D cluster. We show by construction and numerical analysis that both cluster orbital designs are consistent with the inter-satellite spacing, unobstructed sun-vector, and inter-satellite line of sight constraints. The proposed planar architecture is the most efficient packing of satellites in a plane for given $R_{min}$ and $R_{max}$ values, and our 3D architecture allows for the number of datacenter satellites to scale proportional to $(R_{max}/R_{min})^3$, an improvement over all previous LEO datacenter cluster designs. Finally, for a given satellite cluster, we formulate and solve an integer optimization problem that maps a VL2-like Clos network datacenter switching fabric onto the satellites and their corresponding set of feasible ISLs. We confirm that for both the planar and 3D architectures, there are sufficiently many permanently unobstructed ISLs within the cluster to replicate the switching fabric of terrestrial datacenters. We also examine the tradeoff between the number of ISLs each satellite can simultaneously sustain, and the corresponding number of cluster satellites that must be dedicated as aggregation and intermediate switches.

MAOct 19, 2024
Cooperation and Fairness in Multi-Agent Reinforcement Learning

Jasmine Jerry Aloor, Siddharth Nayak, Sydney Dolan et al. · mit

Multi-agent systems are trained to maximize shared cost objectives, which typically reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness -- certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably, leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is important to consider the tradeoffs between efficiency and fairness. We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We consider the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, our model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline trained using random assignments. Furthermore, an average of 21% improvement in fairness can be achieved compared to a model trained on optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do so without tailoring the models to specific formation shapes.

LGSep 24, 2021
NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming

Luke Kenworthy, Siddharth Nayak, Christopher Chin et al.

Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.

CYMay 25, 2021
Throughput-Fairness Tradeoffs in Mobility Platforms

Arjun Balasingam, Karthik Gopalakrishnan, Radhika Mittal et al.

This paper studies the problem of allocating tasks from different customers to vehicles in mobility platforms, which are used for applications like food and package delivery, ridesharing, and mobile sensing. A mobility platform should allocate tasks to vehicles and schedule them in order to optimize both throughput and fairness across customers. However, existing approaches to scheduling tasks in mobility platforms ignore fairness. We introduce Mobius, a system that uses guided optimization to achieve both high throughput and fairness across customers. Mobius supports spatiotemporally diverse and dynamic customer demands. It provides a principled method to navigate inherent tradeoffs between fairness and throughput caused by shared mobility. Our evaluation demonstrates these properties, along with the versatility and scalability of Mobius, using traces gathered from ridesharing and aerial sensing applications. Our ridesharing case study shows that Mobius can schedule more than 16,000 tasks across 40 customers and 200 vehicles in an online manner.