MANov 28, 2022
Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable DemandDaniel Garces, Sushmita Bhattacharya, Stephanie Gil et al.
We derive a learning framework to generate routing/pickup policies for a fleet of autonomous vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, and consider a-priori potential future requests, 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in policies that are adaptive to fluctuations of actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) an online play algorithm that improves the performance of an offline-trained policy, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture, trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuation in demand distributions. Our numerical results demonstrate that our method outperforms alternative rollout-based reinforcement learning schemes, as well as other classical methods from operations research.
AIJul 5, 2023
Surge Routing: Event-informed Multiagent Reinforcement Learning for Autonomous RideshareDaniel Garces, Stephanie Gil
Large events such as conferences, concerts and sports games, often cause surges in demand for ride services that are not captured in average demand patterns, posing unique challenges for routing algorithms. We propose a learning framework for an autonomous fleet of taxis that leverages event data from the internet to predict demand surges and generate cooperative routing policies. We achieve this through a combination of two major components: (i) a demand prediction framework that uses textual event information in the form of events' descriptions and reviews to predict event-driven demand surges over street intersections, and (ii) a scalable multiagent reinforcement learning framework that leverages demand predictions and uses one-agent-at-a-time rollout combined with limited sampling certainty equivalence to learn intersection-level routing policies. For our experimental results we consider real NYC ride share data for the year 2022 and information for more than 2000 events across 300 unique venues in Manhattan. We test our approach with a fleet of 100 taxis on a map with 2235 street intersections. Our experimental results demonstrate that our method learns routing policies that reduce wait time overhead per serviced request by 25% to 75%, while picking up 1% to 4% more requests than other model-based RL frameworks and classical methods in operations research.
12.5ROMay 17
Motion-Uncertainty-Aware Next-Best-View Planning for Moving Object ReconstructionKaren Li, Mattia Mantovani, Robert J. Wood et al.
Active 3D reconstruction of moving objects requires selecting informative viewpoints while accounting for object motion uncertainty during the decision-to-execution delay. Existing methods address only parts of this problem: next-best-view (NBV) planners for object reconstruction typically optimize surface coverage but assume static objects, while motion-aware active perception for moving targets accounts for target motion but prioritizes tracking or visibility over reconstruction coverage. This work presents a motion-uncertainty-aware NBV framework for reconstructing an unknown rigid object undergoing planar motion, using noisy planar position measurements of the object and depth observations from a mobile robot. The key idea is to evaluate each candidate viewpoint by its expected observation quality over plausible future object states induced by motion and measurement uncertainty, rather than at a single predicted object pose. To obtain this predictive belief, a fixed-lag Gaussian Process smoother estimates and predicts the object state from noisy position measurements. The resulting belief is used to generate candidate viewpoints around the predicted object location, filter them by reachability, and estimate their expected coverage-driven scores. Simulation and real-world experiments demonstrate improved reconstruction completeness over non-predictive NBV and prediction-only tracking methods, bridging coverage-driven active reconstruction and prediction-driven tracking.
MANov 2, 2023
Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map (extended version)Daniel Garces, Sushmita Bhattacharya, Dimitri Bertsekas et al.
In this paper, we focus on the autonomous multiagent taxi routing problem for a large urban environment where the location and number of future ride requests are unknown a-priori, but can be estimated by an empirical distribution. Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy. In the routing setting, a policy is stable if its execution keeps the number of outstanding requests uniformly bounded over time. Although, rollout-based approaches are well-suited for learning cooperative multiagent policies with considerations for future demand, applying such methods to a large urban environment can be computationally expensive due to the large number of taxis required for stability. In this paper, we aim to address the computational bottleneck of multiagent rollout by proposing an approximate multiagent rollout-based two phase algorithm that reduces computational costs, while still achieving a stable near-optimal policy. Our approach partitions the graph into sectors based on the predicted demand and the maximum number of taxis that can run sequentially given the user's computational resources. The algorithm then applies instantaneous assignment (IA) for re-balancing taxis across sectors and a sector-wide multiagent rollout algorithm that is executed in parallel for each sector. We provide two main theoretical results: 1) characterize the number of taxis $m$ that is sufficient for IA to be stable; 2) derive a necessary condition on $m$ to maintain stability for IA as time goes to infinity. Our numerical results show that our approach achieves stability for an $m$ that satisfies the theoretical conditions. We also empirically demonstrate that our proposed two phase algorithm has equivalent performance to the one-at-a-time rollout over the entire map, but with significantly lower runtimes.
LGJun 25, 2025Code
Characterization and Mitigation of Training Instabilities in Microscaling FormatsHuangyuan Su, Mujin Kwun, Stephanie Gil et al.
Training large language models is an expensive, compute-bound process that must be repeated as models scale, algorithms improve, and new data is collected. To address this, next-generation hardware accelerators increasingly support lower-precision arithmetic formats, such as the Microscaling (MX) formats introduced in NVIDIA's Blackwell architecture. These formats use a shared scale within blocks of parameters to extend representable range and perform forward/backward GEMM operations in reduced precision for efficiency gains. In this work, we investigate the challenges and viability of block-scaled precision formats during model training. Across nearly one thousand language models trained from scratch -- spanning compute budgets from $2 \times 10^{17}$ to $4.8 \times 10^{19}$ FLOPs and sweeping over a broad range of weight-activation precision combinations -- we consistently observe that training in MX formats exhibits sharp, stochastic instabilities in the loss, particularly at larger compute scales. To explain this phenomenon, we conduct controlled experiments and ablations on a smaller proxy model that exhibits similar behavior as the language model, sweeping across architectural settings, hyperparameters, and precision formats. These experiments motivate a simple model in which multiplicative gradient bias introduced by the quantization of layer-norm affine parameters and a small fraction of activations can trigger runaway divergence. Through \emph{in situ} intervention experiments on our proxy model, we demonstrate that instabilities can be averted or delayed by modifying precision schemes mid-training. Guided by these findings, we evaluate stabilization strategies in the LLM setting and show that certain hybrid configurations recover performance competitive with full-precision training. We release our code at https://github.com/Hither1/systems-scaling.
ROSep 24, 2021Code
Toolbox Release: A WiFi-Based Relative Bearing Sensor for RoboticsNinad Jadhav, Weiying Wang, Diana Zhang et al.
This paper presents the WiFi-Sensor-for-Robotics (WSR) toolbox, an open source C++ framework. It enables robots in a team to obtain relative bearing to each other, even in non-line-of-sight (NLOS) settings which is a very challenging problem in robotics. It does so by analyzing the phase of their communicated WiFi signals as the robots traverse the environment. This capability, based on the theory developed in our prior works, is made available for the first time as an opensource tool. It is motivated by the lack of easily deployable solutions that use robots' local resources (e.g WiFi) for sensing in NLOS. This has implications for localization, ad-hoc robot networks, and security in multi-robot teams, amongst others. The toolbox is designed for distributed and online deployment on robot platforms using commodity hardware and on-board sensors. We also release datasets demonstrating its performance in NLOS and line-of-sight (LOS) settings for a multi-robot localization usecase. Empirical results show that the bearing estimation from our toolbox achieves mean accuracy of 5.10 degrees. This leads to a median error of 0.5m and 0.9m for localization in LOS and NLOS settings respectively, in a hardware deployment in an indoor office environment.
CVApr 16, 2025
Interpreting the linear structure of vision-language model embedding spacesIsabel Papadimitriou, Huangyuan Su, Thomas Fel et al.
Vision-language models encode images and text in a joint space, minimizing the distance between corresponding image and text pairs. How are language and images organized in this joint space, and how do the models encode meaning and modality? To investigate this, we train and release sparse autoencoders (SAEs) on the embedding spaces of four vision-language models (CLIP, SigLIP, SigLIP2, and AIMv2). SAEs approximate model embeddings as sparse linear combinations of learned directions, or "concepts". We find that, compared to other methods of linear feature learning, SAEs are better at reconstructing the real embeddings, while also able to retain the most sparsity. Retraining SAEs with different seeds or different data diet leads to two findings: the rare, specific concepts captured by the SAEs are liable to change drastically, but we also show that commonly-activating concepts are remarkably stable across runs. Interestingly, while most concepts activate primarily for one modality, we find they are not merely encoding modality per se. Many are almost orthogonal to the subspace that defines modality, and the concept directions do not function as good modality classifiers, suggesting that they encode cross-modal semantics. To quantify this bridging behavior, we introduce the Bridge Score, a metric that identifies concept pairs which are both co-activated across aligned image-text inputs and geometrically aligned in the shared space. This reveals that even single-modality concepts can collaborate to support cross-modal integration. We release interactive demos of the SAEs for all models, allowing researchers to explore the organization of the concept spaces. Overall, our findings uncover a sparse linear structure within VLM embedding spaces that is shaped by modality, yet stitched together through latent bridges, offering new insight into how multimodal meaning is constructed.
LGOct 9, 2025
LOTION: Smoothing the Optimization Landscape for Quantized TrainingMujin Kwun, Depen Morwani, Chloe Huangyuan Su et al.
Optimizing neural networks for quantized objectives is fundamentally challenging because the quantizer is piece-wise constant, yielding zero gradients everywhere except at quantization thresholds where the derivative is undefined. Most existing methods deal with this issue by relaxing gradient computations with techniques like Straight Through Estimators (STE) and do not provide any guarantees of convergence. In this work, taking inspiration from Nesterov smoothing, we approximate the quantized loss surface with a continuous loss surface. In particular, we introduce LOTION, \textbf{L}ow-precision \textbf{O}ptimization via s\textbf{T}ochastic-no\textbf{I}se sm\textbf{O}othi\textbf{N}g, a principled smoothing framework that replaces the raw quantized loss with its expectation under unbiased randomized-rounding noise. In this framework, standard optimizers are guaranteed to converge to a local minimum of the loss surface. Moreover, when using noise derived from stochastic rounding, we show that the global minima of the original quantized loss are preserved. We empirically demonstrate that this method outperforms standard QAT on synthetic testbeds and on 150M- and 300M- parameter language models.
AIFeb 26, 2025
Data-Efficient Multi-Agent Spatial Planning with LLMsHuangyuan Su, Aaron Walsman, Daniel Garces et al.
In this project, our goal is to determine how to leverage the world-knowledge of pretrained large language models for efficient and robust learning in multiagent decision making. We examine this in a taxi routing and assignment problem where agents must decide how to best pick up passengers in order to minimize overall waiting time. While this problem is situated on a graphical road network, we show that with the proper prompting zero-shot performance is quite strong on this task. Furthermore, with limited fine-tuning along with the one-at-a-time rollout algorithm for look ahead, LLMs can out-compete existing approaches with 50 times fewer environmental interactions. We also explore the benefits of various linguistic prompting approaches and show that including certain easy-to-compute information in the prompt significantly improves performance. Finally, we highlight the LLM's built-in semantic understanding, showing its ability to adapt to environmental factors through simple prompts.
ROAug 20, 2025
Decentralized Vision-Based Autonomous Aerial Wildlife MonitoringMakram Chahine, William Yang, Alaa Maalouf et al.
Wildlife field operations demand efficient parallel deployment methods to identify and interact with specific individuals, enabling simultaneous collective behavioral analysis, and health and safety interventions. Previous robotics solutions approach the problem from the herd perspective, or are manually operated and limited in scale. We propose a decentralized vision-based multi-quadrotor system for wildlife monitoring that is scalable, low-bandwidth, and sensor-minimal (single onboard RGB camera). Our approach enables robust identification and tracking of large species in their natural habitat. We develop novel vision-based coordination and tracking algorithms designed for dynamic, unstructured environments without reliance on centralized communication or control. We validate our system through real-world experiments, demonstrating reliable deployment in diverse field conditions.
ROMar 31, 2025
Pro-Routing: Proactive Routing of Autonomous Multi-Capacity Robots for Pickup-and-Delivery TasksDaniel Garces, Stephanie Gil
We consider a multi-robot setting, where we have a fleet of multi-capacity autonomous robots that must service spatially distributed pickup-and-delivery requests with fixed maximum wait times. Requests can be either scheduled ahead of time or they can enter the system in real-time. In this setting, stability for a routing policy is defined as the cost of the policy being uniformly bounded over time. Most previous work either solve the problem offline to theoretically maintain stability or they consider dynamically arriving requests at the expense of the theoretical guarantees on stability. In this paper, we aim to bridge this gap by proposing a novel proactive rollout-based routing framework that adapts to real-time demand while still provably maintaining the stability of the learned routing policy. We derive provable stability guarantees for our method by proposing a fleet sizing algorithm that obtains a sufficiently large fleet that ensures stability by construction. To validate our theoretical results, we consider a case study on real ride requests for Harvard's evening Van System. We also evaluate the performance of our framework using the currently deployed smaller fleet size. In this smaller setup, we compare against the currently deployed routing algorithm, greedy heuristics, and Monte-Carlo-Tree-Search-based algorithms. Our empirical results show that our framework maintains stability when we use the sufficiently large fleet size found in our theoretical results. For the smaller currently deployed fleet size, our method services 6% more requests than the closest baseline while reducing median passenger wait times by 33%.
SPOct 3, 2021
Cloud-Cluster Architecture for Detection in Intermittently Connected Sensor NetworksMichal Yemini, Stephanie Gil, Andrea J. Goldsmith
We consider a centralized detection problem where sensors experience noisy measurements and intermittent connectivity to a centralized fusion center. The sensors collaborate locally within predefined sensor clusters and fuse their noisy sensor data to reach a common local estimate of the detected event in each cluster. The connectivity of each sensor cluster is intermittent and depends on the available communication opportunities of the sensors to the fusion center. Upon receiving the estimates from all the connected sensor clusters the fusion center fuses the received estimates to make a final determination regarding the occurrence of the event across the deployment area. We refer to this hybrid communication scheme as a \emph{cloud-cluster} architecture. We propose a method for optimizing the decision rule for each cluster and analyzing the expected detection performance resulting from our hybrid scheme. Our method is tractable and addresses the high computational complexity caused by heterogeneous sensors' and clusters' detection quality, heterogeneity in their communication opportunities, and non-convexity of the loss function. Our analysis shows that clustering the sensors provides resilience to noise in the case of low sensor communication probability with the cloud. For larger clusters, a steep improvement in detection performance is possible even for a low communication probability by using our cloud-cluster architecture.
OCMar 9, 2021
Characterizing Trust and Resilience in Distributed Consensus for Cyberphysical SystemsMichal Yemini, Angelia Nedić, Andrea Goldsmith et al.
This work considers the problem of resilient consensus where stochastic values of trust between agents are available. Specifically, we derive a unified mathematical framework to characterize convergence, deviation of the consensus from the true consensus value, and expected convergence rate, when there exists additional information of trust between agents. We show that under certain conditions on the stochastic trust values and consensus protocol: 1) almost sure convergence to a common limit value is possible even when malicious agents constitute more than half of the network connectivity, 2) the deviation of the converged limit, from the case where there is no attack, i.e., the true consensus value, can be bounded with probability that approaches 1 exponentially, and 3) correct classification of malicious and legitimate agents can be attained in finite time almost surely. Further, the expected convergence rate decays exponentially as a function of the quality of the trust observations between agents.
RODec 11, 2020
Crowd Vetting: Rejecting Adversaries via Collaboration--with Application to Multi-Robot FlockingFrederik Mallmann-Trenn, Matthew Cavorsi, Stephanie Gil
We characterize the advantage of using a robot's neighborhood to find and eliminate adversarial robots in the presence of a Sybil attack. We show that by leveraging the opinions of its neighbors on the trustworthiness of transmitted data, robots can detect adversaries with high probability. We characterize a number of communication rounds required to achieve this result to be a function of the communication quality and the proportion of legitimate to malicious robots. This result enables increased resiliency of many multi-robot algorithms. Because our results are finite time and not asymptotic, they are particularly well-suited for problems with a time critical nature. We develop two algorithms, \emph{FindSpoofedRobots} that determines trusted neighbors with high probability, and \emph{FindResilientAdjacencyMatrix} that enables distributed computation of graph properties in an adversarial setting. We apply our methods to a flocking problem where a team of robots must track a moving target in the presence of adversarial robots. We show that by using our algorithms, the team of robots are able to maintain tracking ability of the dynamic target.
RODec 8, 2020
A wireless signal-based sensing framework for roboticsNinad Jadhav, Weiying Wang, Diana Zhang et al.
In this paper we develop the analytical framework for a novel Wireless signal-based Sensing capability for Robotics (WSR) by leveraging robots' mobility. It allows robots to primarily measure relative direction, or Angle-of-Arrival (AOA), to other robots, while operating in non-line-of-sight unmapped environments and without requiring external infrastructure. We do so by capturing all of the paths that a wireless signal traverses as it travels from a transmitting to a receiving robot in the team, which we term as an AOA profile. The key intuition behind our approach is to enable a robot to emulate antenna arrays as it moves freely in 2D and 3D space. The small differences in the phase of the wireless signals are thus processed with knowledge of robots' local displacement to obtain the profile, via a method akin to Synthetic Aperture Radar (SAR). The main contribution of this work is the development of i) a framework to accommodate arbitrary 2D and 3D motion, as well as continuous mobility of both signal transmitting and receiving robots, while computing AOA profiles between them and ii) a Cramer-Rao Bound analysis, based on antenna array theory, that provides a lower bound on the variance in AOA estimation as a function of the geometry of robot motion. We show that allowing robots to use their full mobility in 3D space while performing SAR, results in more accurate AOA profiles and thus better AOA estimation. All analytical developments are substantiated by extensive simulation and hardware experiments on air/ground robot platforms using 5 GHz WiFi. Our experimental results bolster our analytical findings, demonstrating that 3D motion provides enhanced and consistent accuracy, with total AOA error of less than 10 degree for 95% of trials. We also analytically characterize the impact of displacement estimation errors on the measured AOA.
RONov 9, 2020
Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair ProblemsSushmita Bhattacharya, Siva Kailas, Sahil Badyal et al.
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements for our methods are on the order of $O(Cm)$ as compared with $O(C^m)$ for standard rollout, where $C$ is the maximum cardinality of the constraint set for the control component of each agent, and $m$ is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (state space size $10^{37}$ and control space size $10^{7}$, respectively). Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.
ROFeb 11, 2020
Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair ProblemsSushmita Bhattacharya, Sahil Badyal, Thomas Wheeler et al.
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
ROJul 12, 2019
Active Rendezvous for Multi-Robot Pose Graph Optimization using Sensing over Wi-FiWeiying Wang, Ninad Jadhav, Paul Vohs et al.
We present a novel framework for collaboration amongst a team of robots performing Pose Graph Optimization (PGO) that addresses two important challenges for multi-robot SLAM: i) that of enabling information exchange "on-demand" via Active Rendezvous without using a map or the robot's location, and ii) that of rejecting outlying measurements. Our key insight is to exploit relative position data present in the communication channel between robots to improve groundtruth accuracy of PGO. We develop an algorithmic and experimental framework for integrating Channel State Information (CSI) with multi-robot PGO; it is distributed, and applicable in low-lighting or featureless environments where traditional sensors often fail. We present extensive experimental results on actual robots and observe that using Active Rendezvous results in a 64% reduction in ground truth pose error and that using CSI observations to aid outlier rejection reduces ground truth pose error by 32%. These results show the potential of integrating communication as a novel sensor for SLAM.