OCNov 8, 2012
Nonuniform Coverage Control on the LineNaomi Ehrich Leonard, Alex Olshevsky
This paper investigates control laws allowing mobile, autonomous agents to optimally position themselves on the line for distributed sensing in a nonuniform field. We show that a simple static control law, based only on local measurements of the field by each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/movement/computation rounds that is essentially quadratic in the number of agents. Further, we exhibit a dynamic control law which, under slightly stronger assumptions on the capabilities and knowledge of each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/communication/computation/movement rounds that is essentially linear in the number of agents. Crucially, both algorithms are fully distributed and robust to unpredictable loss and addition of agents.
OCJun 20, 2011
Rearranging trees for robust consensusGeorge Forrest Young, Luca Scardovi, Naomi Ehrich Leonard
In this paper, we use the H2 norm associated with a communication graph to characterize the robustness of consensus to noise. In particular, we restrict our attention to trees and by systematic attention to the effect of local changes in topology, we derive a partial ordering for undirected trees according to the H2 norm. Our approach for undirected trees provides a constructive method for deriving an ordering for directed trees. Further, our approach suggests a decentralized manner in which trees can be rearranged in order to improve their robustness.
SYOct 16, 2012
Node Classification in Networks of Stochastic Evidence AccumulatorsIoannis Poulakakis, Luca Scardovi, Naomi Ehrich Leonard
This paper considers a network of stochastic evidence accumulators, each represented by a drift-diffusion model accruing evidence towards a decision in continuous time by observing a noisy signal and by exchanging information with other units according to a fixed communication graph. We bring into focus the relationship between the location of each unit in the communication graph and its certainty as measured by the inverse of the variance of its state. We show that node classification according to degree distributions or geodesic distances cannot faithfully capture node ranking in terms of certainty. Instead, all possible paths connecting each unit with the rest in the network must be incorporated. We make this precise by proving that node classification according to information centrality provides a rank ordering with respect to node certainty, thereby affording a direct interpretation of the certainty level of each unit in terms of the structural properties of the underlying communication graph.
SOC-PHDec 18, 2018
Social decision-making driven by artistic explore-exploit tensionKayhan Ozcimder, Biswadip Dey, Alessio Franci et al.
We studied social decision-making in the rule-based improvisational dance $There$ $Might$ $Be$ $Others$, where dancers make in-the-moment compositional choices. Rehearsals provided a natural test-bed with communication restricted to non-verbal cues. We observed a key artistic explore-exploit tension in which the dancers switched between exploitation of existing artistic opportunities and riskier exploration of new ones. We investigated how the rules influenced the dynamics using rehearsals together with a model generalized from evolutionary dynamics. We tuned the rules to heighten the tension and modeled nonlinear fitness and feedback dynamics for mutation rate to capture the observed temporal phasing of the dancers' exploration-versus-exploitation. Using bifurcation analysis, we identified key controls of the tension and showed how they could shape the decision-making dynamics of the model much like turning a "dial" in the instructions to the dancers could shape the dance. The investigation became an integral part of the development of the dance.
20.6SYMar 31
A Continuous-Time and State-Space Relaxation of the Linear Threshold Model with Nonlinear Opinion DynamicsIan Xul Belaustegui, Himani Sinhmar, Ling-Wei Kong et al.
The Linear Threshold Model (LTM) is widely used to study the propagation of collective behaviors as complex contagions. However, its dependence on discrete states and timesteps restricts its ability to capture the multiple time-scales inherent in decision-making, as well as the effects of subthreshold signaling. To address these limitations, we introduce a continuous-time and state-space relaxation of the LTM based on the Nonlinear Opinion Dynamics (NOD) framework. By replacing the discontinuous step-function thresholds of the LTM with the smooth bifurcations of the NOD model, we map discrete cascade processes to the continuous flow of a dynamical system. We prove that, under appropriate parameter choices, activation in the discrete LTM guarantees activation in the continuous NOD relaxation for any given seed set. We establish computable conditions for equivalence: by sufficiently bounding the social coupling parameter, the continuous NOD cascades exactly recover the cascades of the discrete LTM. We then illustrate how this NOD relaxation provides a richer analytical framework than the LTM, allowing for the exploration of cascades driven by strictly subthreshold inputs and the role of temporally distributed signals.
CVAug 24, 2023
Learning to Predict 3D Rotational Dynamics from Images of a Rigid Body with Unknown Mass DistributionJustice Mason, Christine Allen-Blanchette, Nicholas Zolman et al.
In many real-world settings, image observations of freely rotating 3D rigid bodies may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics. The usefulness of standard deep learning methods is also limited, because an image of a rigid body reveals nothing about the distribution of mass inside the body, which, together with initial angular velocity, is what determines how the body will rotate. We present a physics-based neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion. We demonstrate the efficacy of our approach on new rotating rigid-body datasets of sequences of synthetic images of rotating objects, including cubes, prisms and satellites, with unknown uniform and non-uniform mass distributions. Our model outperforms competing baselines on our datasets, producing better qualitative predictions and reducing the error observed for the state-of-the-art Hamiltonian Generative Network by a factor of 2.
ROFeb 21, 2024
Blending Data-Driven Priors in Dynamic GamesJustin Lidard, Haimin Hu, Asher Hancock et al.
As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic game with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.
LGJun 20, 2024
Behavior-Inspired Neural Networks for Relational InferenceYulong Yang, Bowen Feng, Keqin Wang et al.
From pedestrians to Kuramoto oscillators, interactions between agents govern how dynamical systems evolve in space and time. Discovering how these agents relate to each other has the potential to improve our understanding of the often complex dynamics that underlie these systems. Recent works learn to categorize relationships between agents based on observations of their physical behavior. These approaches model relationship categories as outcomes of a categorical distribution which is limiting and contrary to real-world systems, where relationship categories often intermingle and interact. In this work, we introduce a level of abstraction between the observable behavior of agents and the latent categories that determine their behavior. To do this, we learn a mapping from agent observations to agent preferences for a set of latent categories. The learned preferences and inter-agent proximity are integrated in a nonlinear opinion dynamics model, which allows us to naturally identify mutually exclusive categories, predict an agent's evolution in time, and control an agent's behavior. Through extensive experiments, we demonstrate the utility of our model for learning interpretable categories, and the efficacy of our model for long-horizon trajectory prediction.
MLNov 24, 2021
One More Step Towards Reality: Cooperative Bandits with Imperfect CommunicationUdari Madhushani, Abhimanyu Dubey, Naomi Ehrich Leonard et al.
The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret as well. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.
LGOct 14, 2021
Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized CommunicationJustin Lidard, Udari Madhushani, Naomi Ehrich Leonard
A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when exploration is fully decentralized. Specifically, we consider a class of online, episodic, tabular $Q$-learning problems under time-varying reward and transition dynamics, in which agents can communicate in a decentralized manner.We show that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its $γ$-hop neighbors. We prove regret and sample complexity bounds that depend on the number of agents, communication network structure and $γ.$ We show that incorporating more agents and more information sharing into the group learning scheme speeds up convergence to the optimal policy. Numerical simulations illustrate our results and validate our theoretical claims.
MLNov 16, 2020
Distributed Bandits: Probabilistic Communication on $d$-regular GraphsUdari Madhushani, Naomi Ehrich Leonard
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.
LGNov 11, 2020
On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimensionUdari Madhushani, Biswadip Dey, Naomi Ehrich Leonard et al.
Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called \textit{Hamiltonian $Q$-Learning}, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.
LGOct 24, 2020
LagNetViP: A Lagrangian Neural Network for Video PredictionChristine Allen-Blanchette, Sushant Veer, Anirudha Majumdar et al.
The dominant paradigms for video prediction rely on opaque transition models where neither the equations of motion nor the underlying physical quantities of the system are easily inferred. The equations of motion, as defined by Newton's second law, describe the time evolution of a physical system state and can therefore be applied toward the determination of future system states. In this paper, we introduce a video prediction model where the equations of motion are explicitly constructed from learned representations of the underlying physical quantities. To achieve this, we simultaneously learn a low-dimensional state representation and system Lagrangian. The kinetic and potential energy terms of the Lagrangian are distinctly modelled and the low-dimensional equations of motion are explicitly constructed using the Euler-Lagrange equations. We demonstrate the efficacy of this approach for video prediction on image sequences rendered in modified OpenAI gym Pendulum-v0 and Acrobot environments.
LGJul 3, 2020
Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and ControlYaofeng Desmond Zhong, Naomi Ehrich Leonard
Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.
LGApr 13, 2020
Distributed Learning: Sequential Decision Making in Resource-Constrained EnvironmentsUdari Madhushani, Naomi Ehrich Leonard
We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations.
OCApr 8, 2020
A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit ProblemUdari Madhushani, Naomi Ehrich Leonard
We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations.
OCMar 3, 2020
Distributed Cooperative Decision Making in Multi-agent Multi-armed BanditsPeter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard
We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. And we consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph.
OCMay 21, 2019
Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit ProblemUdari Madhushani, Naomi Ehrich Leonard
We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an algorithm for each agent to maximize its own expected cumulative reward and prove performance bounds that depend on the sociability of the agents and the network structure. We use the bounds to predict the rank ordering of agents according to their performance and verify the accuracy analytically and computationally.
SYJul 3, 2017
Cluster synchronization of diffusively-coupled nonlinear systems: A contraction based approachZahra Aminzare, Biswadip Dey, Elizabeth N. Davison et al.
Finding the conditions that foster synchronization in networked oscillatory systems is critical to understanding a wide range of biological and mechanical systems. However, the conditions proved in the literature for synchronization in nonlinear systems with linear coupling, such as has been used to model neuronal networks, are in general not strict enough to accurately determine the system behavior. We leverage contraction theory to derive new sufficient conditions for cluster synchronization in terms of the network structure, for a network where the intrinsic nonlinear dynamics of each node may differ. Our result requires that network connections satisfy a cluster-input-equivalence condition, and we explore the influence of this requirement on network dynamics. For application to networks of nodes with neuronal spiking dynamics, we show that our new sufficient condition is tighter than those found in previous analyses which used nonsmooth Lyapunov functions. Improving the analytical conditions for when cluster synchronization will occur based on network configuration is a significant step toward facilitating understanding and control of complex oscillatory systems.
SYJun 2, 2016
Distributed Cooperative Decision-Making in Multiarmed Bandits: Frequentist and Bayesian AlgorithmsPeter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard
We study distributed cooperative decision-making under the explore-exploit tradeoff in the multiarmed bandit (MAB) problem. We extend the state-of-the-art frequentist and Bayesian algorithms for single-agent MAB problems to cooperative distributed algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph. We rely on a running consensus algorithm for each agent's estimation of mean rewards from its own rewards and the estimated rewards of its neighbors. We prove the performance of these algorithms and show that they asymptotically recover the performance of a centralized agent. Further, we rigorously characterize the influence of the communication graph structure on the decision-making performance of the group.
LGDec 23, 2015
Satisficing in multi-armed bandit problemsPaul Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard
Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.
SYDec 21, 2015
On Distributed Cooperative Decision-Making in Multiarmed BanditsPeter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard
We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.
OCJul 5, 2015
Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret AnalysisVaibhav Srivastava, Paul Reverdy, Naomi Ehrich Leonard
We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the assumptions in the Bayesian prior on the performance of the upper credible limit (UCL) algorithm and a new correlated UCL algorithm. We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms. Our results show how priors and correlation structure can be leveraged to improve performance.
OCJun 15, 2015
Joint Centrality Distinguishes Optimal Leaders in Noisy NetworksKatherine E. Fitch, Naomi Ehrich Leonard
We study the performance of a network of agents tasked with tracking an external unknown signal in the presence of stochastic disturbances and under the condition that only a limited subset of agents, known as leaders, can measure the signal directly. We investigate the optimal leader selection problem for a prescribed maximum number of leaders, where the optimal leader set minimizes total system error defined as steady-state variance about the external signal. In contrast to previously established greedy algorithms for optimal leader selection, our results rely on an expression of total system error in terms of properties of the underlying network graph. We demonstrate that the performance of any given set of leaders depends on their influence as determined by a new graph measure of centrality of a set. We define the $joint \; centrality$ of a set of nodes in a network graph such that a leader set with maximal joint centrality is an optimal leader set. In the case of a single leader, we prove that the optimal leader is the node with maximal information centrality. In the case of multiple leaders, we show that the nodes in the optimal leader set balance high information centrality with a coverage of the graph. For special cases of graphs, we solve explicitly for optimal leader sets. We illustrate with examples.
OCSep 11, 2012
Cooperative learning in multi-agent systems from intermittent measurementsNaomi Ehrich Leonard, Alex Olshevsky
Motivated by the problem of tracking a direction in a decentralized way, we consider the general problem of cooperative learning in multi-agent systems with time-varying connectivity and intermittent measurements. We propose a distributed learning protocol capable of learning an unknown vector $μ$ from noisy measurements made independently by autonomous nodes. Our protocol is completely distributed and able to cope with the time-varying, unpredictable, and noisy nature of inter-agent communication, and intermittent noisy measurements of $μ$. Our main result bounds the learning speed of our protocol in terms of the size and combinatorial features of the (time-varying) networks connecting the nodes.