D. H. S. Maithripala

h-index: 14

Papers: 4

Citations: 6

Novelty: 43%

AI Score: 19

Ranked #187,112 of 194,257 authors (top 96%); #39,538 in LG (top 98%)

4 Papers

1.2 · SY · Jan 23, 2018

Control Synthesis for an Underactuated Cable Suspended System Using Dynamic Decoupling

Siddharth H. Nair, Ravi N. Banavar, D. H. S. Maithripala

This article studies the dynamics and control of a novel underactuated system in which a plate carrying a freely moving mass (a ball) is suspended by cables whose other ends are attached to three quadrotors. The goal is to stabilize the plate horizontally at a prescribed height, with the ball positioned at the plate's center of mass. The freely moving mass introduces two degrees of underactuation into the system. The design proceeds by decoupling the quadrotor dynamics from the plate dynamics. Using a partial feedback linearization approach, the attitude and the height of the plate are controlled first, while the velocities along the $x$ and $y$ directions are kept bounded. These inputs are then realized through the quadrotors using backstepping and a timescale-separation argument based on Tikhonov's theorem.
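The two ideas in this abstract, feedback linearization of an outer loop and timescale separation with a fast inner loop, can be illustrated on a much simpler toy problem. The sketch below is not the paper's plate-and-quadrotor model: it assumes a hypothetical 1-DOF vertical model `m*z'' = T - m*g`, a PD outer loop that feedback-linearizes height assuming the commanded thrust is applied instantly, and a fast first-order actuator lag with time constant `eps` standing in for the quadrotor inner loop. All names and gains are illustrative assumptions.

```python
def simulate_height(zd=2.0, m=1.5, g=9.81, kp=4.0, kd=4.0,
                    eps=0.02, dt=0.001, t_end=10.0):
    """Toy 1-DOF sketch (hypothetical): m*z'' = T - m*g, T lags T_cmd.

    The outer loop feedback-linearizes the height dynamics assuming
    T = T_cmd; the actual thrust follows T_cmd through a fast first-order
    lag with time constant eps (the 'fast' subsystem in a singular
    perturbation / Tikhonov-style argument).
    """
    z, zdot = 0.0, 0.0
    thrust = m * g                        # start at hover thrust
    for _ in range(int(round(t_end / dt))):
        v = -kp * (z - zd) - kd * zdot    # desired linear closed-loop accel.
        thrust_cmd = m * (g + v)          # feedback-linearizing command
        # Fast actuator: thrust converges to thrust_cmd on timescale eps.
        thrust += (thrust_cmd - thrust) / eps * dt
        zddot = thrust / m - g
        zdot += zddot * dt                # semi-implicit Euler step
        z += zdot * dt
    return z, zdot


z_final, zdot_final = simulate_height()
print(f"final height {z_final:.4f}, final velocity {zdot_final:.2e}")
```

With `kp = kd = 4` the ideal (lag-free) closed loop is critically damped, and for small `eps` the lagged system stays close to it; increasing `eps` relative to the outer-loop time constant eventually destroys this approximation, which is the intuition behind requiring a timescale separation.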

1.2 · LG · Mar 29, 2020

A Decentralized Policy with Logarithmic Regret for a Class of Multi-Agent Multi-Armed Bandit Problems with Option Unavailability Constraints and Stochastic Communication Protocols

Pathmanathan Pankayaraj, D. H. S. Maithripala, J. M. Berg

This paper considers a multi-armed bandit (MAB) problem in which multiple mobile agents receive rewards by sampling from a collection of spatially dispersed stochastic processes, called bandits. The goal is to formulate a decentralized policy for each agent that maximizes the total cumulative reward over all agents, subject to option availability and inter-agent communication constraints. The problem formulation is motivated by applications in which a team of autonomous mobile robots cooperates to accomplish an exploration and exploitation task in an uncertain environment. Bandit locations are represented by the vertices of a spatial graph. At any time, an agent's options consist of sampling the bandit at its current location or traveling along an edge of the spatial graph to a new bandit location. Communication constraints are described by a directed, non-stationary, stochastic communication graph; at any time, agents may receive data only from their in-neighbors in this graph. For a single agent on a fully connected spatial graph, it is known that the expected regret of any optimal policy is necessarily bounded below by a function that grows as the logarithm of time. A class of policies called upper confidence bound (UCB) algorithms asymptotically achieves logarithmic regret for the classical MAB problem. In this paper, we propose a UCB-based decentralized motion and option selection policy, together with a non-stationary stochastic communication protocol, that guarantees logarithmic regret. To our knowledge, this is the first such decentralized policy for non-fully connected spatial graphs with communication constraints. When the spatial graph is fully connected and the communication graph is stationary, our decentralized algorithm matches or exceeds the best reported prior results from the literature.
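For readers unfamiliar with the UCB family referred to above, here is a minimal sketch of the classical single-agent UCB1 index (empirical mean plus `sqrt(2 ln t / n_i)`) on Bernoulli arms with a fully connected option set. It is only the textbook baseline, not the decentralized motion, option selection, or communication policy proposed in the paper; the arm means, horizon, and seed are illustrative assumptions. Regret is accumulated as the expected gap `max(means) - means[arm]` of each chosen arm.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Classical single-agent UCB1 on Bernoulli arms (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # number of times each arm was sampled
    totals = [0.0] * k      # cumulative reward per arm
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # sample every arm once to initialize
        else:
            # UCB1 index: empirical mean + exploration bonus.
            arm = max(range(k), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]   # expected (pseudo-)regret increment
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

Because each suboptimal arm's bonus shrinks only as `sqrt(ln t / n_i)`, suboptimal arms are sampled roughly logarithmically often, which is where the logarithmic regret of UCB-type policies comes from.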

4.8 · LG · Oct 7, 2019

A Decentralized Communication Policy for Multi Agent Multi Armed Bandit Problems

Pathmanathan Pankayaraj, D. H. S. Maithripala

This paper proposes a novel policy that allows a group of agents to solve a multi-armed bandit (MAB) problem both individually and collectively. The policy relies solely on the information that an agent has obtained by sampling the options on its own and by communicating with its neighbors. The option selection policy is based on an Upper Confidence Bound (UCB) strategy, while the proposed communication strategy makes agents communicate with those agents they believe are more likely to be exploring than exploiting. The overall strategy is shown to significantly outperform a random communication policy based on independent Erdős-Rényi (ER) graphs. The policy is also shown to be cost-effective in terms of communication and hence easily scalable to large networks of agents.
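The ER-based baseline mentioned above can be sketched as follows. This is an assumption-laden illustration of a random communication baseline, not the paper's proposed "communicate with agents likely to be exploring" rule, whose details are not given here. Each hypothetical agent runs UCB1 on pooled statistics (its own samples plus samples received), and at every step a fresh undirected Erdős-Rényi graph is drawn in which each pair of agents is linked independently with probability `p`; linked agents exchange that step's samples. The function name and parameters are hypothetical.

```python
import math
import random


def multi_agent_ucb_er(means, n_agents, horizon, p=0.1, seed=0):
    """Hypothetical sketch: N agents run UCB1 on pooled statistics; at each
    step every pair (i, j) independently exchanges that step's samples with
    probability p (a fresh Erdos-Renyi communication graph per step)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [[0] * k for _ in range(n_agents)]     # pooled counts per agent
    totals = [[0.0] * k for _ in range(n_agents)]   # pooled reward sums
    own_pulls = [[0] * k for _ in range(n_agents)]  # agent's own selections
    for _ in range(horizon):
        samples = []
        for a in range(n_agents):
            c, s = counts[a], totals[a]
            untried = [i for i in range(k) if c[i] == 0]
            if untried:
                arm = untried[0]        # try every arm at least once
            else:
                n = sum(c)              # total pooled observations
                arm = max(range(k), key=lambda i: s[i] / c[i]
                          + math.sqrt(2.0 * math.log(n) / c[i]))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            samples.append((arm, reward))
            own_pulls[a][arm] += 1
        # Each agent always keeps its own sample.
        for a, (arm, r) in enumerate(samples):
            counts[a][arm] += 1
            totals[a][arm] += r
        # Undirected ER graph: linked pairs share samples in both directions.
        for i in range(n_agents):
            for j in range(i + 1, n_agents):
                if rng.random() < p:
                    for dst, src in ((i, j), (j, i)):
                        arm, r = samples[src]
                        counts[dst][arm] += 1
                        totals[dst][arm] += r
    return own_pulls


own = multi_agent_ucb_er([0.2, 0.5, 0.8], n_agents=5, horizon=2000, p=0.2)
print([row[2] for row in own])   # best-arm pulls per agent
```

Every message here costs one link activation, so the expected communication per step grows as `p * N * (N - 1) / 2`; a targeted rule such as the one described in the abstract aims to spend that budget only on the most informative neighbors.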

0.7 · LG · Oct 2, 2017

Asymptotic Allocation Rules for a Class of Dynamic Multi-armed Bandit Problems

T. W. U. Madhushani, D. H. S. Maithripala, N. E. Leonard

This paper presents a class of dynamic multi-armed bandit problems in which the reward can be modeled as the noisy output of a time-varying linear stochastic dynamic system that satisfies certain boundedness constraints. The class allows many seemingly different problems with time-varying option characteristics to be considered in a single framework. It also opens up the possibility of considering many new problems of practical importance. For instance, it allows temporal option unavailability and dependencies between options with time-varying characteristics to be treated simultaneously and seamlessly. We show that, for this class of problems, combining any upper confidence bound (UCB) type algorithm with any efficient estimator of the expected reward ensures that the expected cumulative regret is bounded logarithmically. We demonstrate the versatility of the approach through the explicit consideration of a new example of practical interest.
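One way to picture "a UCB-type algorithm combined with an efficient reward estimator" is to pair a per-arm scalar Kalman filter with a variance-scaled UCB bonus. The sketch below is a hypothetical illustration, not the estimator or index analyzed in the paper: it assumes each arm's latent reward level follows `x_{t+1} = a*x_t + w` with `w ~ N(0, q)` and is observed as `y = x + v` with `v ~ N(0, r)`, and it uses the index `x_hat + c*sqrt(2*ln(t+1)*P)`. Only the sampled arm gets a measurement update, while every arm's variance `P` grows through the prediction step, so neglected arms are eventually revisited. All parameters are assumptions.

```python
import math
import random


def kalman_ucb(true_means, horizon, a=1.0, q=0.0, r=0.25, c=1.0, seed=0):
    """Hypothetical sketch: per-arm scalar Kalman filters feed a UCB index.

    Assumed model for arm i:  x_{t+1} = a*x_t + w,  w ~ N(0, q)
                              y_t     = x_t + v,    v ~ N(0, r)
    """
    rng = random.Random(seed)
    k = len(true_means)
    x = list(true_means)        # latent reward levels (simulated truth)
    x_hat = [0.0] * k           # Kalman estimates
    P = [1.0] * k               # estimate variances (prior variance 1)
    pulls = [0] * k
    for t in range(1, horizon + 1):
        # UCB index: estimate plus a variance-scaled exploration bonus.
        arm = max(range(k), key=lambda i: x_hat[i]
                  + c * math.sqrt(2.0 * math.log(t + 1) * P[i]))
        y = x[arm] + rng.gauss(0.0, math.sqrt(r))
        # Measurement update for the sampled arm only.
        gain = P[arm] / (P[arm] + r)
        x_hat[arm] += gain * (y - x_hat[arm])
        P[arm] *= 1.0 - gain
        pulls[arm] += 1
        # Time update (prediction) for every arm; the truth evolves too.
        for i in range(k):
            x_hat[i] = a * x_hat[i]
            P[i] = a * a * P[i] + q
            x[i] = a * x[i] + rng.gauss(0.0, math.sqrt(q))
    return pulls


pulls = kalman_ucb([0.0, 1.0], horizon=1000)
print(pulls)
```

With `a = 1` and `q = 0` the filter reduces to a recursive sample mean with `P` close to `r / n`, so the index behaves like a UCB1-style bound; setting `q > 0` makes the arm levels drift and keeps `P` from collapsing, which forces continued exploration of time-varying options.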