Kasper Johansson

LG
h-index3
3papers
11citations
Novelty53%
AI Score24

3 Papers

LGSep 20, 2022
Multi-armed Bandit Learning on a Graph

Tianpeng Zhang, Kasper Johansson, Na Li

The multi-armed bandit(MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotic applications, selecting an arm corresponds to a physical action that constrains the choices of the next available arms (actions). Motivated by this, we study an extension of MAB called the graph bandit, where an agent travels over a graph to maximize the reward collected from different nodes. The graph defines the agent's freedom in selecting the next available nodes at each step. We assume the graph structure is fully available, but the reward distributions are unknown. Built upon an offline graph-based planning algorithm and the principle of optimism, we design a learning algorithm, G-UCB, that balances long-term exploration-exploitation using the principle of optimism. We show that our proposed algorithm achieves $O(\sqrt{|S|T\log(T)}+D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph, which matches the theoretical lower bound $Ω(\sqrt{|S|T})$ up to logarithmic factors. To our knowledge, this result is among the first tight regret bounds in non-episodic, un-discounted learning problems with known deterministic transitions. Numerical experiments confirm that our algorithm outperforms several benchmarks.

EMFeb 12, 2024
Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization

Kasper Johansson, Thomas Schmelzer, Stephen Boyd

We propose a new method for finding statistical arbitrages that can contain more assets than just the traditional pair. We formulate the problem as seeking a portfolio with the highest volatility, subject to its price remaining in a band and a leverage limit. This optimization problem is not convex, but can be approximately solved using the convex-concave procedure, a specific sequential convex programming method. We show how the method generalizes to finding moving-band statistical arbitrages, where the price band midpoint varies over time.

ROOct 3, 2021
Mixed Observable RRT: Multi-Agent Mission-Planning in Partially Observable Environments

Kasper Johansson, Ugo Rosolia, Wyatt Ubellacker et al.

This paper considers centralized mission-planning for a heterogeneous multi-agent system with the aim of locating a hidden target. We propose a mixed observable setting, consisting of a fully observable state-space and a partially observable environment, using a hidden Markov model. First, we construct rapidly exploring random trees (RRTs) to introduce the mixed observable RRT for finding plausible mission plans giving way-points for each agent. Leveraging this construction, we present a path-selection strategy based on a dynamic programming approach, which accounts for the uncertainty from partial observations and minimizes the expected cost. Finally, we combine the high-level plan with model predictive control algorithms to evaluate the approach on an experimental setup consisting of a quadruped robot and a drone. It is shown that agents are able to make intelligent decisions to explore the area efficiently and to locate the target through collaborative actions.