ML · LG · Mar 19, 2022

Thompson Sampling on Asymmetric $α$-Stable Bandits

arXiv:2203.10214v2 · h-index: 29
AI Analysis

This work addresses the exploration-exploitation dilemma in reinforcement learning for domains like finance and wireless communications, but it appears incremental as it extends Thompson Sampling to a specific distribution type.

The paper tackles the multi-armed bandit problem in which rewards follow unknown asymmetric α-stable distributions. It applies Thompson Sampling in this setting and explores applications to modelling financial and wireless data. The reported results indicate that the approach balances exploration and exploitation effectively.

In reinforcement learning, handling the exploration-exploitation dilemma is particularly important for algorithm optimization. The multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to achieve a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that follow various distributions. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow unknown asymmetric $α$-stable distributions, and we explore its applications in modelling financial and wireless data.
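
To make the setting concrete, the following is a minimal sketch, not the paper's algorithm. It draws rewards from asymmetric $α$-stable arms using the Chambers-Mallows-Stuck construction and runs plain Gaussian Thompson Sampling on the arm means. The names `sample_alpha_stable` and `gaussian_thompson_sampling`, the restriction to $1 < α < 2$ (so that each arm has a finite mean equal to its location parameter), and the Gaussian prior and noise variances are all assumptions made for illustration. A Gaussian posterior is a deliberate simplification here and does not model the heavy tails.

```python
import math
import random


def sample_alpha_stable(alpha, beta, loc, scale, rng):
    """Draw one asymmetric alpha-stable sample (Chambers-Mallows-Stuck form).

    Assumption for this sketch: 1 < alpha < 2, so the mean exists and equals loc.
    """
    if not (1.0 < alpha < 2.0 and -1.0 <= beta <= 1.0):
        raise ValueError("this sketch assumes 1 < alpha < 2 and |beta| <= 1")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit-rate exponential
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha                    # skewness shift
    s = (1.0 + t * t) ** (1.0 / (2.0 * alpha))  # skewness scaling
    x = (s * math.sin(alpha * (u + b)) / math.cos(u) ** (1.0 / alpha)
         * (w / math.cos(u - alpha * (u + b))) ** ((alpha - 1.0) / alpha))
    return loc + scale * x


def gaussian_thompson_sampling(arms, horizon, seed=0, prior_var=100.0, obs_var=1.0):
    """Run Gaussian Thompson Sampling; return how often each arm was pulled.

    arms: list of hypothetical (alpha, beta, loc, scale) tuples, one per arm.
    prior_var, obs_var: assumed prior and noise variances of the Gaussian model.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for _ in range(horizon):
        draws = []
        for n, total in zip(counts, sums):
            # Conjugate update for a N(0, prior_var) prior on the arm mean.
            precision = 1.0 / prior_var + n / obs_var
            mean = (total / obs_var) / precision
            draws.append(rng.gauss(mean, math.sqrt(1.0 / precision)))
        # Play the arm whose sampled mean is largest.
        chosen = max(range(len(arms)), key=draws.__getitem__)
        alpha, beta, loc, scale = arms[chosen]
        counts[chosen] += 1
        sums[chosen] += sample_alpha_stable(alpha, beta, loc, scale, rng)
    return counts


if __name__ == "__main__":
    # Two illustrative arms that differ only in location (their mean).
    example_arms = [(1.8, 0.5, 0.0, 1.0), (1.8, 0.5, 1.0, 1.0)]
    print(gaussian_thompson_sampling(example_arms, horizon=500))
```

Because $α$-stable rewards have infinite variance for $α < 2$, the sample mean used by this Gaussian model can be thrown off by a single extreme reward. That weakness is the motivation for posteriors tailored to $α$-stable rewards, which is the setting the abstract describes.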
