LGJun 28, 2023
Pure exploration in multi-armed bandits with low rank structure using oblivious samplerYaxiong Liu, Atsuyoshi Nakamura, Kohei Hatano et al.
In this paper, we consider the low rank structure of the reward sequence of the pure exploration problems. Firstly, we propose the separated setting in pure exploration problem, where the exploration strategy cannot receive the feedback of its explorations. Due to this separation, it requires that the exploration strategy to sample the arms obliviously. By involving the kernel information of the reward vectors, we provide efficient algorithms for both time-varying and fixed cases with regret bound $O(d\sqrt{(\ln N)/n})$. Then, we show the lower bound to the pure exploration in multi-armed bandits with low rank sequence. There is an $O(\sqrt{\ln N})$ gap between our upper bound and the lower bound.
OCDec 10, 2020
A generalised log-determinant regularizer for online semi-definite programming and its applicationsYaxiong Liu, Ken-ichiro Moridomi, Kohei Hatano et al.
We consider a variant of online semi-definite programming problem (OSDP): The decision space consists of semi-definite matrices with bounded $Γ$-trace norm, which is a generalization of trace norm defined by a positive definite matrix $Γ.$ To solve this problem, we utilise the follow-the-regularized-leader algorithm with a $Γ$-dependent log-determinant regularizer. Then we apply our generalised setting and our proposed algorithm to online matrix completion(OMC) and online similarity prediction with side information. In particular, we reduce the online matrix completion problem to the generalised OSDP problem, and the side information is represented as the $Γ$ matrix. Hence, due to our regret bound for the generalised OSDP, we obtain an optimal mistake bound for the OMC by removing the logarithmic factor.
DSJul 15, 2020
Improved algorithms for online load balancingYaxiong Liu, Kohei Hatano, Eiji Takimoto
We consider an online load balancing problem and its extensions in the framework of repeated games. On each round, the player chooses a distribution (task allocation) over $K$ servers, and then the environment reveals the load of each server, which determines the computation time of each server for processing the task assigned. After all rounds, the cost of the player is measured by some norm of the cumulative computation-time vector. The cost is the makespan if the norm is $L_\infty$-norm. The goal is to minimize the regret, i.e., minimizing the player's cost relative to the cost of the best fixed distribution in hindsight. We propose algorithms for general norms and prove their regret bounds. In particular, for $L_\infty$-norm, our regret bound matches the best known bound and the proposed algorithm runs in polynomial time per trial involving linear programming and second order programming, whereas no polynomial time algorithm was previously known to achieve the bound.