Junyeop Kwon

LGJun 19, 2024

Infinite-Horizon Reinforcement Learning with Multinomial Logistic Function Approximation

Jaehyun Park, Junyeop Kwon, Dabeen Lee

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. We develop a provably efficient discounted value iteration-based algorithm that works for both infinite-horizon average-reward and discounted-reward settings. For average-reward communicating MDPs, the algorithm guarantees a regret upper bound of $\tilde{\mathcal{O}}(dD\sqrt{T})$ where $d$ is the dimension of feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. For discounted-reward MDPs, our algorithm achieves $\tilde{\mathcal{O}}(d(1-γ)^{-2}\sqrt{T})$ regret where $γ$ is the discount factor. Then we complement these upper bounds by providing several regret lower bounds. We prove a lower bound of $Ω(d\sqrt{DT})$ for learning communicating MDPs of diameter $D$ and a lower bound of $Ω(d(1-γ)^{3/2}\sqrt{T})$ for learning discounted-reward MDPs with discount factor $γ$. Lastly, we show a regret lower bound of $Ω(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting.

LGFeb 23, 2024

Parameter-Free Algorithms for Performative Regret Minimization under Decision-Dependent Distributions

Sungwoo Park, Junyeop Kwon, Byeongnoh Kim et al.

This paper studies performative risk minimization, a formulation of stochastic optimization under decision-dependent distributions. We consider the general case where the performative risk can be non-convex, for which we develop efficient parameter-free optimistic optimization-based methods. Our algorithms significantly improve upon the existing Lipschitz bandit-based method in many aspects. In particular, our framework does not require knowledge about the sensitivity parameter of the distribution map and the Lipshitz constant of the loss function. This makes our framework practically favorable, together with the efficient optimistic optimization-based tree-search mechanism. We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.

Junyeop Kwon

2 Papers