Chunlin Sun

h-index2

4papers

38citations

Novelty55%

AI Score32

Ranked #128,351 of 194,257 authors (top 66%)#28,244 in LG (top 70%)

4 Papers

1.9OCJun 28

The Simple Strategy-Iteration Method is Strongly Polynomial for the Turn-Based Deterministic Forward Game

Sanyou Mei, Chunlin Sun, Yinyu Ye

We study Turn-Based Deterministic Forward Games (TBDFGs), the subclass of turn-based deterministic zero-sum games in which no directed cycle contains actions controlled by both players. This forward condition is strictly weaker than acyclicity: recurrent behavior may be arbitrarily rich within one player's states, while mixed-player feedback cycles are excluded. Our main contribution separates two algorithmic consequences of this structure. First, we analyze the simple strategy-iteration method of [11,14], a generic method for TBSGs whose execution neither tests for nor uses the TBDFG property. We prove that this structure-oblivious algorithm nevertheless has a strongly polynomial guarantee on every TBDFG. In particular, it terminates after at most $O(n^6m^4\log^4 n)$ simplex pivot steps. Thus, the forward property acts as a structural certificate for convergence even when the algorithm is not informed that the input has this property. Second, when the TBDFG structure is known in advance, a backward SCC propagation algorithm is proposed that solves a sequence of deterministic-MDP subproblems and improves the bound to $O(n^3m^2\log^2 n)$ simplex pivot steps. Together, these results show that forward structure both regularizes the convergence of a general strategy-iteration method and supports a sharper structure-aware algorithm.

6.4LGFeb 11, 2024

Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods

Wenzhi Gao, Chunlin Sun, Chenyu Xue et al.

Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which unveils the challenge for first-order-method-based online algorithms to achieve beyond $\mathcal{O}(\sqrt{T})$ regret. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. For the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework.

4.4LGJul 23, 2021

An Adaptive State Aggregation Algorithm for Markov Decision Processes

Guanting Chen, Johann Demetrio Gaebler, Matt Peng et al.

Value iteration is a well-known method of solving Markov Decision Processes (MDPs) that is simple to implement and boasts strong theoretical convergence guarantees. However, the computational cost of value iteration quickly becomes infeasible as the size of the state space increases. Various methods have been proposed to overcome this issue for value iteration in large state and action space MDPs, often at the price, however, of generalizability and algorithmic simplicity. In this paper, we propose an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values. We also prove that our algorithm converges almost surely to within $2\varepsilon / (1 - γ)$ of the true optimal value in the $\ell^\infty$ norm, where $γ$ is the discount factor and aggregated states differ by at most $\varepsilon$. Numerical experiments on a variety of simulated environments confirm the robustness of our algorithm and its ability to solve MDPs with much cheaper updates especially as the scale of the MDP problem increases.

14.6LGFeb 12, 2021

The Symmetry between Arms and Knapsacks: A Primal-Dual Approach for Bandits with Knapsacks

Xiaocheng Li, Chunlin Sun, Yinyu Ye

In this paper, we study the bandits with knapsacks (BwK) problem and develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound. The BwK problem extends the multi-arm bandit (MAB) problem to model the resource consumption associated with playing each arm, and the existing BwK literature has been mainly focused on deriving asymptotically optimal distribution-free regret bounds. We first study the primal and dual linear programs underlying the BwK problem. From this primal-dual perspective, we discover symmetry between arms and knapsacks, and then propose a new notion of sub-optimality measure for the BwK problem. The sub-optimality measure highlights the important role of knapsacks in determining algorithm regret and inspires the design of our two-phase algorithm. In the first phase, the algorithm identifies the optimal arms and the binding knapsacks, and in the second phase, it exhausts the binding knapsacks via playing the optimal arms through an adaptive procedure. Our regret upper bound involves the proposed sub-optimality measure and it has a logarithmic dependence on length of horizon $T$ and a polynomial dependence on $m$ (the numbers of arms) and $d$ (the number of knapsacks). To the best of our knowledge, this is the first problem-dependent logarithmic regret bound for solving the general BwK problem.