Chunyu Xuan

h-index6
2papers

2 Papers

OCMar 7, 2023
Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization

Feihu Huang, Chunyu Xuan, Xinrui Wang et al.

Minimax optimization recently is widely applied in many machine learning tasks such as generative adversarial networks, robust learning and reinforcement learning. In the paper, we study a class of nonconvex-nonconcave minimax optimization with nonsmooth regularization, where the objective function is possibly nonconvex on primal variable $x$, and it is nonconcave and satisfies the Polyak-Lojasiewicz (PL) condition on dual variable $y$. Moreover, we propose a class of enhanced momentum-based gradient descent ascent methods (i.e., MSGDA and AdaMSGDA) to solve these stochastic nonconvex-PL minimax problems. In particular, our AdaMSGDA algorithm can use various adaptive learning rates in updating the variables $x$ and $y$ without relying on any specifical types. Theoretically, we prove that our methods have the best known sample complexity of $\tilde{O}(ε^{-3})$ only requiring one sample at each loop in finding an $ε$-stationary solution. Some numerical experiments on PL-game and Wasserstein-GAN demonstrate the efficiency of our proposed methods.

AIApr 25, 2024Code
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

Chunyu Xuan, Yazhe Niu, Yuan Pu et al.

Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which uses the value estimation of a certain child node to save the corresponding sub-tree search time. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs can significantly reduce the search cost and meanwhile guarantee or even improve performance, simplifying both data collecting and reanalyzing. Experiments conducted on Atari environments, DMControl suites and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero MCTS benchmark at https://github.com/opendilab/LightZero.