LGSep 23, 2022Code
On Efficient Reinforcement Learning for Full-length Game of StarCraft IIRuo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng et al.
StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL), of which the main difficulties include huge state space, varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks. We investigate a curriculum transfer training procedure and train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating level built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the cheating level AIs and achieve the win rate against the level-8, level-9, and level-10 AIs as 96%, 97%, and 94%, respectively. Our codes are at https://github.com/liuruoze/HierNet-SC2. To provide a baseline referring the AlphaStar for our work as well as the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07, which can be trained on the raw action space which has 564 actions. It is designed to run training on a single common machine, by making the hyper-parameters adjustable. We then compare our work with mAS using the same resources and show that our method is more effective. The codes of mini-AlphaStar are at https://github.com/liuruoze/mini-AlphaStar. We hope our study could shed some light on the future research of efficient reinforcement learning on SC2 and other large-scale games.
LGMar 25, 2020
AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going OnlineGuangda Huzhang, Zhen-Jia Pang, Yongqing Gao et al.
Learning-to-rank (LTR) has become a key technology in E-commerce applications. Most existing LTR approaches follow a supervised learning paradigm from offline labeled data collected from the online system. However, it has been noticed that previous LTR models can have a good validation performance over offline validation data but have a poor online performance, and vice versa, which implies a possible large inconsistency between the offline and online evaluation. We investigate and confirm in this paper that such inconsistency exists and can have a significant impact on AliExpress Search. Reasons for the inconsistency include the ignorance of item context during the learning, and the offline data set is insufficient for learning the context. Therefore, this paper proposes an evaluator-generator framework for LTR with item context. The framework consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning, and a discriminator that ensures the generalization of the evaluator. Extensive experiments in simulation environments and AliExpress Search online system show that, firstly, the classic data-based metrics on the offline dataset can show significant inconsistency with online performance, and can even be misleading. Secondly, the proposed evaluator score is significantly more consistent with the online performance than common ranking metrics. Finally, as the consequence, our method achieves a significant improvement (\textgreater$2\%$) in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
LGMar 2, 2019
Efficient Reinforcement Learning for StarCraft by Abstract Forward Models and Transfer LearningRuo-Ze Liu, Haifeng Guo, Xiaozhong Ji et al.
Injecting human knowledge is an effective way to accelerate reinforcement learning (RL). However, these methods are underexplored. This paper presents our discovery that an abstract forward model (thought-game (TG)) combined with transfer learning (TL) is an effective way. We take StarCraft II as our study environment. With the help of a designed TG, the agent can learn a 99% win-rate on a 64x64 map against the Level-7 built-in AI, using only 1.08 hours in a single commercial machine. We also show that the TG method is not as restrictive as it was thought to be. It can work with roughly designed TGs, and can also be useful when the environment changes. Comparing with previous model-based RL, we show TG is more effective. We also present a TG hypothesis that gives the influence of different fidelity levels of TG. For real games that have unequal state and action spaces, we proposed a novel XfrNet of which usefulness is validated while achieving a 90% win-rate against the cheating Level-10 AI. We argue that the TG method might shed light on further studies of efficient RL with human knowledge.
LGSep 23, 2018
On Reinforcement Learning for Full-length Game of StarCraftZhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng et al.
StarCraft II poses a grand challenge for reinforcement learning. The main difficulties of it include huge state and action space and a long-time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-action automatically extracted from expert's trajectories, which reduces the action space in an order of magnitude yet remains effective. The other is a two-layer hierarchical architecture which is modular and easy to scale, enabling a curriculum transferring from simpler tasks to more complex tasks. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and using restrictive units, we achieve a winning rate of more than 99\% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat model, we can achieve over 93\% winning rate of Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days using a single machine with only 48 CPU cores and 8 K40 GPUs. It also shows strong generalization performance, when tested against never seen opponents including cheating levels built-in AI and all levels of Zerg and Protoss built-in AI. We hope this study could shed some light on the future research of large-scale reinforcement learning.