LG
May 27, 2022

Improving Bidding and Playing Strategies in the Trick-Taking game Wizard using Deep Q-Networks

arXiv:2205.13834v1
h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses strategy optimization in a specific trick-taking game, which is an incremental improvement for game AI research.

The authors tackled the problem of improving bidding and playing strategies in the trick-taking game Wizard by modeling it with POMDPs and using Deep Q-Networks, achieving accuracies between 66% and 87% in self-play, outperforming random and rule-based baselines.

In this work, the trick-taking game Wizard, which has separate bidding and playing phases, is modeled by two interleaved partially observable Markov decision processes (POMDPs). Deep Q-Networks (DQN) are used to build self-improving agents capable of tackling the challenges of a highly non-stationary environment. To compare algorithms with one another, the accuracy between bid and trick count is monitored; it correlates strongly with the actual rewards and provides well-defined upper and lower performance bounds. The trained DQN agents achieve accuracies between 66% and 87% in self-play, outperforming both a random baseline and a rule-based heuristic. The analysis also reveals a strong information asymmetry concerning player positions during bidding. To compensate for the missing Markov property of imperfect-information games, a long short-term memory (LSTM) network is implemented to integrate historical information into the decision-making process. Additionally, a forward-directed tree search is conducted by sampling a state of the environment, thereby turning the game into a perfect-information setting. To our surprise, neither approach surpasses the performance of the basic DQN agent.
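The accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes accuracy means the fraction of rounds in which a player's bid exactly equals the tricks won, which bounds the metric between 0 and 1. The function names `bid_accuracy` and `wizard_round_score` are hypothetical, and the scoring rule shown is the standard tabletop Wizard rule, which may differ from the reward actually used in the paper.

```python
from typing import Sequence


def bid_accuracy(bids: Sequence[int], tricks: Sequence[int]) -> float:
    """Fraction of rounds in which the bid equals the tricks won (assumed definition)."""
    if len(bids) != len(tricks):
        raise ValueError("bids and tricks must have the same length")
    if not bids:
        raise ValueError("need at least one round")
    hits = sum(1 for b, t in zip(bids, tricks) if b == t)
    return hits / len(bids)


def wizard_round_score(bid: int, tricks: int) -> int:
    """Standard tabletop Wizard scoring (assumption; the paper's reward may differ).

    Exact bid: 20 points plus 10 per trick won.
    Missed bid: minus 10 per trick over or under the bid.
    """
    if bid == tricks:
        return 20 + 10 * tricks
    return -10 * abs(bid - tricks)


# Illustrative data only, not results from the paper.
bids = [1, 0, 2, 3]
tricks = [1, 1, 2, 3]
print(bid_accuracy(bids, tricks))                                # 0.75
print([wizard_round_score(b, t) for b, t in zip(bids, tricks)])  # [30, -10, 40, 50]
```

Under this scoring rule, only exact bids score positively, which is one plausible reason the accuracy would track the rewards closely.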
