A Multimodal Architecture for Endpoint Position Prediction in Team-based Multiplayer Games
This work addresses player movement prediction, a problem relevant to game developers and AI researchers, though it appears incremental because it builds on existing multimodal and attention-based methods.
The paper tackles the prediction of future player positions in team-based multiplayer games. It introduces an architecture that combines a U-Net-based heatmap predictor with a multimodal feature encoder and multi-head attention. The resulting technique makes efficient use of heterogeneous game data to generate endpoint location probability heatmaps, enabling applications such as player-mimicking bots and behavior analytics.
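To make the notion of an endpoint location probability heatmap concrete, the following is a minimal sketch, not the paper's implementation. It assumes the network's last layer emits one raw score per map cell and that a spatial softmax turns those scores into probabilities. The function names and the example grid are hypothetical.

```python
import math

def logits_to_heatmap(logits):
    """Spatial softmax: map a 2D grid of raw scores to probabilities that sum to 1."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(value - peak) for value in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[value / total for value in row] for row in exps]

def most_likely_endpoint(heatmap):
    """Return the (row, col) index of the highest-probability cell."""
    cells = ((r, c) for r, row in enumerate(heatmap) for c in range(len(row)))
    return max(cells, key=lambda rc: heatmap[rc[0]][rc[1]])

# Hypothetical 3x4 grid of raw scores standing in for a decoder output.
example_logits = [
    [0.0, 1.0, 0.5, -1.0],
    [0.2, 3.0, 1.5, 0.0],
    [-0.5, 0.8, 0.1, -2.0],
]

heatmap = logits_to_heatmap(example_logits)
endpoint = most_likely_endpoint(heatmap)
```

Keeping the full heatmap rather than only the top cell preserves multimodal predictions, for example when a player could plausibly move toward either of two objectives.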
Understanding and predicting player movement in multiplayer games is crucial for use cases such as player-mimicking bot navigation, preemptive bot control, strategy recommendation, and real-time player behavior analytics. However, complex game environments allow for a high degree of navigational freedom, and the interactions and team-play between players require models that make effective use of the available heterogeneous input data. This paper presents a multimodal architecture for predicting future player locations on a dynamic time horizon, using a U-Net-based approach for calculating endpoint location probability heatmaps, conditioned using a multimodal feature encoder. Applying a multi-head attention mechanism to different groups of features allows for communication between agents. In doing so, the architecture makes efficient use of the multimodal game state, including image inputs, numerical and categorical features, and dynamic game data. Consequently, the presented technique lays the foundation for various downstream tasks that rely on future player positions, such as the creation of player-predictive bot behavior or player anomaly detection.
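The role of multi-head attention in letting agents communicate can be illustrated with a small sketch. This is standard scaled dot-product self-attention over per-agent feature vectors, written under assumptions and not taken from the paper. The dimensions, the random weights, and all function names are hypothetical, and the final output projection found in typical implementations is omitted for brevity.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def attention_head(agents, w_q, w_k, w_v):
    """One head of scaled dot-product self-attention across agents.

    Returns the per-agent outputs and the attention weights (one row per agent).
    """
    queries = [matvec(w_q, a) for a in agents]
    keys = [matvec(w_k, a) for a in agents]
    values = [matvec(w_v, a) for a in agents]
    scale = math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)  # how strongly this agent attends to every agent
        weights.append(attn)
        outputs.append([
            sum(a * v[d] for a, v in zip(attn, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights

def multi_head_attention(agents, heads):
    """Run every head and concatenate the head outputs for each agent."""
    per_head = [attention_head(agents, *head)[0] for head in heads]
    return [sum((out[i] for out in per_head), []) for i in range(len(agents))]

# Hypothetical setup: 4 agents, 6 features each, 2 heads of size 3.
rng = random.Random(0)

def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

agent_features = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(4)]
heads = [tuple(random_matrix(3, 6) for _ in range(3)) for _ in range(2)]

mixed = multi_head_attention(agent_features, heads)
_, first_head_weights = attention_head(agent_features, *heads[0])
```

Each row of `first_head_weights` is a probability distribution over agents, so every agent's updated representation is a weighted blend of information from all agents, which is one way to read "communication between agents".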