3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
Esports has rapidly emerged as a global phenomenon, with an ever-expanding audience on platforms such as YouTube. Due to the inherently complex nature of the game, it is challenging for newcomers to comprehend what is happening during an event. The chaotic nature of online chat, the fast-paced speech of the game commentator, and the game-specific user interface further compound the difficulty of understanding the gameplay. To overcome these challenges, it is crucial to integrate the multi-modal (MM) information available on the platform in order to understand the event. This paper introduces a new MM multi-teacher-based game event detection framework, with the ultimate goal of constructing a comprehensive framework that enhances comprehension of the ongoing game situation. While conventional MM models typically prioritise aligning MM data through concurrent training towards a unified objective, our framework leverages multiple teachers, each trained independently on a different task, to accomplish game event detection. Experiments demonstrate the effectiveness of the proposed MM multi-teacher framework.
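The abstract does not say how knowledge passes from the independently trained teachers to the event detector. The following is a minimal sketch of one generic option, feature-level multi-teacher distillation. It assumes that each frozen teacher produces an embedding for a clip and that the student emits one projected embedding per teacher alongside its event logits. The student's loss combines cross-entropy on the event label with a weighted mean-squared error toward each teacher's embedding. Matching embeddings instead of soft labels lets teachers trained on different tasks, with different label spaces, still contribute. All function names, the weights, and `alpha` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_loss(
    event_logits: Sequence[float],
    label: int,
    student_projections: Sequence[Sequence[float]],
    teacher_embeddings: Sequence[Sequence[float]],
    teacher_weights: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical student objective: task loss plus feature distillation.

    Each teacher is assumed frozen and pre-trained on its own task, so only
    its embedding is used. The student supplies one projected vector per teacher.
    """
    if not (len(student_projections) == len(teacher_embeddings) == len(teacher_weights)):
        raise ValueError("need one projection and one weight per teacher")
    # Supervised game-event detection loss on the student's own prediction.
    task_loss = cross_entropy(event_logits, label)
    # Weighted sum of per-teacher feature-matching losses.
    distill_loss = sum(
        w * mse(s, t)
        for w, s, t in zip(teacher_weights, student_projections, teacher_embeddings)
    )
    return alpha * task_loss + (1.0 - alpha) * distill_loss


if __name__ == "__main__":
    # Two illustrative teachers, e.g. one for chat text and one for commentary speech.
    loss = multi_teacher_loss(
        event_logits=[2.0, 0.0, 0.0],
        label=0,
        student_projections=[[0.1, 0.2], [0.5, 0.5]],
        teacher_embeddings=[[0.0, 0.2], [0.4, 0.6]],
        teacher_weights=[0.5, 0.5],
    )
    print(f"combined loss: {loss:.4f}")
```

Because the teachers are trained separately and only queried for embeddings, each can be swapped or retrained without touching the others. This sketch is meant to convey that contrast with concurrent training towards a single objective.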