Multi-Modal Machine Learning for Assessing Gaming Skills in Online Streaming: A Case Study with CS:GO
This work addresses the need for streaming service providers to discover talented gamers for recommendations, though it is incremental as it builds on existing multi-modal methods.
The study tackled the problem of assessing gaming skills from online streaming videos, specifically for CS:GO, by cleaning a flawed dataset and proposing variants of end-to-end multi-modal models, but found that the models were prone to identifying users rather than learning meaningful representations.
Online streaming is an emerging market that attracts much attention. Assessing gaming skills from videos is an important task for streaming service providers seeking to discover talented gamers. Service providers need this information to offer customized recommendations and service promotions to their customers. It is also an important multi-modal machine learning task, since online streaming combines vision, audio and text modalities. In this study, we begin by identifying flaws in the dataset and proceed to clean it manually. We then propose several variants of the latest end-to-end models to learn joint representations of multiple modalities. Through extensive experimentation, we demonstrate the efficacy of our proposals. Moreover, we find that our proposed models are prone to identifying users instead of learning meaningful representations. Finally, we propose future work to address this issue.
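The tendency of a model to identify users rather than skill can be probed by evaluating on streamers that never appear in training. The following is a minimal sketch of such a user-disjoint split, not the procedure used in the study; the function name `split_by_user`, the `(user_id, clip_id)` sample format, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def split_by_user(samples, test_fraction=0.25, seed=0):
    """Split (user_id, clip_id) pairs so that no user appears in both sets.

    Hypothetical helper: the sample format is an assumption, not the
    dataset layout used in the study.
    """
    # Group clips by the streamer who produced them.
    by_user = defaultdict(list)
    for user_id, clip_id in samples:
        by_user[user_id].append(clip_id)

    # Shuffle users (not clips) so the split is made at the user level.
    users = sorted(by_user)
    random.Random(seed).shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    held_out = set(users[:n_test])

    train = [(u, c) for u in users if u not in held_out for c in by_user[u]]
    test = [(u, c) for u in users if u in held_out for c in by_user[u]]
    return train, test


# Hypothetical toy data: 4 streamers with 3 clips each.
samples = [(f"user{u}", f"clip{u}_{c}") for u in range(4) for c in range(3)]
train, test = split_by_user(samples)
train_users = {u for u, _ in train}
test_users = {u for u, _ in test}
```

If accuracy drops sharply when moving from a random clip-level split to a user-disjoint split like this one, that would be consistent with the model relying on streamer identity rather than on skill-related cues.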