Contrastive Learning of Generalized Game Representations
This addresses the challenge of building reusable visual encoders for games, enabling better generalization across unseen games without retraining, though it is incremental in applying contrastive learning to this domain.
The paper tackled the problem of learning generalized game representations from pixels, showing that contrastive learning outperforms supervised learning by ignoring visual style and focusing on game content, with results on a dataset of 175 games and 100k images across 10 genres.
Representing games through their pixels offers a promising approach for building general-purpose and versatile game models. While games are not merely images, neural network models trained on game pixels often capture differences of the visual style of the image rather than the content of the game. As a result, such models cannot generalize well even within similar games of the same genre. In this paper we build on recent advances in contrastive learning and showcase its benefits for representation learning in games. Learning to contrast images of games not only classifies games in a more efficient manner; it also yields models that separate games in a more meaningful fashion by ignoring the visual style and focusing, instead, on their content. Our results in a large dataset of sports video games containing 100k images across 175 games and 10 game genres suggest that contrastive learning is better suited for learning generalized game representations compared to conventional supervised learning. The findings of this study bring us closer to universal visual encoders for games that can be reused across previously unseen games without requiring retraining or fine-tuning.