LG · AI · CV · Dec 4, 2023

Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

Microsoft
arXiv:2312.02312v3 · 6 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the problem of making decision-making research in modern video games more accessible and cost-effective for the research community. The contribution is incremental, as it builds on existing imitation learning and visual encoder methods.

The study systematically compares publicly available pre-trained visual encoders with end-to-end training for imitation learning in Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons. It finds that end-to-end training can be effective with comparably low-resolution images and only minutes of demonstrations, while pre-trained encoders such as DINOv2 can yield significant improvements, depending on the game, and reduce the cost of training.

Video games have served as useful benchmarks for the decision-making community, but going beyond Atari games towards modern games has been prohibitively expensive for the vast majority of the research community. Prior work in modern video games typically relied on game-specific integration to obtain game features and enable online training, or on existing large datasets. An alternative approach is to train agents using imitation learning to play video games purely from images. However, this setting poses a fundamental question: which visual encoders obtain representations that retain information critical for decision making? To answer this question, we conduct a systematic study of imitation learning with publicly available pre-trained visual encoders compared to the typical task-specific end-to-end training approach in Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons. Our results show that end-to-end training can be effective with comparably low-resolution images and only minutes of demonstrations, but significant improvements can be gained by utilising pre-trained encoders such as DINOv2 depending on the game. In addition to enabling effective decision making, we show that pre-trained encoders can make decision-making research in video games more accessible by significantly reducing the cost of training.
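
To make the setting in the abstract concrete, the following is a minimal sketch of imitation learning (behaviour cloning) on top of a frozen visual encoder. It is not the authors' implementation: `frozen_encoder` is a hypothetical hand-written stand-in for a pre-trained model such as DINOv2, the 2x2 "images" and two-action task are toy assumptions, and the linear softmax policy head is chosen only for brevity. The point it illustrates is that the encoder is never updated, so only the small policy head is trained on (image, action) demonstration pairs.

```python
import math
import random


def frozen_encoder(image):
    """Hypothetical stand-in for a frozen pre-trained encoder (e.g. DINOv2).

    Maps a 2x2 toy 'image' to two features: total brightness of the left
    and right columns. Its behaviour is fixed and never trained.
    """
    left = image[0][0] + image[1][0]
    right = image[0][1] + image[1][1]
    return [left, right]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPolicy:
    """Linear softmax policy head; the only trainable part in this sketch."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.b = [0.0] * n_actions

    def probs(self, feats):
        logits = [
            sum(wi * f for wi, f in zip(row, feats)) + bias
            for row, bias in zip(self.w, self.b)
        ]
        return softmax(logits)

    def act(self, feats):
        p = self.probs(feats)
        return p.index(max(p))


def behaviour_cloning(demos, policy, epochs=200, lr=0.5):
    """Fit the policy head to demonstrations with cross-entropy SGD."""
    for _ in range(epochs):
        for image, action in demos:
            feats = frozen_encoder(image)  # encoder output is treated as fixed input
            p = policy.probs(feats)
            for a in range(len(p)):
                # Gradient of cross-entropy w.r.t. logit a: p[a] - one_hot[a]
                grad = p[a] - (1.0 if a == action else 0.0)
                for j, f in enumerate(feats):
                    policy.w[a][j] -= lr * grad * f
                policy.b[a] -= lr * grad
    return policy


def make_demos(n, seed=0):
    """Toy demonstrations: the demonstrator moves towards the bright column."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        action = rng.randrange(2)  # 0 = move left, 1 = move right
        image = [[rng.random() * 0.1 for _ in range(2)] for _ in range(2)]
        image[rng.randrange(2)][action] += 1.0  # bright pixel on the target side
        demos.append((image, action))
    return demos


demos = make_demos(32)
policy = behaviour_cloning(demos, LinearPolicy(2, 2))
accuracy = sum(policy.act(frozen_encoder(img)) == a for img, a in demos) / len(demos)
```

In the end-to-end alternative discussed in the abstract, the encoder's parameters would be updated together with the policy head instead of being held fixed. With a frozen encoder, features can also be computed once per frame and cached, which is one plausible way the training cost could drop, though the sketch does not model that.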

