LG | Oct 11, 2024

Carefully Structured Compression: Efficiently Managing StarCraft II Data

arXiv:2410.08659v1 | 1 citation | h-index: 11 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses dataset management inefficiencies for researchers and practitioners working with structured data in domains such as gaming or simulations. It is an incremental advance, as it builds on existing compression and serialization techniques.

The paper tackles the high cost of creating and storing complex datasets like StarCraft II by introducing a serialization framework that reduces these costs and improves usability, with models trained on the dataset outperforming comparable ones.

Creation and storage of datasets are often overlooked input costs in machine learning, as many datasets are simple image-label pairs or plain text. However, datasets with more complex structures, such as those from the real-time strategy game StarCraft II, require more deliberate thought and strategy to reduce cost of ownership. We introduce a serialization framework for StarCraft II that reduces the cost of dataset creation and storage and improves usage ergonomics. We benchmark against the most comparable existing dataset, from AlphaStar-Unplugged, and highlight the benefit of our framework in terms of both the cost of creation and storage. We use our dataset to train deep learning models that exceed the performance of comparable models trained on other datasets. The dataset conversion and usage framework is open source and can serve as a framework for datasets with similar characteristics, such as digital twin simulations. Pre-converted StarCraft II tournament data is also available online.

Code Implementations: 1 repo