Data Games: A Game-Theoretic Approach to Swarm Robotic Data Collection
The paper addresses data collection inefficiencies in swarm robotics and autonomous systems. It offers a new cooperative sampling approach backed by theoretical and experimental validation, though the contribution is incremental in that it applies game theory to a known bottleneck.
The paper tackles the problem of efficiently collecting diverse training data from fleets of autonomous vehicles under bandwidth and labeling constraints. It proposes a cooperative game-theoretic sampling strategy that outperforms standard benchmarks by up to 21.9% on four perception datasets, including autonomous driving in adverse weather.
Fleets of networked autonomous vehicles (AVs) collect terabytes of sensory data, which is often transmitted to central servers (the "cloud") for training machine learning (ML) models. Ideally, these fleets should upload all their data, especially from rare operating contexts, in order to train robust ML models. However, this is infeasible due to prohibitive network bandwidth and data labeling costs. Instead, we propose a cooperative data sampling strategy where geo-distributed AVs collaborate to collect a diverse ML training dataset in the cloud. Since the AVs have a shared objective but minimal information about each other's local data distribution and perception model, we can naturally cast cooperative data collection as an N-player mathematical game. We show that our cooperative sampling strategy uses minimal information to converge to a centralized oracle policy with complete information about all AVs. Moreover, we theoretically characterize the performance benefits of our game-theoretic strategy compared to greedy sampling. Finally, we experimentally demonstrate that our method outperforms standard benchmarks by up to 21.9% on four perception datasets, including for autonomous driving in adverse weather conditions. Crucially, our experimental results on real-world datasets closely align with our theoretical guarantees.
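To make the contrast between greedy, cooperative, and oracle sampling concrete, here is a toy sketch. It is not the authors' algorithm; every modeling choice is an assumption made for illustration: each robot's local data is summarized as per-class sample counts (a hypothetical `pools` list), each robot uploads exactly `k` samples, the shared objective is class balance of the cloud dataset (sum of squared class counts, which under a fixed total upload equals the squared distance to a uniform target up to a constant), and "cooperative" is modeled as round-robin iterated best response in an identical-interest game. All function names (`best_response`, `greedy_sampling`, `cooperative_sampling`, `oracle_sampling`) are hypothetical.

```python
from itertools import product


def cost(hist):
    # Sum of squared class counts. With a fixed total upload budget this equals,
    # up to a constant, the squared distance to a uniform class distribution.
    return sum(h * h for h in hist)


def total(allocs, num_classes):
    # Cloud dataset histogram: per-class sum of every robot's upload.
    return [sum(a[c] for a in allocs) for c in range(num_classes)]


def best_response(pool, base, k):
    """Choose k samples from `pool` so that base + own upload is most balanced.

    Greedy marginal allocation: repeatedly add one sample of the class whose
    running cloud count is smallest among classes with samples left. Because the
    cost is separable and convex per class, this greedy choice is optimal.
    Assumes sum(pool) >= k (otherwise min() raises ValueError).
    """
    alloc = [0] * len(pool)
    for _ in range(k):
        avail = [c for c in range(len(pool)) if alloc[c] < pool[c]]
        chosen = min(avail, key=lambda j: base[j] + alloc[j])  # ties -> lowest index
        alloc[chosen] += 1
    return alloc


def greedy_sampling(pools, k):
    # Each robot balances only its own upload, ignoring what the others send.
    return [best_response(p, [0] * len(p), k) for p in pools]


def cooperative_sampling(pools, k, max_rounds=100):
    # Hypothetical cooperative scheme: start from greedy, then robots take turns
    # best-responding to the others' current uploads. Only strict improvements
    # are accepted, so the integer cost strictly decreases and the loop ends.
    num_classes = len(pools[0])
    allocs = greedy_sampling(pools, k)
    for _ in range(max_rounds):
        changed = False
        for i, pool in enumerate(pools):
            base = total([a for j, a in enumerate(allocs) if j != i], num_classes)
            trial = allocs[:i] + [best_response(pool, base, k)] + allocs[i + 1:]
            if cost(total(trial, num_classes)) < cost(total(allocs, num_classes)):
                allocs = trial
                changed = True
        if not changed:
            break
    return allocs


def allocations(pool, k):
    # Every way to pick k samples from a per-class pool (tiny instances only).
    ranges = [range(min(p, k) + 1) for p in pool]
    return [list(a) for a in product(*ranges) if sum(a) == k]


def oracle_sampling(pools, k):
    # Centralized brute-force policy with full knowledge of every robot's pool.
    num_classes = len(pools[0])
    best = min(product(*(allocations(p, k) for p in pools)),
               key=lambda combo: cost(total(combo, num_classes)))
    return [list(a) for a in best]


if __name__ == "__main__":
    pools = [[6, 1, 0],   # robot 0: mostly class 0, one class-1 sample, no class 2
             [6, 0, 3]]   # robot 1: class 0 plus some rare class-2 samples
    for name, fn in [("greedy", greedy_sampling),
                     ("cooperative", cooperative_sampling),
                     ("oracle", oracle_sampling)]:
        allocs = fn(pools, 4)
        hist = total(allocs, 3)
        print(name, allocs, hist, cost(hist))
    # greedy [[3, 1, 0], [2, 0, 2]] [5, 1, 2] 30
    # cooperative [[3, 1, 0], [1, 0, 3]] [4, 1, 3] 26
    # oracle [[3, 1, 0], [1, 0, 3]] [4, 1, 3] 26
```

On this tiny instance, greedy sampling wastes robot 1's budget on common class-0 data, whereas the cooperative best-response loop shifts robot 1 toward the rare class and matches the brute-force oracle. This only illustrates the intuition on one example; it does not reproduce the paper's convergence guarantee, which the abstract states for the authors' own formulation.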