StandardSim: A Synthetic Dataset For Retail Environments
StandardSim addresses data scarcity for autonomous checkout systems, but the contribution is incremental: it builds on existing synthetic dataset methods and applies them to a specific domain.
The authors tackle the lack of datasets for autonomous checkout in retail by creating StandardSim, a large-scale synthetic dataset with annotations for tasks such as segmentation and depth estimation. They also introduce a novel change detection task and show that the dataset provides a difficult benchmark and improves model performance.
Autonomous checkout systems rely on visual and sensory inputs to carry out fine-grained scene understanding in retail environments. Retail environments present unique challenges compared to typical indoor scenes owing to the vast number of densely packed, unique yet similar objects. The problem becomes even more difficult when only RGB input is available, especially for data-hungry tasks such as instance segmentation. To address the lack of datasets for retail, we present StandardSim, a large-scale photorealistic synthetic dataset featuring annotations for semantic segmentation, instance segmentation, depth estimation, and object detection. Our dataset provides multiple views per scene, enabling multi-view representation learning. Further, we introduce a novel task central to autonomous checkout called change detection, requiring pixel-level classification of takes, puts and shifts in objects over time. We benchmark widely-used models for segmentation and depth estimation on our dataset, show that our test set constitutes a difficult benchmark compared to current smaller-scale datasets and that our training set provides models with crucial information for autonomous checkout tasks.
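To make the change detection task concrete, here is a minimal sketch of how a pixel-level prediction over takes, puts and shifts could be scored with per-class intersection-over-union. The integer label encoding, the `per_class_iou` helper and the toy label maps are assumptions made for illustration; the source names the change classes but does not specify an encoding or an evaluation metric.

```python
from itertools import chain

# Hypothetical encoding: the index in this tuple is the pixel label id.
CHANGE_CLASSES = ("no_change", "take", "put", "shift")


def per_class_iou(pred, target, num_classes=len(CHANGE_CLASSES)):
    """Return one IoU per class id, or None if a class appears in neither map."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    # Walk both label maps pixel by pixel in row-major order.
    for p, t in zip(chain.from_iterable(pred), chain.from_iterable(target)):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        ious.append(intersection[c] / union if union else None)
    return ious


# Toy 3x4 label maps standing in for a ground-truth and a predicted change mask.
target = [
    [0, 0, 1, 1],
    [0, 2, 2, 0],
    [3, 3, 0, 0],
]
pred = [
    [0, 0, 1, 0],
    [0, 2, 2, 2],
    [3, 0, 0, 0],
]

ious = per_class_iou(pred, target)
present = [v for v in ious if v is not None]
mean_iou = sum(present) / len(present)

for name, value in zip(CHANGE_CLASSES, ious):
    print(name, value)
print("mean IoU", mean_iou)
```

Reporting the score per class rather than as plain pixel accuracy keeps the rare change classes visible, since most pixels in a shelf image would presumably be labeled as unchanged.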