CV · AI · Oct 17, 2024

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

arXiv:2410.13924v2 · 9 citations · h-index: 16 · Has Code · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the data bottleneck in 3D vision research and makes it easier to scale models. It is an incremental advance, however, because it builds on an existing dataset (ARKitScenes) and an existing annotation pipeline (LabelMaker).

The paper tackles the shortage of training data for 3D scene understanding by introducing ARKit LabelMaker, a dataset more than three times larger than the previous largest one. Training on it yields state-of-the-art 3D semantic segmentation accuracy on the ScanNet and ScanNet200 benchmarks, with notable gains on tail classes.

Neural network performance scales with both model size and data volume, as has been shown in both language and image processing. Exploiting this requires scaling-friendly architectures and large datasets. While transformers have been adapted for 3D vision, a 'GPT moment' remains elusive due to limited training data. We introduce ARKit LabelMaker, a large-scale real-world 3D dataset with dense semantic annotation that is more than three times larger than the prior largest dataset. Specifically, we extend ARKitScenes with automatically generated dense 3D labels using an extended LabelMaker pipeline, tailored for large-scale pre-training. Training on our dataset improves accuracy across architectures, achieving state-of-the-art 3D semantic segmentation scores on ScanNet and ScanNet200, with notable gains on tail classes. Our code is available at https://labelmaker.org and our dataset at https://huggingface.co/datasets/labelmaker/arkit_labelmaker.
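To see why gains on tail classes matter for a headline score, it helps to look at how 3D semantic segmentation is usually scored. The sketch below assumes the standard ScanNet/ScanNet200 metric, mean intersection-over-union (mIoU) over per-point labels. It is a minimal illustration, not the paper's evaluation code, and the function names `per_class_iou` and `mean_iou` are hypothetical. Because mIoU averages IoU over classes rather than over points, a rare class that covers only a handful of points carries the same weight as a large one.

```python
from typing import List, Optional, Sequence


def per_class_iou(
    pred: Sequence[int],
    gt: Sequence[int],
    num_classes: int,
    ignore_index: int = -1,
) -> List[Optional[float]]:
    """Per-class IoU = TP / (TP + FP + FN) over per-point labels.

    Hypothetical helper: points whose ground truth equals ignore_index are
    skipped. Classes absent from both prediction and ground truth get None.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled point, excluded from scoring
        if p == g:
            tp[g] += 1
        else:
            fn[g] += 1  # the true class missed this point
            if 0 <= p < num_classes:
                fp[p] += 1  # the predicted class claimed a wrong point
    ious: List[Optional[float]] = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else None)
    return ious


def mean_iou(ious: Sequence[Optional[float]]) -> float:
    """Unweighted mean over classes that occur, so rare classes count fully."""
    vals = [v for v in ious if v is not None]
    return sum(vals) / len(vals) if vals else 0.0


if __name__ == "__main__":
    # Class 2 is a "tail" class with a single point, and it is missed.
    gt = [0, 0, 0, 0, 1, 1, 2]
    pred = [0, 0, 0, 1, 1, 1, 0]
    ious = per_class_iou(pred, gt, num_classes=3)
    print(ious, mean_iou(ious))
```

In this toy example the single missed tail-class point drops that class's IoU to 0 and pulls the mean down by a third, even though 5 of 7 points are labeled correctly. This is why better coverage of rare categories can noticeably move ScanNet200 scores.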

Code Implementations: 1 repo
