OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
OpenLex3D gives researchers in 3D scene understanding a more comprehensive evaluation tool, though the contribution is incremental in that it builds on existing datasets and tasks.
The authors address the lack of evaluation benchmarks for open-vocabulary 3D scene representations by introducing OpenLex3D, a new benchmark with 13 times more labels per scene than the original datasets; evaluating current methods on it reveals failure cases and avenues for improvement.
3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, the evaluation of these representations is currently limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymous object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io/.
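As a rough illustration of why synonym-rich label sets matter for evaluation, the sketch below scores the same predictions against a single-label (closed-set style) ground truth and against a ground truth that accepts synonyms. It is not the benchmark's actual metric: the function name `synonym_aware_accuracy` and the toy labels are hypothetical, chosen only to show the effect.

```python
from typing import List, Set


def synonym_aware_accuracy(predictions: List[str], accepted: List[Set[str]]) -> float:
    """Fraction of points whose predicted label is in that point's accepted label set.

    Hypothetical helper for illustration; not part of OpenLex3D.
    """
    if len(predictions) != len(accepted):
        raise ValueError("predictions and accepted must have the same length")
    if not predictions:
        return 0.0
    hits = sum(1 for pred, labels in zip(predictions, accepted) if pred.lower() in labels)
    return hits / len(predictions)


# Toy data (assumed): one predicted label per point.
predictions = ["sofa", "desk", "lamp", "rug"]

# Closed-set style ground truth: exactly one accepted label per point.
closed_set = [{"couch"}, {"table"}, {"chair"}, {"rug"}]

# Ground truth that also accepts synonyms for the same objects.
with_synonyms = [{"couch", "sofa"}, {"table", "desk"}, {"chair"}, {"carpet", "rug"}]

closed_score = synonym_aware_accuracy(predictions, closed_set)
open_score = synonym_aware_accuracy(predictions, with_synonyms)
print(f"closed-set: {closed_score:.2f}, with synonyms: {open_score:.2f}")
```

In this toy case, a prediction such as "sofa" is penalized under the single-label ground truth but accepted once synonyms are annotated, while the genuinely wrong "lamp" prediction is rejected in both.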