CV · Oct 30, 2023

Generating Context-Aware Natural Answers for Questions in 3D Scenes

arXiv:2310.19516v1 · 8 citations · h-index: 11
Originality: Highly original
AI Analysis

This work addresses a limitation of 3D question answering by enabling more natural, context-aware responses. It is incremental in that it builds on existing benchmarks, but it introduces a novel generation approach.

The paper tackles the problem of generating free-form natural answers for questions in 3D scenes, moving beyond pre-defined answer spaces, and achieves state-of-the-art results on the ScanQA benchmark with CIDEr scores of 72.22 and 66.57 on test sets.

3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the language rewards to secure the global sentence semantics. Here, we also adapt a pragmatic language understanding reward to further improve the sentence quality. Our method sets a new SOTA on the ScanQA benchmark (CIDEr score 72.22/66.57 on the test sets).
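As a rough illustration of what "optimizing directly on the language rewards" can look like, the sketch below uses a generic policy-gradient loss with a baseline. The abstract does not spell out the exact objective, so this is an assumption rather than the authors' implementation. The names `toy_reward` and `policy_gradient_loss` are hypothetical, and the unigram-overlap reward is only a stand-in for a sentence-level metric such as CIDEr.

```python
from collections import Counter
from typing import List


def toy_reward(candidate: str, reference: str) -> float:
    """Unigram F1 overlap; a hypothetical stand-in for a metric such as CIDEr."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared word at most as often as it
    # appears in both sentences.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def policy_gradient_loss(token_log_probs: List[float],
                         reward: float,
                         baseline: float) -> float:
    """REINFORCE-style loss for one sampled answer.

    The whole sentence shares one scalar advantage (reward - baseline), so the
    training signal reflects sentence-level quality rather than per-token
    likelihood. Minimizing this loss raises the probability of answers that
    beat the baseline and lowers it for answers that fall short.
    """
    advantage = reward - baseline
    return -advantage * sum(token_log_probs)


if __name__ == "__main__":
    reference = "the brown chair"
    sampled = "the brown chair"   # answer sampled from the model (assumed)
    greedy = "a chair"            # greedy answer used as the baseline (assumed)
    loss = policy_gradient_loss(
        token_log_probs=[-0.5, -0.5, -0.5],  # made-up log-probabilities
        reward=toy_reward(sampled, reference),
        baseline=toy_reward(greedy, reference),
    )
    print(loss)
```

In a real training loop the log-probabilities would come from the answer decoder and the loss would be backpropagated through them; plain floats are used here only to keep the sketch self-contained.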

Code Implementations: 1 repo