3D-RelNet: Joint Object and Relational Network for 3D Prediction
This work addresses the challenge of accurate 3D scene understanding for applications in robotics and computer vision, though its contribution is incremental in that it builds on existing prediction frameworks.
The paper tackles the problem of predicting the 3D shape and pose of objects in a scene by incorporating pairwise relational reasoning, and reports significant improvements over independent prediction methods on the SUNCG and NYUv2 datasets.
We propose an approach to predict the 3D shape and pose of the objects present in a scene. Existing learning-based methods that pursue this goal make independent predictions per object and do not leverage the relationships among them. We argue that reasoning about these relationships is crucial, and present an approach to incorporate them into a 3D prediction framework. In addition to independent per-object predictions, we predict pairwise relations in the form of relative 3D pose, and demonstrate that these can be easily incorporated to improve object-level estimates. We report performance across different datasets (SUNCG, NYUv2), and show that our approach significantly improves over independent prediction approaches while also outperforming alternate implicit reasoning methods.
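To make the idea of combining per-object and pairwise predictions concrete, the following is a minimal sketch and not the paper's implementation. It assumes that only the translation component of pose is refined, that each pairwise prediction estimates the offset t_j - t_i between two objects, and that the two sources are fused by linear least squares. The function name `refine_translations` and the `weight` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def refine_translations(unary, relative, weight=1.0):
    """Fuse per-object and pairwise translation estimates (illustrative only).

    unary:    (N, 3) array of independent per-object translation estimates.
    relative: dict mapping (i, j) to a length-3 estimate of t_j - t_i.
    weight:   assumed trade-off between pairwise and unary terms.

    Minimizes  sum_i ||t_i - unary_i||^2
             + weight * sum_(i,j) ||(t_j - t_i) - relative_ij||^2.
    """
    unary = np.asarray(unary, dtype=float)
    n = unary.shape[0]
    pairs = list(relative.items())

    # One row per unary term, then one row per pairwise term.
    A = np.zeros((n + len(pairs), n))
    b = np.zeros((n + len(pairs), 3))
    A[:n] = np.eye(n)
    b[:n] = unary

    w = np.sqrt(weight)  # a squared residual scales by `weight`
    for k, ((i, j), offset) in enumerate(pairs):
        A[n + k, i] = -w
        A[n + k, j] = w
        b[n + k] = w * np.asarray(offset, dtype=float)

    # The three coordinate axes share A, so they are solved in one call.
    refined, *_ = np.linalg.lstsq(A, b, rcond=None)
    return refined


if __name__ == "__main__":
    unary = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    relative = {(0, 1): [2.0, 0.0, 0.0]}  # pairwise term disagrees with unary
    print(refine_translations(unary, relative))
```

When the pairwise estimates agree with the unary ones, the refined translations equal the unary predictions. When they disagree, the solution moves both objects toward a configuration that compromises between the two sources, which is the sense in which relative predictions can adjust object-level estimates in this sketch.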