CV · Apr 1, 2019

Equivariant Multi-View Networks

Carlos Esteves, Yinshuang Xu, Christine Allen-Blanchette, Kostas Daniilidis

arXiv:1904.00993v221.7108 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses view aggregation in 3D vision tasks, proposing a method that improves results in applications such as 3D shape retrieval and panoramic scene classification.

The paper targets the weak global descriptors produced when views are processed independently and then pooled in a single step. It proposes an equivariant multi-view network that aggregates views with group convolutions, reporting state-of-the-art results in large-scale 3D shape retrieval and showing additional applications to panoramic scene classification.

Several popular approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views. We argue that this operation discards important information and leads to subpar global descriptors. In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, thus enabling joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. We further develop this idea to operate on smaller discrete homogeneous spaces of the rotation group, where a polar view representation is used to maintain equivariance with only a fraction of the number of input views. We set the new state of the art in several large-scale 3D shape retrieval tasks, and show additional applications to panoramic scene classification.
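The contrast the abstract draws between invariant pooling and equivariant group convolution can be illustrated with a minimal NumPy sketch. It is not the paper's architecture: it assumes the views are indexed by the cyclic group C_n (rotations about a single axis), chosen here only as the simplest discrete rotation subgroup. The function names `view_pool` and `cyclic_group_conv`, the feature sizes, and the random weights are all hypothetical.

```python
import numpy as np


def view_pool(feats):
    """Baseline aggregation: max-pool over views.

    feats has shape (n_views, channels). The result is invariant to any
    reordering of the views, but the relations between views are lost.
    """
    return feats.max(axis=0)


def cyclic_group_conv(feats, kernel):
    """Group correlation over the cyclic group C_n (illustrative assumption).

    feats:  (n_views, c_in), one descriptor per view, indexed by group element.
    kernel: (n_views, c_in, c_out), one weight matrix per group element.
    Returns (n_views, c_out) with out[g] = sum_h feats[(g + h) % n] @ kernel[h].
    """
    n = feats.shape[0]
    out = np.zeros((n, kernel.shape[2]))
    for g in range(n):
        for h in range(n):
            out[g] += feats[(g + h) % n] @ kernel[h]
    return out


rng = np.random.default_rng(0)
n_views, c_in, c_out = 12, 8, 4  # hypothetical sizes
feats = rng.standard_normal((n_views, c_in))
kernel = rng.standard_normal((n_views, c_in, c_out))

# Rotating the object about the axis cyclically shifts the view index.
shift = 5
rotated = np.roll(feats, shift, axis=0)

out = cyclic_group_conv(feats, kernel)
out_rotated = cyclic_group_conv(rotated, kernel)

# Equivariance: the output shifts by the same amount as the input views.
equivariant = np.allclose(out_rotated, np.roll(out, shift, axis=0))

# Pooling the raw views is invariant to the shift.
invariant = np.array_equal(view_pool(rotated), view_pool(feats))

# Pooling only after the group convolution still yields an invariant
# descriptor, while the layers before it reason jointly over all views.
final_invariant = np.allclose(view_pool(out_rotated), view_pool(out))
```

In this sketch the group convolution layers can be stacked, since each output is again a function on the group, and invariance is obtained only by the final pooling step. Extending it to a non-commutative rotation subgroup would replace the index arithmetic `(g + h) % n` with a lookup in the group's multiplication table.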
