Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation
This work addresses severe viewpoint changes in object detection, a practical problem in computer vision applications, but the contribution is incremental because it builds on existing 2D methods.
The paper tackles the limited spatial invariance of 2D convolutional neural networks in object detection and viewpoint estimation. It introduces cylindrical convolutional networks (CCNs), which use cylindrical kernels defined in 3D space, and reports improved performance on the joint task.
Existing techniques to encode spatial invariance within deep convolutional neural networks only model 2D transformation fields. They do not account for the fact that objects in a 2D space are projections of 3D ones, and thus they have limited ability to handle severe object viewpoint changes. To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in the 3D space. CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint. With the view-specific feature, we simultaneously determine object category and viewpoint using the proposed sinusoidal soft-argmax module. Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
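The abstract names a sinusoidal soft-argmax module but does not give its formula. A minimal sketch of one plausible reading follows: softmax the per-viewpoint scores, average the sine and cosine of the bin angles under those weights, and recover the angle with atan2. The function name, the number of bins, and the example scores are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sinusoidal_soft_argmax(scores, view_angles):
    """Estimate a continuous angle from per-view scores (illustrative sketch).

    scores:      shape (N_v,), one score per discretized viewpoint bin (assumed).
    view_angles: shape (N_v,), bin-center angles in radians (assumed).
    """
    scores = np.asarray(scores, dtype=float)
    view_angles = np.asarray(view_angles, dtype=float)
    # Softmax over viewpoint bins; subtract the max for numerical stability.
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # Average on the unit circle so that 0 and 2*pi count as the same angle.
    s = np.sum(w * np.sin(view_angles))
    c = np.sum(w * np.cos(view_angles))
    # Map atan2's (-pi, pi] range to [0, 2*pi).
    return np.arctan2(s, c) % (2 * np.pi)

# Hypothetical example: 8 bins, with the two bins adjacent across the
# 0 / 2*pi wrap-around (0 and 315 degrees) scoring highest.
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
scores = np.array([5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0])
theta = sinusoidal_soft_argmax(scores, angles)
print(np.degrees(theta))
```

In this example the estimate falls between the two peak bins across the wrap-around, whereas a plain soft-argmax over the raw angle values would average them linearly and land on roughly the opposite side of the circle. That periodicity is the usual motivation for working with sine and cosine; whether the authors' module matches this sketch in detail is an assumption.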