A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition
This addresses the problem of independent task modeling in computer vision for researchers, but it is incremental as it builds on existing datasets and methods.
The paper tackles the joint modeling of object detection, 3D pose estimation, and sub-category recognition by proposing a coarse-to-fine hierarchical representation to prevent performance loss from increased parameters and resolve ambiguities, showing effectiveness on an augmented PASCAL3D+ dataset.
Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently from each other because of the huge space of parameters. To jointly model all of these tasks, we propose a coarse-to-fine hierarchical representation, where each level of the hierarchy represents objects at a different level of granularity. The hierarchical representation prevents performance loss, which is often caused by the increase in the number of parameters (as we consider more tasks to model), and the joint modelling enables resolving ambiguities that exist in independent modelling of these tasks. We augment PASCAL3D+ dataset with annotations for these tasks and show that our hierarchical model is effective in joint modelling of object detection, 3D pose estimation, and sub-category recognition.