CVMay 22, 2014

Self-tuned Visual Subclass Learning with Shared Samples An Incremental Approach

arXiv:1405.5732v21 citations
Originality Highly original
AI Analysis

This addresses a fundamental issue in object recognition for computer vision, offering a novel approach to improve model efficiency and performance by better handling varied visual appearances.

The paper tackles the problem of semantic classes not aligning with visual subclasses in computer vision by proposing an incremental optimization technique that unifies clustering and classification, showing that state-of-the-art methods like DPM fail to utilize 50% of training data from the long-tail distribution of visual subclasses.

Computer vision tasks are traditionally defined and evaluated using semantic categories. However, it is known to the field that semantic classes do not necessarily correspond to a unique visual class (e.g. inside and outside of a car). Furthermore, many of the feasible learning techniques at hand cannot model a visual class which appears consistent to the human eye. These problems have motivated the use of 1) Unsupervised or supervised clustering as a preprocessing step to identify the visual subclasses to be used in a mixture-of-experts learning regime. 2) Felzenszwalb et al. part model and other works model mixture assignment with latent variables which is optimized during learning 3) Highly non-linear classifiers which are inherently capable of modelling multi-modal input space but are inefficient at the test time. In this work, we promote an incremental view over the recognition of semantic classes with varied appearances. We propose an optimization technique which incrementally finds maximal visual subclasses in a regularized risk minimization framework. Our proposed approach unifies the clustering and classification steps in a single algorithm. The importance of this approach is its compliance with the classification via the fact that it does not need to know about the number of clusters, the representation and similarity measures used in pre-processing clustering methods a priori. Following this approach we show both qualitatively and quantitatively significant results. We show that the visual subclasses demonstrate a long tail distribution. Finally, we show that state of the art object detection methods (e.g. DPM) are unable to use the tails of this distribution comprising 50\% of the training samples. In fact we show that DPM performance slightly increases on average by the removal of this half of the data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes