CV · Aug 20, 2014

Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods

Alexander Freytag, Johannes Rühle, Paul Bodesheim, Erik Rodner, Joachim Denzler

arXiv:1408.4692v1 · 2 citations

Originality: Incremental advance

AI Analysis

The paper gives researchers and practitioners insight into the limitations of popular visual recognition systems, though the advance is incremental because it builds on existing feature inversion techniques.

The paper asks how much visual information is lost when local features are quantized in bag-of-visual-words methods. The authors invert quantized features and test human recognition performance on the resulting visualizations at different codebook sizes, and find that quantization significantly reduces human recognition accuracy.

Vector-quantized local features frequently used in bag-of-visual-words approaches are the backbone of popular visual recognition systems due to both their simplicity and their performance. Despite their success, bag-of-words histograms basically contain low-level image statistics (e.g., number of edges of different orientations). The question remains: how much visual information is "lost in quantization" when mapping visual features to code words? To answer this question, we present an in-depth analysis of the effect of local feature quantization on human recognition performance. Our analysis is based on recovering the visual information by inverting quantized local features and presenting these visualizations with different codebook sizes to human observers. Although feature inversion techniques have been around for quite a while, to the best of our knowledge, our technique is the first to visualize specifically the effect of feature quantization. Thereby, we are now able to compare single steps in common image classification pipelines to human counterparts.
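
To make the quantization step in the abstract concrete, here is a minimal sketch of mapping local descriptors to code words and building a bag-of-words histogram. It is not the authors' method. Several things are assumptions made for illustration: the descriptors are random 8-dimensional vectors rather than real image features, the codebook is taken as the first few descriptors rather than learned by clustering, and the function names (`quantize`, `bow_histogram`, `quantization_error`) are hypothetical. The mean squared distance to the assigned code word serves only as a rough numerical stand-in for "lost" information; the paper itself measures loss through human recognition of inverted features.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptors, codebook):
    """Map each descriptor to the index of its nearest code word."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        for d in descriptors
    ]


def bow_histogram(assignments, codebook_size):
    """Count how often each code word was assigned."""
    hist = [0] * codebook_size
    for k in assignments:
        hist[k] += 1
    return hist


def quantization_error(descriptors, codebook):
    """Mean squared distance between descriptors and their code words."""
    assignments = quantize(descriptors, codebook)
    total = sum(
        squared_distance(d, codebook[k]) for d, k in zip(descriptors, assignments)
    )
    return total / len(descriptors)


# Assumption: synthetic descriptors stand in for real local image features.
rng = random.Random(0)
descriptors = [[rng.random() for _ in range(8)] for _ in range(200)]

errors = {}
for size in (4, 32, 200):
    # Assumption: code words are simply the first `size` descriptors;
    # a real pipeline would typically learn them by clustering.
    codebook = descriptors[:size]
    errors[size] = quantization_error(descriptors, codebook)
    hist = bow_histogram(quantize(descriptors, codebook), size)
    print(size, round(errors[size], 4), sum(hist))
```

Because the codebooks here are nested, the error cannot grow as the codebook gets larger, and it reaches zero once every descriptor is its own code word. That mirrors the intuition behind comparing visualizations across codebook sizes: a coarser codebook discards more of each descriptor.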
