Are visual dictionaries generalizable?
This addresses the challenge of generating codebooks for general-purpose, dynamic image collections like the Web, offering a practical solution to reduce computational burden.
The paper tackles the problem of whether visual dictionaries, crucial for image classification and retrieval, can be generalized across datasets without needing the entire collection. It finds that dictionaries based on small subsets or different datasets can produce good representations if they cover diverse low-level features, confirming feasibility for large-scale dynamic environments.
Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. In general-purpose, dynamic image collections (e.g., the Web), one cannot have the entire collection in order to extract a representative dictionary. However, based on the hypothesis that the dictionary reflects only the diversity of low-level appearances and does not capture semantics, we argue that a dictionary based on a small subset of the data, or even on an entirely different dataset, is able to produce a good representation, provided that the chosen images span a diverse enough portion of the low-level feature space. Our experiments confirm that hypothesis, opening the opportunity to greatly alleviate the burden in generating the codebook, and confirming the feasibility of employing visual dictionaries in large-scale dynamic environments.