Presence-Only Geographical Priors for Fine-Grained Image Classification
This work addresses the challenge of differentiating fine-grained visual categories in applications such as biodiversity monitoring and image tagging. The contribution is incremental: it builds on existing classifiers by adding contextual cues.
The paper tackles fine-grained image classification by incorporating spatio-temporal priors derived from geographical and temporal metadata, which are often available but underutilized. Combining these priors with image-based predictions yields a large improvement in classification performance in experiments on multiple challenging datasets.
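To make the idea of combining a prior with image-based predictions concrete, here is a minimal sketch of one standard fusion rule. It assumes that the image and its location/time are conditionally independent given the class and that the class prior is uniform. Under those assumptions the posterior is proportional to the product of the classifier output and the prior. The function name `combine_predictions` is hypothetical and is not taken from the paper.

```python
import numpy as np

def combine_predictions(image_probs, geo_probs, eps=1e-12):
    """Fuse image-classifier probabilities with spatio-temporal prior scores.

    Assumption (not from the source): image I and location/time x are
    conditionally independent given class y, and p(y) is uniform, so
    p(y | I, x) is proportional to p(y | I) * p(y | x).
    Works on a single vector of shape (C,) or a batch of shape (B, C).
    """
    image_probs = np.asarray(image_probs, dtype=float)
    geo_probs = np.asarray(geo_probs, dtype=float)
    # Elementwise product of the two per-class scores.
    joint = image_probs * geo_probs
    # Renormalize over classes; eps guards against an all-zero row.
    return joint / (joint.sum(axis=-1, keepdims=True) + eps)

# Two visually similar categories that the image classifier cannot separate:
# the location prior breaks the tie in favour of the first one.
fused = combine_predictions([0.5, 0.5], [0.9, 0.1])
print(fused)
```

The per-class prior scores need not sum to one; the final renormalization turns the product into a proper distribution over categories.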
Appearance information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Human experts make use of additional cues such as where and when a given image was taken in order to inform their final decision. This contextual information is readily available in many online image collections but has been underutilized by existing image classifiers that focus solely on making predictions based on the image contents. We propose an efficient spatio-temporal prior that, when conditioned on a geographical location and time, estimates the probability that a given object category occurs at that location. Our prior is trained from presence-only observation data and jointly models object categories, their spatio-temporal distributions, and photographer biases. Experiments performed on multiple challenging image classification datasets show that combining our prior with the predictions from image classifiers results in a large improvement in final classification performance.
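Presence-only data records where a category was observed but never where it was confirmed absent, so training needs some source of negatives. The sketch below shows one generic way to do this with "assumed negatives": other categories at an observed location, and all categories at randomly sampled background locations, are treated as absent. It is an illustrative assumption, not the paper's exact objective, and it omits the photographer-bias modelling the abstract mentions. The names `presence_only_loss` and `pos_weight`, and the uniform random sampling of background locations, are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def presence_only_loss(logits_obs, labels, logits_rand, pos_weight=1.0, eps=1e-7):
    """Assumed-negative loss for presence-only observations (illustrative sketch).

    logits_obs:  (B, C) per-class scores at observed locations/times.
    labels:      (B,) index of the category observed at each location.
    logits_rand: (B, C) per-class scores at randomly sampled locations/times.
    pos_weight:  hypothetical weight on the single positive term per example.
    """
    p_obs = sigmoid(np.asarray(logits_obs, dtype=float))
    p_rand = sigmoid(np.asarray(logits_rand, dtype=float))
    labels = np.asarray(labels)
    rows = np.arange(p_obs.shape[0])

    # Positive term: the observed category should be predicted as present.
    pos = -pos_weight * np.log(p_obs[rows, labels] + eps)

    # Every other category at the observed location is assumed absent.
    neg_mask = np.ones_like(p_obs, dtype=bool)
    neg_mask[rows, labels] = False
    neg_obs = -(np.log(1.0 - p_obs + eps) * neg_mask).sum(axis=1)

    # Random background locations are assumed absent for every category.
    neg_rand = -np.log(1.0 - p_rand + eps).sum(axis=1)

    return float(np.mean(pos + neg_obs + neg_rand))

# A prior that is confident in the observed category scores a lower loss.
good = presence_only_loss([[5.0, -5.0]], [0], [[-5.0, -5.0]])
bad = presence_only_loss([[5.0, -5.0]], [1], [[-5.0, -5.0]])
print(good < bad)
```

In practice the logits would come from a small network that takes an encoded location and time as input; the loss itself only needs its per-class scores.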