Relational Proxies: Emergent Relationships as Fine-Grained Discriminators
This work addresses fine-grained visual recognition, a known bottleneck in applications such as species identification and product categorization, and proposes a novel method for it.
The paper tackles the problem of discriminating between fine-grained categories that share similar parts. It proposes Relational Proxies, a method that leverages relational information between the global and local views of an object, and reports state-of-the-art results on seven benchmark datasets, with margins exceeding 4% in some cases.
Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object. We propose Relational Proxies, a novel approach that leverages the relational information between the global and local views of an object for encoding its semantic label. Starting with a rigorous formalization of the notion of distinguishability between fine-grained categories, we prove the necessary and sufficient conditions that a model must satisfy in order to learn the underlying decision boundaries in the fine-grained setting. We design Relational Proxies based on our theoretical findings, evaluate it on seven challenging fine-grained benchmark datasets, and achieve state-of-the-art results on all of them, surpassing all existing works by a margin exceeding 4% in some cases. We also experimentally validate our theory on fine-grained distinguishability and obtain consistent results across multiple benchmarks. Implementation is available at https://github.com/abhrac/relational-proxies.
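To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of how a label might be predicted from the relationship between a global view and a set of local views. Every name here (`relational_embedding`, `predict`, `w_rel`, `proxies`) is hypothetical. Averaging the local views and applying a single tanh layer are simplifying assumptions made for illustration; the abstract does not specify the architecture, and the embeddings below are random stand-ins for the outputs of a trained encoder.

```python
import math
import random


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relational_embedding(global_emb, local_embs, w_rel):
    """Fuse one global-view embedding with k local-view embeddings.

    Assumption: local views are summarized by their mean, and the relation
    function is a single linear map followed by tanh.
    """
    k = len(local_embs)
    local_summary = [sum(col) / k for col in zip(*local_embs)]
    joint = list(global_emb) + local_summary  # length 2 * d
    return l2_normalize([math.tanh(dot(row, joint)) for row in w_rel])


def predict(rel_emb, proxies):
    """Return the index of the class proxy with the highest cosine similarity."""
    sims = [dot(l2_normalize(p), rel_emb) for p in proxies]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims


# Toy dimensions and random values, chosen only to exercise the functions.
rng = random.Random(0)
d, k, d_r, n_classes = 8, 4, 6, 3
global_emb = [rng.gauss(0, 1) for _ in range(d)]
local_embs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
w_rel = [[rng.gauss(0, 1) for _ in range(2 * d)] for _ in range(d_r)]
proxies = [[rng.gauss(0, 1) for _ in range(d_r)] for _ in range(n_classes)]

rel_emb = relational_embedding(global_emb, local_embs, w_rel)
label, sims = predict(rel_emb, proxies)
```

In this sketch the class decision depends on the joint global-local representation rather than on the local parts alone, which is the property the abstract argues is needed for fine-grained discrimination. In practice `w_rel` and `proxies` would be learned; see the linked repository for the actual method.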