Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval
This work addresses semantic mismatch in image retrieval, a problem relevant to users in fields such as biology and general computer vision. The contribution is incremental, since it builds on existing deep learning and hierarchy-based approaches.
The paper tackles a known problem in content-based image retrieval: visually similar results may lack semantic relevance. It proposes a method that integrates visual and semantic similarity using a hierarchy constructed from the latent space of a deep neural network. Experiments on CUB-200-2011, CIFAR100, and diatom microscopy images show superior performance compared to existing methods.
Most research in content-based image retrieval (CBIR) focuses on developing robust feature representations that can effectively retrieve instances from a database of images that are visually similar to a query. However, the retrieved images sometimes include results that are not semantically related to the query. To address this, we propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy. The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification, on the assumption that overlapping classes share high visual and semantic similarity. The constructed hierarchy is then integrated into the distance metric used for similarity search. Experiments on two standard datasets, CUB-200-2011 and CIFAR100, and on a real-life use case with diatom microscopy images show that our method achieves superior performance compared to existing image retrieval methods.
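The two steps described in the abstract, building a hierarchy by merging overlapping classes and folding it into the retrieval distance, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the overlap criterion or the form of the combined distance, so everything here is an assumption made for illustration. Overlap is approximated by Euclidean distance between class centroids, merging is greedy (closest pair first), the semantic term is the normalized level of the first common ancestor, and the weight alpha and all function names (build_hierarchy, hierarchy_distance, rank) are hypothetical.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_hierarchy(features, labels):
    """Greedily merge the two nodes with the closest centroids (assumed overlap proxy).

    Returns the merge history: a list of frozensets, each holding the
    original classes contained in the node created at that step.
    """
    groups = {}
    for vec, lab in zip(features, labels):
        groups.setdefault(lab, []).append(vec)
    # Each node maps the set of classes it contains to their feature vectors.
    nodes = {frozenset([lab]): vecs for lab, vecs in groups.items()}
    merges = []
    while len(nodes) > 1:
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: math.dist(centroid(nodes[p[0]]), centroid(nodes[p[1]])),
        )
        nodes[a | b] = nodes.pop(a) + nodes.pop(b)
        merges.append(a | b)
    return merges


def hierarchy_distance(merges, c1, c2):
    """Normalized level of the first merge containing both classes (0 if identical)."""
    if c1 == c2:
        return 0.0
    for level, node in enumerate(merges, start=1):
        if c1 in node and c2 in node:
            return level / len(merges)
    raise ValueError("class not found in hierarchy")


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def rank(query_vec, query_class, features, labels, merges, alpha=0.5):
    """Return database indices sorted by the assumed combined distance.

    alpha weights the semantic (hierarchy) term against the visual (cosine) term.
    """
    scores = [
        (1 - alpha) * cosine_distance(query_vec, vec)
        + alpha * hierarchy_distance(merges, query_class, lab)
        for vec, lab in zip(features, labels)
    ]
    return sorted(range(len(features)), key=scores.__getitem__)
```

In this sketch, query_class would in practice come from the classifier's prediction for the query image, which is a further assumption. With alpha set to 0 the ranking reduces to plain cosine-distance retrieval, so the hierarchy term can be read as a penalty on database images whose classes merge with the query's class only late in the hierarchy.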