CV, Apr 15, 2021

Learning Regional Attention over Multi-resolution Deep Convolutional Features for Trademark Retrieval

arXiv:2104.07240v1, 6 citations
Originality: Incremental advance
AI Analysis

This work addresses content-based image retrieval for trademarks, offering incremental enhancements to an existing method.

The paper tackles large-scale trademark retrieval by modifying the Regional-Maximum Activation of Convolutions (R-MAC) method to address background clutter, scale variance, and the loss of spatial information. It reports state-of-the-art results on the million-scale METU dataset, with each modification contributing a non-trivial improvement.

Large-scale trademark retrieval is an important content-based image retrieval task. A recent study shows that off-the-shelf deep features aggregated with Regional-Maximum Activation of Convolutions (R-MAC) achieve state-of-the-art results. However, R-MAC suffers in the presence of background clutter/trivial regions and scale variance, and discards important spatial information. We introduce three simple but effective modifications to R-MAC to overcome these drawbacks. First, we propose the use of both sum and max pooling to minimise the loss of spatial information. We also employ domain-specific unsupervised soft-attention to eliminate background clutter and unimportant regions. Finally, we add multi-resolution inputs to enhance the scale-invariance of R-MAC. We evaluate these three modifications on the million-scale METU dataset. Our results show that all modifications bring non-trivial improvements, and surpass previous state-of-the-art results.
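
The three modifications named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the region grid is a simplified non-overlapping one (standard R-MAC uses overlapping square regions), the attention map is a hypothetical stand-in (normalised channel-summed activations) because the abstract does not specify how the soft attention is computed, and the function names `regions`, `describe`, and `multi_res` are invented for illustration. Feature maps are assumed to be non-negative arrays of shape (C, H, W), such as post-ReLU CNN activations, one per input resolution.

```python
import numpy as np

def l2n(v, eps=1e-12):
    """L2-normalise a vector."""
    return v / (np.linalg.norm(v) + eps)

def regions(h, w, levels=2):
    """Assumed simplified grid: level l splits the map into l x l cells.
    Requires h >= levels and w >= levels so that no cell is empty."""
    out = []
    for l in range(1, levels + 1):
        ys = np.linspace(0, h, l + 1).astype(int)
        xs = np.linspace(0, w, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                out.append((ys[i], ys[i + 1], xs[j], xs[j + 1]))
    return out

def describe(fmap, levels=2):
    """Regional descriptor using both max and sum pooling, with a
    soft spatial weighting applied first."""
    c, h, w = fmap.shape
    # Stand-in attention (an assumption): channel-summed activation in [0, 1].
    att = fmap.sum(axis=0)
    att = att / (att.max() + 1e-12)
    weighted = fmap * att  # down-weights low-activation locations
    agg = np.zeros(2 * c)
    for y0, y1, x0, x1 in regions(h, w, levels):
        patch = weighted[:, y0:y1, x0:x1]
        pooled = np.concatenate([
            l2n(patch.max(axis=(1, 2))),  # max pooling, as in R-MAC
            l2n(patch.sum(axis=(1, 2))),  # sum pooling, the added branch
        ])
        agg += l2n(pooled)
    return l2n(agg)

def multi_res(fmaps, levels=2):
    """Sum the descriptors of feature maps from several input resolutions."""
    return l2n(sum(describe(f, levels) for f in fmaps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for feature maps of one image at two resolutions.
    maps = [rng.random((8, 6, 6)), rng.random((8, 4, 4))]
    print(multi_res(maps).shape)
```

Retrieval would then rank database images by the dot product between such descriptors, which equals cosine similarity because each descriptor is L2-normalised.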

