CV · Nov 15, 2024

Partial Scene Text Retrieval

arXiv:2411.10261v2 · 3 citations · h-index: 18 · Has Code · IEEE Trans Pattern Anal Mach Intell
Originality: Incremental advance
AI Analysis

This work addresses a specific gap in scene text retrieval for computer vision applications, offering an incremental improvement by enabling partial patch retrieval without extra patch annotations.

The paper tackles partial scene text retrieval: searching for text patches within text-line instances in images. It proposes a network that embeds queries and scene text into a shared feature space and uses a Ranking MIL approach to handle partial patches without extra annotations. A bag-free search at inference time improves both search efficiency and retrieval performance.

Partial scene text retrieval involves localizing and searching for text instances in an image gallery that are the same as or similar to a given query text. However, existing methods can only handle text-line instances, leaving the problem of searching for partial patches within these text-line instances unsolved due to a lack of patch annotations in the training data. To address this issue, we propose a network that can simultaneously retrieve both text-line instances and their partial patches. Our method embeds the two types of data (query text and scene text instances) into a shared feature space and measures their cross-modal similarities. To handle partial patches, our method adopts Multiple Instance Learning (MIL) to learn their similarities with the query text, without requiring extra annotations. However, constructing bags, a standard step in conventional MIL approaches, can introduce numerous noisy samples into training and lower inference speed. To address this, we propose a Ranking MIL (RankMIL) approach that adaptively filters those noisy samples. Additionally, we present a Dynamic Partial Match Algorithm (DPMA) that can directly search for the target partial patch from a text-line instance during the inference stage, without requiring bags. This greatly improves the search efficiency and the performance of retrieving partial patches. The source code and dataset are available at https://github.com/lanfeng4659/PSTR.
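
To make the bag-construction cost concrete, here is a minimal sketch of the conventional MIL baseline the abstract describes. It is not the paper's RankMIL or DPMA, and it is not taken from the linked repository. It assumes that a text line is represented by a sequence of per-position feature vectors, that a partial patch is a contiguous span of those positions, and that a span is embedded by mean pooling; the function names (`cosine`, `mean_pool`, `build_bag`, `mil_score`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(frames):
    """Average a list of per-position feature vectors into one embedding (assumed pooling)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def build_bag(frames):
    """Conventional MIL bag: every contiguous span (start, end) of the text line."""
    length = len(frames)
    return [(s, e) for s in range(length) for e in range(s + 1, length + 1)]


def mil_score(query, frames):
    """Max-pooling MIL: the bag's similarity is that of its best-matching span."""
    scored = [(cosine(query, mean_pool(frames[s:e])), (s, e))
              for s, e in build_bag(frames)]
    best_score, best_span = max(scored)
    return best_score, best_span


# Toy text line with three positions in a 2-D shared feature space.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [0.0, 1.0]  # query embedding, assumed to live in the same space
score, span = mil_score(query, frames)
print(span, round(score, 3))
```

A line with T positions yields T(T+1)/2 candidate spans, most of which do not match the query. Under these assumptions, that is the source of the noisy samples and slow inference that motivate filtering the bag during training and avoiding it at inference.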

Code Implementations: 1 repo