R-PHOC: Segmentation-Free Word Spotting using CNN
This work addresses segmentation-free word spotting in document images, a problem of interest to researchers and practitioners in document analysis. The contribution is incremental in that it builds on existing PHOC embeddings.
The paper tackles segmentation-free word spotting by proposing a region-based CNN that embeds word candidate bounding boxes into an embedding space for nearest neighbour search, improving on the state of the art on the GW dataset and matching segmentation-based methods in some cases.
This paper proposes a region-based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of word candidate bounding boxes and embeds all bounding boxes into an embedding space, where word spotting can be cast as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of the PHOC embedding, as it has previously achieved significant success in segmentation-based word spotting. Word candidates are generated using a simple procedure that groups connected components under spatial constraints. Experiments show that R-PHOC, which operates on images directly, improves the current state of the art on the standard GW dataset and in some cases performs as well as PHOCNET, which is designed for segmentation-based word spotting.
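The retrieval step described above, a nearest neighbour search between the query representation and the embedded candidate boxes, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which distance is used, so cosine similarity is an assumption here, and the function names `cosine_similarity` and `rank_candidates` as well as the toy 4-dimensional vectors are hypothetical. Real PHOC vectors are much longer and would come from the network, not from hand-written lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero embedding carries no information; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy embeddings standing in for the query and three candidate bounding boxes.
query = [1.0, 0.0, 1.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0, 1.0],  # shares no active dimensions with the query
    [0.9, 0.1, 0.8, 0.0],  # nearly parallel to the query
    [1.0, 1.0, 1.0, 1.0],  # partially overlapping
]

ranking = rank_candidates(query, candidates)
print(ranking)  # indices of candidate boxes, best match first
```

In a segmentation-free setting the ranked indices would map back to bounding boxes on the page, so the top entries of `ranking` are the regions returned for the query.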