CV · Sep 4, 2018

Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images

arXiv:1809.00854v2
AI Analysis

This work addresses word spotting in egocentric camera streams, with applications to scene understanding and visual assistance, and represents an incremental improvement.

The authors tackle word spotting in egocentric scene images by proposing Soft-PHOC, an intermediate representation built from character probability maps, and evaluate it on the ICDAR 2015 Challenge 4 dataset.
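Soft-PHOC extends the Pyramidal Histogram Of Characters (PHOC), so it helps to see what a plain PHOC encodes. The sketch below builds a binary PHOC for a query string: at each pyramid level the word is split into equal regions, each character is assumed to occupy an equal-width slice of the word, and a character's bit is set in a region when at least half of its slice falls inside that region. The 36-symbol alphabet, the levels (2, 3, 4, 5) and the function name `phoc` are illustrative assumptions, and some PHOC variants also append common-bigram histograms, which this sketch omits. It is not the paper's Soft-PHOC, which replaces these binary bits with pixel-wise character probabilities predicted by a Fully Convolutional Network.

```python
from fractions import Fraction
import string

# Assumed alphabet: lowercase Latin letters plus digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(2, 3, 4, 5), alphabet=ALPHABET):
    """Build a binary Pyramidal Histogram Of Characters for `word`.

    At every pyramid level the word is split into `level` equal regions;
    each character is assumed to occupy an equal-width slice of the word.
    A character's bit is set in a region when at least half of its slice
    lies inside that region. The output concatenates one len(alphabet)
    histogram per (level, region) pair.
    """
    word = word.lower()
    index = {c: k for k, c in enumerate(alphabet)}
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            # Exact fractions avoid floating-point ties at the 1/2 threshold.
            r_start, r_end = Fraction(region, level), Fraction(region + 1, level)
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = Fraction(i, n), Fraction(i + 1, n)
                overlap = max(Fraction(0), min(c_end, r_end) - max(c_start, r_start))
                if overlap / (c_end - c_start) >= Fraction(1, 2):
                    bits[index[ch]] = 1
            vector.extend(bits)
    return vector
```

With the default levels, `phoc("hello")` yields a 504-dimensional vector (14 regions × 36 symbols). Note that a character straddling a region boundary by exactly half, such as the "b" in "abc" at level 2, is counted in both neighbouring regions.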

Word spotting in natural scene images has many applications in scene understanding and visual assistance. In this paper we propose a technique to create and exploit an intermediate representation of images based on text attributes, namely character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. The algorithm is based on the Hough Transform over character attribute maps, followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on the ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.
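The abstract scores candidate text lines with Dynamic Time Warping (DTW). What follows is a minimal, generic DTW sketch under stated assumptions: the query word is encoded as one-hot character rows (the hypothetical helper `one_hot`) and the candidate region as a left-to-right sequence of per-column character-probability vectors, such as the columns of a Soft-PHOC-like map. The local cost (1 minus the probability of the expected character) and the name `dtw_score` are illustrative choices, not the authors' published scoring function.

```python
def one_hot(word, alphabet):
    """Encode a query word as one row per character (hypothetical helper)."""
    return [[1.0 if ch == a else 0.0 for a in alphabet] for ch in word]


def dtw_score(query, columns):
    """Dynamic Time Warping cost between a query and a sequence of columns.

    query:   list of m rows, one per query character (e.g. one-hot vectors)
    columns: list of n rows, each a character-probability vector for one
             image column of a candidate region (assumed input format)
    Local cost (assumed): 1 minus the probability the column assigns to the
    expected character. Lower totals mean a better match.
    """
    m, n = len(query), len(columns)
    inf = float("inf")
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = sum(q * c for q, c in zip(query[i - 1], columns[j - 1]))
            local = 1.0 - match
            # Standard step pattern: vertical, horizontal or diagonal move.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[m][n]
```

For example, the query "ab" aligns with columns reading "a, a, b" at zero cost, because DTW lets one query character stretch over several columns. When comparing proposals of different widths, the raw cost would typically be normalized (for instance by warping-path length); the abstract does not state which normalization, if any, the authors use.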

Code implementations: 1 repo