CV · Mar 25, 2020

SCATTER: Selective Context Attentional Scene Text Recognizer

arXiv:2003.11288v1 · 151 citations
AI Analysis

This paper addresses scene text recognition for applications such as document analysis, but its contribution is incremental, since it builds on existing attention mechanisms.

The paper tackles the problem of recognizing irregularly shaped text in scene images by introducing SCATTER, a novel architecture that surpasses prior state-of-the-art performance on irregular text recognition benchmarks by 3.7% on average.

Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully train a deep BiLSTM encoder, thus improving the encoding of contextual dependencies. Decoding is done using a two-step 1D attention mechanism. The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer. The second attention step, similar to previous papers, treats the features as a sequence and attends to the intra-sequence relationships. Experiments show that the proposed approach surpasses SOTA performance on irregular text recognition benchmarks by 3.7% on average.
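The first attention step described in the abstract can be illustrated with a minimal sketch. It assumes that the visual and contextual features are concatenated per frame and that a single fully connected layer with a sigmoid produces one gate per channel. The abstract does not specify the layer, the activation, or any dimensions, so the function name `selective_reweight`, the sigmoid gate, and the toy shapes are all hypothetical, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def selective_reweight(visual, contextual, weights, bias):
    """Re-weight concatenated [visual, contextual] features frame by frame.

    visual, contextual: lists of T frames, each frame a list of floats
        (stand-ins for CNN and BiLSTM outputs).
    weights: square matrix of size (Cv + Cc) x (Cv + Cc); bias: length Cv + Cc.
    The sigmoid gate is an assumption made for this sketch.
    """
    reweighted = []
    for v_t, h_t in zip(visual, contextual):
        d_t = v_t + h_t  # concatenate along the channel axis
        # One gate value per channel, computed from the whole frame.
        gate = [
            sigmoid(sum(w * x for w, x in zip(row, d_t)) + b)
            for row, b in zip(weights, bias)
        ]
        # Element-wise re-weighting of the concatenated features.
        reweighted.append([g * x for g, x in zip(gate, d_t)])
    return reweighted


def demo():
    """Run the sketch on random toy features (4 frames, 3 + 2 channels)."""
    rng = random.Random(0)
    frames, cv, cc = 4, 3, 2
    dim = cv + cc
    visual = [[rng.uniform(-1, 1) for _ in range(cv)] for _ in range(frames)]
    contextual = [[rng.uniform(-1, 1) for _ in range(cc)] for _ in range(frames)]
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return visual, contextual, selective_reweight(visual, contextual, weights, bias)
```

Because each gate lies strictly between 0 and 1 in this sketch, the step can only attenuate a channel, never amplify it. In a trained model the weights would be learned jointly with the rest of the network, and the re-weighted sequence would then be passed to the second, sequence-level attention step.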

Code Implementations: 2 repos