CV, Jun 30, 2020

Using Human Psychophysics to Evaluate Generalization in Scene Text Recognition Models

arXiv:2007.00083v1, 1 citation
AI Analysis

This work addresses how to assess the generalization capabilities of scene text recognition models. It offers researchers and practitioners insights into model robustness and training strategies, although its contribution is incremental.

The study evaluates two scene text recognition models, one based on connectionist temporal classification (CTC) and one attention-based, by measuring how well each generalizes across word lengths, fonts, and amounts of occlusion. It finds that the CTC model is more robust to noise and occlusion and handles varied word lengths better, and that adding noise during training improves generalization to occlusion.
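The two stimulus manipulations mentioned above, occluding test images and adding noise to training images, can be illustrated with a minimal sketch. The paper's exact procedures are not described here, so this is an assumption: images are modeled as grayscale lists of floats in [0, 1], occlusion blanks a random fraction of pixels, and noise is additive Gaussian noise clipped to the valid range. The names `occlude` and `add_gaussian_noise` are hypothetical.

```python
import random
from typing import List

# Hypothetical image representation: grayscale rows of floats in [0, 1].
Image = List[List[float]]


def occlude(image: Image, fraction: float, rng: random.Random, fill: float = 0.0) -> Image:
    """Return a copy of `image` with `fraction` of its pixels set to `fill`.

    Assumption: occlusion is modeled as blanking randomly chosen pixels;
    the paper may occlude images differently.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    h, w = len(image), len(image[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    k = round(fraction * len(coords))
    masked = set(rng.sample(coords, k))  # pixels to blank out
    return [[fill if (r, c) in masked else image[r][c] for c in range(w)] for r in range(h)]


def add_gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Return a copy of `image` with i.i.d. Gaussian noise added, clipped to [0, 1].

    Assumption: this stands in for "adding noise to training images".
    """
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in image]


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[1.0] * 8 for _ in range(4)]  # 32 white pixels
    half = occlude(img, 0.5, rng)
    print(sum(p == 0.0 for row in half for p in row))  # 16
```

Both functions return new images and leave the input untouched, so the same clean image can be re-used to generate a whole sweep of occlusion or noise levels.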

Scene text recognition models have advanced greatly in recent years. Inspired by human reading, we characterize two important scene text recognition models by measuring their domains, i.e., the range of stimulus images that they can read. The domain specifies the ability of readers to generalize to different word lengths, fonts, and amounts of occlusion. These metrics identify strengths and weaknesses of existing models. Relative to the attention-based (Attn) model, we discover that the connectionist temporal classification (CTC) model is more robust to noise and occlusion, and better at generalizing to different word lengths. Further, we show that in both models, adding noise to training images yields better generalization to occlusion. These results demonstrate the value of testing models until they break, complementing the traditional data science focus on optimizing performance.
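Measuring a domain amounts to sweeping one stimulus parameter (occlusion level, word length) and finding where reading accuracy breaks down. The sketch below assumes, as is common in psychophysics but not confirmed by the abstract, that the boundary is defined by a criterion accuracy (0.5 here); `threshold` linearly interpolates the level at which accuracy first falls to the criterion, and `domain` coarsely reports the lowest and highest tested levels read at or above it. Both function names and the criterion are hypothetical, and the numbers in the usage example are illustrative, not results from the paper.

```python
from typing import List, Optional, Tuple


def threshold(levels: List[float], accuracies: List[float],
              criterion: float = 0.5) -> Optional[float]:
    """Level at which accuracy first falls to `criterion`, interpolated linearly.

    Assumes `levels` is sorted ascending and accuracy degrades as the level grows
    (e.g. more occlusion). Returns None if accuracy never reaches the criterion.
    """
    if not levels or len(levels) != len(accuracies):
        raise ValueError("levels and accuracies must be non-empty and equal length")
    if accuracies[0] <= criterion:
        return levels[0]
    pairs = list(zip(levels, accuracies))
    for (x0, a0), (x1, a1) in zip(pairs, pairs[1:]):
        if a1 <= criterion:
            # Here a0 > criterion >= a1, so a0 != a1 and the division is safe.
            return x0 + (a0 - criterion) * (x1 - x0) / (a0 - a1)
    return None


def domain(levels: List[float], accuracies: List[float],
           criterion: float = 0.5) -> Optional[Tuple[float, float]]:
    """Coarse domain: (min, max) of tested levels read at or above `criterion`.

    Does not interpolate and does not check that the readable levels are contiguous.
    """
    readable = [x for x, a in zip(levels, accuracies) if a >= criterion]
    return (min(readable), max(readable)) if readable else None


if __name__ == "__main__":
    # Illustrative numbers only.
    occlusion = [0.0, 0.2, 0.4, 0.6]
    acc = [0.95, 0.85, 0.55, 0.15]
    print(round(threshold(occlusion, acc), 3))  # 0.425

    word_lengths = [2, 4, 6, 8, 10, 12]
    acc_len = [0.9, 0.95, 0.9, 0.7, 0.4, 0.2]
    print(domain(word_lengths, acc_len))  # (2, 8)
```

Comparing such thresholds between two models on the same sweep gives a single number per manipulation, which is one way to state claims like "more robust to occlusion" quantitatively.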
