CV, CL · Aug 25, 2023

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

arXiv:2308.13173v1 · 5 citations · h-index: 64
Originality: Synthesis-oriented
AI Analysis

This work addresses standardized evaluation for scene-text OCR. The contribution is incremental: it adapts existing metrics to a specific domain.

The paper tackles the evaluation of optical character recognition (OCR) on natural scenes. It proposes a uniform word error rate (WER) metric, DISGO WER, which accounts for deletion, insertion, substitution, and grouping/ordering errors, and it demonstrates the metric on the SCUT test set with a modularized OCR system.

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents because of unconstrained content and varied image backgrounds. We propose using word error rate (WER) uniformly as the measurement for evaluating scene-text OCR, covering both end-to-end (e2e) performance and the performance of individual system components. We name the e2e metric DISGO WER because it considers Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally, we propose to use the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. The small SCUT public test set is used to demonstrate the WER performance of a modularized OCR system.
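As a rough illustration of the deletion, insertion, and substitution terms named in the abstract, the sketch below computes a conventional word-level WER from a minimum-edit-distance alignment. It is not the paper's DISGO implementation: how grouping/ordering errors are detected is not described on this page, so that term is left out, and the function names `wer_counts` and `wer` are hypothetical.

```python
def wer_counts(reference, hypothesis):
    """Return (deletions, insertions, substitutions) for one minimal word alignment.

    Assumption: plain whitespace tokenization and standard Levenshtein alignment.
    Grouping/ordering errors (the "GO" in DISGO) are NOT modeled here.
    """
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] = (total_cost, D, I, S) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, i, 0, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, j, 0)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                diagonal = dp[i - 1][j - 1]  # exact match, no cost
            else:
                c, d, ins, s = dp[i - 1][j - 1]
                diagonal = (c + 1, d, ins, s + 1)  # substitution
            c, d, ins, s = dp[i - 1][j]
            deletion = (c + 1, d + 1, ins, s)
            c, d, ins, s = dp[i][j - 1]
            insertion = (c + 1, d, ins + 1, s)
            # Tuples compare by total cost first, so the minimum cost always wins;
            # ties are broken deterministically on the remaining counts.
            dp[i][j] = min(diagonal, deletion, insertion)

    _, d, ins, s = dp[-1][-1]
    return d, ins, s


def wer(reference, hypothesis):
    """Word error rate: (D + I + S) divided by the number of reference words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    d, ins, s = wer_counts(reference, hypothesis)
    return (d + ins + s) / n
```

For a two-word reference such as "exit only" and a recognized string "exit", this counts one deletion and gives a WER of 0.5. When several alignments share the same minimum cost, the split between deletions, insertions, and substitutions depends on the tie-breaking rule, although the total WER does not.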

