CV · Jun 4, 2023

ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes

arXiv:2306.02443v1 · 2 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses efficiency in STISR for real-world deployment, though it is incremental as it builds on existing STISR methods with optimizations for resource constraints.

The authors address the inefficiency of scene text image super-resolution (STISR) when deployed on resource-limited platforms. They propose ESTISR, which combines a re-parameterized inverted residual block with a novel softmax shrinking self-attention mechanism. On TextZoom, the authors report improved STR accuracy and a better trade-off in running time and memory consumption compared to current methods.

While scene text image super-resolution (STISR) has yielded remarkable improvements in scene text recognition (STR) accuracy, prior methods have emphasized performance over efficiency, a crucial factor in deploying the STISR-STR pipeline. In this work, we propose a novel Efficient Scene Text Image Super-resolution (ESTISR) network for resource-limited deployment platforms. ESTISR depends primarily on two components: a CNN-based feature extractor and an efficient self-attention mechanism used for decoding low-resolution images. For the feature extractor, we design a re-parameterized inverted residual block suited to resource-limited settings. We also propose a novel kernel-based self-attention mechanism, softmax shrinking, which offers linear complexity while naturally incorporating discriminative low-level features into the self-attention structure. Extensive experiments on TextZoom show that ESTISR retains high image restoration quality and improves STR accuracy on low-resolution images. Furthermore, ESTISR consistently outperforms current methods in actual running time and peak memory consumption, achieving a better trade-off between performance and efficiency.
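
To illustrate what "linear complexity" means for a kernel-based self-attention mechanism, here is a minimal pure-Python sketch of generic kernelized attention. It is not the paper's softmax shrinking: the feature map `elu(x) + 1` is a common stand-in chosen for this example, and the function names `feature_map`, `linear_attention`, and `quadratic_attention` are hypothetical. The point is only that summing over keys once, before touching the queries, avoids building the n-by-n attention matrix.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map. This is an assumption made for
    # illustration; the paper's softmax shrinking kernel is not reproduced here.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Kernelized attention whose cost grows linearly with sequence length."""
    d = len(keys[0])
    d_v = len(values[0])
    # kv[a][b] = sum_j phi(k_j)[a] * v_j[b], accumulated once over all keys.
    kv = [[0.0] * d_v for _ in range(d)]
    k_sum = [0.0] * d
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in queries:
        fq = feature_map(q)
        # Normalizer phi(q) . sum_j phi(k_j); positive because phi > 0.
        norm = sum(fq[a] * k_sum[a] for a in range(d))
        out.append(
            [sum(fq[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return out


def quadratic_attention(queries, keys, values):
    """Reference version that builds every query-key weight explicitly."""
    d_v = len(values[0])
    out = []
    for q in queries:
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys
        ]
        total = sum(weights)
        out.append(
            [
                sum(w * v[b] for w, v in zip(weights, values)) / total
                for b in range(d_v)
            ]
        )
    return out
```

Both functions compute the same result; the linear version simply reorders the products so the per-query work does not depend on the number of keys.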

