CV, May 10, 2022

UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection

arXiv:2205.04683v1, 6 citations, h-index: 40
Originality: Incremental advance
AI Analysis

This paper addresses a domain adaptation problem in scene text detection, offering an incremental improvement that narrows the performance gap between synthetic pre-training and real-world fine-tuning without extra computational overhead at inference.

The paper tackles the domain gap between synthetic pre-training data and real-world data in scene text detection by introducing an unsupervised intermediate training stage (UNITS), which improves detector performance without adding inference costs, achieving consistent gains on three public datasets.

Most recent scene text detection methods are data-driven and based on deep learning. Because annotation is expensive, synthetic data is commonly used for pre-training. However, there are obvious domain discrepancies between synthetic and real-world data, so directly fine-tuning a model initialized on synthetic data may lead to sub-optimal performance. In this paper, we propose a new training paradigm for scene text detection that introduces an UNsupervised Intermediate Training Stage (UNITS). UNITS builds a buffer path to real-world data and can alleviate the gap between the pre-training and fine-tuning stages. Three training strategies are further explored to perceive information from real-world data in an unsupervised way. With UNITS, scene text detectors are improved without introducing any additional parameters or computation during inference. Extensive experimental results show consistent performance improvements on three public datasets.

