LG · CV · Dec 5, 2022

SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

arXiv:2212.02135v3 · 5 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses semi-supervised training for sequence tasks such as OCR and ASR, offering a more efficient alternative to filtering-based pseudo-labeling methods, though it appears incremental because it builds on the existing CTC framework.

The paper tackles semi-supervised learning for text recognition by proposing SoftCTC, a novel loss function that extends CTC to handle multiple transcription variants without confidence-based filtering, and demonstrates that it matches the performance of a finely tuned filtering pipeline on a handwriting recognition task.

This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function, SoftCTC, which is an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.
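To make the comparison in the abstract concrete, the following is a minimal sketch of what a naïve CTC-based approach to multiple transcription variants could look like: run the standard CTC forward algorithm once per variant and combine the likelihoods with variant weights. This is not SoftCTC and not the authors' implementation. The function names `ctc_prob` and `naive_multi_variant_loss`, the weighted-sum objective, and the toy inputs are assumptions made for illustration. The sketch works with plain probabilities for readability, whereas a practical implementation would use log-space arithmetic.

```python
import math


def ctc_prob(probs, target, blank=0):
    """Return P(target | probs) using the standard CTC forward algorithm.

    probs[t][c] is the probability of symbol c at frame t.
    target is a list of non-blank symbol indices.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    size = len(ext)

    # Initialise: a path may start in the first blank or the first symbol.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip over a blank only between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last symbol or the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def naive_multi_variant_loss(probs, variants):
    """Negative log of the weighted sum of per-variant CTC likelihoods.

    variants is a list of (label_sequence, weight) pairs (hypothetical format).
    The cost grows linearly with the number of variants, because every
    variant needs its own forward pass.
    """
    likelihood = sum(w * ctc_prob(probs, y) for y, w in variants)
    return -math.log(likelihood)


# Toy example: 2 frames, 3 symbols (0 = blank), uniform frame posteriors.
frames = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
variants = [([1], 0.5), ([1, 2], 0.5)]
loss = naive_multi_variant_loss(frames, variants)
```

Because each variant requires a separate forward pass, the cost of this baseline scales with the number of variants, which is the inefficiency the abstract says SoftCTC avoids.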

Code Implementations: 1 repo
