SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels
This work addresses semi-supervised training for sequence tasks such as OCR and ASR, offering an alternative to filtering-based pseudo-labeling that is more efficient than naive multi-variant CTC training; the contribution appears incremental, since it builds on the existing CTC framework.
The paper tackles semi-supervised learning for text recognition by proposing SoftCTC, a novel loss function that extends CTC to handle multiple transcription variants without confidence-based filtering. It demonstrates that SoftCTC matches the performance of a finely tuned filtering pipeline on a handwriting recognition task.
This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function, SoftCTC, which is an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.
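To make the efficiency comparison concrete, the following is a minimal sketch of what a naïve multi-variant CTC baseline could look like. It is an assumption, not the authors' implementation: the abstract does not define the baseline, and the function names (`ctc_prob`, `naive_multi_variant_loss`), the per-variant weights, and the toy inputs are all hypothetical. The sketch runs the standard CTC forward recursion once per variant and combines the resulting probabilities, so its cost grows linearly with the number of variants.

```python
import math


def ctc_prob(probs, label, blank=0):
    """Probability of `label` under CTC, via the standard forward recursion.

    probs: list of T frames, each a probability distribution over symbols.
    label: list of non-blank symbol indices.
    """
    # Extended label sequence with blanks interleaved: b, l1, b, l2, ..., b
    ext = [blank]
    for s in label:
        ext += [s, blank]
    S, T = len(ext), len(probs)

    # A valid path starts in the leading blank or in the first symbol.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends in the last symbol or in the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def naive_multi_variant_loss(probs, variants):
    """Negative log of the weighted sum of per-variant CTC probabilities.

    variants: list of (label, weight) pairs. The weights are hypothetical
    and stand in for whatever confidence is attached to each variant.
    """
    total = sum(w * ctc_prob(probs, label) for label, w in variants)
    return -math.log(total)


# Toy input: 2 frames, alphabet {0: blank, 1: 'a', 2: 'b'}, uniform outputs.
frames = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
loss = naive_multi_variant_loss(frames, [([1], 0.5), ([1, 2], 0.5)])
```

In this sketch, each additional transcription variant adds one full pass of the forward recursion, which is the kind of overhead that a loss operating on all variants at once would avoid.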