FEDS -- Filtered Edit Distance Surrogate
This work addresses scene text recognition, a computer vision task, and improves an already trained recognizer by post-tuning it with a learned surrogate of edit distance.
The paper trains a scene text recognition model with a robust learned surrogate of edit distance. Training examples that are hard for the surrogate are filtered out with a ramp function, which keeps training end-to-end. The result is an average improvement of 11.2% on total edit distance and a 9.5% error reduction on accuracy across challenging datasets such as IIIT-5K, SVT, ICDAR, SVTP, and CUTE.
This paper proposes a procedure to train a scene text recognition model using a robust learned surrogate of edit distance. The proposed method borrows from self-paced learning and filters out the training examples that are hard for the surrogate. The filtering is performed by judging the quality of the approximation, using a ramp function, enabling end-to-end training. Following the literature, the experiments are conducted in a post-tuning setup, where a trained scene text recognition model is tuned using the learned surrogate of edit distance. The efficacy is demonstrated by improvements on various challenging scene text datasets such as IIIT-5K, SVT, ICDAR, SVTP, and CUTE. The proposed method provides an average improvement of $11.2\%$ on total edit distance and an error reduction of $9.5\%$ on accuracy.
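The filtering idea can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not give the exact form of the ramp function, so the shape of `ramp_weight`, its thresholds `low` and `high`, and the helper `filtered_surrogate_loss` are all hypothetical names and assumptions made for illustration. The sketch computes the true edit distance, measures how far a surrogate value deviates from it, and down-weights examples where the surrogate approximates poorly.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ramp_weight(approx_error: float, low: float = 0.5, high: float = 1.5) -> float:
    """Assumed ramp: 1 for small error, 0 for large error, linear in between.

    The thresholds are illustrative, not values from the paper.
    """
    if approx_error <= low:
        return 1.0
    if approx_error >= high:
        return 0.0
    return (high - approx_error) / (high - low)


def filtered_surrogate_loss(predictions, targets, surrogate_values,
                            low: float = 0.5, high: float = 1.5) -> float:
    """Average surrogate value, weighted by how well it matches the true distance.

    In a real training loop the surrogate would be a differentiable network
    and the weight would be treated as a constant; here both are plain floats.
    """
    total = 0.0
    for pred, target, surrogate in zip(predictions, targets, surrogate_values):
        error = abs(surrogate - edit_distance(pred, target))
        total += ramp_weight(error, low, high) * surrogate
    return total / len(predictions)
```

For example, `filtered_surrogate_loss(["hello", "wrld"], ["hello", "world"], [0.2, 4.0])` keeps the first example, whose surrogate value is close to the true distance of 0, and discards the second, whose surrogate value of 4.0 is far from the true distance of 1.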