Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching
This work addresses an under-explored aspect of scene text recognition (STR), the choice of training protocol, and offers researchers and practitioners in computer vision a way to improve existing models. The contribution is incremental, since it builds on existing architectures rather than proposing new ones.
The paper tackles the problem of improving scene text recognition (STR) accuracy by searching for optimal training protocols. The searched protocol raises the accuracy of mainstream models by 2.7% to 3.9% and lets TRBA-Net exceed the state-of-the-art model's accuracy by 2.1% while running 2.3x faster on CPU and 3.7x faster on GPU.
The development of scene text recognition (STR) in the era of deep learning has mainly focused on novel architectures of STR models. However, the training protocol (i.e., the settings of the hyper-parameters involved in training STR models), which plays an equally important role in successfully training a good STR model, is under-explored for scene text recognition. In this work, we attempt to improve the accuracy of existing STR models by searching for an optimal training protocol. Specifically, we develop a training protocol search method based on a newly designed search space and an efficient search algorithm that uses evolutionary optimization and proxy tasks. Experimental results show that our searched training protocol improves the recognition accuracy of mainstream STR models by 2.7% to 3.9%. In particular, with the searched training protocol, TRBA-Net achieves 2.1% higher accuracy than the state-of-the-art STR model (i.e., EFIFSTR), while its inference is 2.3x faster on CPU and 3.7x faster on GPU. Extensive experiments demonstrate the effectiveness of the proposed method and the generalization ability of the training protocol found by our search method. Code is available at https://github.com/VDIGPKU/STR_TPSearch.
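The abstract describes the search only at a high level: a search space of training hyper-parameters, explored by evolutionary optimization, with candidates scored on cheap proxy tasks. The sketch below illustrates that general idea under stated assumptions. The hyper-parameter names and candidate values in `SEARCH_SPACE`, the crossover and mutation operators, and `proxy_fitness` are all hypothetical; `proxy_fitness` is a made-up formula standing in for the validation accuracy of a model trained briefly on a proxy task. It is not the authors' implementation, which lives in the linked repository.

```python
import math
import random

# Hypothetical search space: names and candidate values are illustrative
# assumptions, not the search space defined in the paper.
SEARCH_SPACE = {
    "optimizer": ["sgd", "adam", "adadelta"],
    "lr": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [64, 128, 256, 512],
    "aug_prob": [0.0, 0.25, 0.5, 0.75],
    "label_smoothing": [0.0, 0.05, 0.1],
}


def sample_protocol(rng):
    """Draw a random training protocol (one value per hyper-parameter)."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each hyper-parameter is inherited from one parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(protocol, rng, rate=0.3):
    """Resample each hyper-parameter independently with probability `rate`."""
    child = dict(protocol)
    for k, values in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[k] = rng.choice(values)
    return child


def proxy_fitness(protocol):
    """Synthetic stand-in for proxy-task accuracy (hypothetical formula).

    In a real search this would train an STR model for a short schedule
    with `protocol` and return its validation accuracy.
    """
    score = {"sgd": 0.0, "adam": 0.02, "adadelta": 0.01}[protocol["optimizer"]]
    score -= 0.01 * abs(math.log10(protocol["lr"]) - math.log10(1e-3))
    score -= 0.00001 * abs(protocol["batch_size"] - 256)
    score += 0.02 * protocol["aug_prob"]
    score += 0.1 * protocol["label_smoothing"]
    return score


def evolve(fitness, population_size=16, generations=10, top_k=4, seed=0):
    """Elitist evolutionary search over SEARCH_SPACE; returns (best, score)."""
    rng = random.Random(seed)
    population = [sample_protocol(rng) for _ in range(population_size)]
    cache = {}  # avoid re-evaluating a protocol (each call is a training run)

    def score(p):
        key = tuple(sorted(p.items()))
        if key not in cache:
            cache[key] = fitness(p)
        return cache[key]

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:top_k]  # elites survive unchanged
        children = []
        while len(children) < population_size - top_k:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children

    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    best, best_score = evolve(proxy_fitness)
    print(best, round(best_score, 4))
```

Because the elites are carried over unchanged, the best proxy score never decreases from one generation to the next. In a real search each fitness call is a short training run, so caching scores and keeping the population small are what keep the search affordable; the proxy task trades some ranking fidelity for that speed.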