Scene Text Recognition With Finer Grid Rectification
This work addresses scene text recognition, a key problem in computer vision for applications like document analysis and autonomous systems, but it appears incremental as it builds on existing rectification and attention methods.
The paper tackled scene text recognition under irregular styles and distortions by proposing Firbarn, an end-to-end trainable model with a finer rectification module and a bidirectional decoder, which outperformed previous works on standard benchmarks, particularly for irregular datasets.
Scene text recognition is a challenging problem because of irregular styles and various distortions. This paper proposes an end-to-end trainable model consisting of a finer rectification module and a bidirectional attentional recognition network (Firbarn). The rectification module adopts a finer grid to rectify the distorted input image, and the bidirectional decoder contains only one decoding layer instead of two separate ones. Firbarn can be trained in a weakly supervised manner, requiring only the scene text images and the corresponding word labels. With the flexible rectification and the novel bidirectional decoder, extensive evaluation on standard benchmarks shows that Firbarn outperforms previous works, especially on irregular datasets.
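
To make the idea of grid-based rectification concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the rectification module predicts a coarse grid of normalized (x, y) sampling coordinates; the function names (`identity_grid`, `upsample_grid`, `bilinear_sample`, `rectify`) and the grid sizes are hypothetical choices for illustration. A finer grid (more rows and columns of control points) allows more local deformations to be expressed.

```python
import numpy as np


def identity_grid(gh, gw):
    """Coarse (gh, gw, 2) grid of normalized (x, y) coordinates in [-1, 1]."""
    xs = np.linspace(-1.0, 1.0, gw)
    ys = np.linspace(-1.0, 1.0, gh)
    gx, gy = np.meshgrid(xs, ys)  # both have shape (gh, gw)
    return np.stack([gx, gy], axis=-1)


def upsample_grid(ctrl, out_h, out_w):
    """Bilinearly interpolate a coarse grid (gh, gw >= 2) to (out_h, out_w, 2)."""
    gh, gw, _ = ctrl.shape
    ys = np.linspace(0, gh - 1, out_h)
    xs = np.linspace(0, gw - 1, out_w)
    y0 = np.clip(np.floor(ys).astype(int), 0, gh - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, gw - 2)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = ctrl[y0][:, x0] * (1 - wx) + ctrl[y0][:, x0 + 1] * wx
    bot = ctrl[y0 + 1][:, x0] * (1 - wx) + ctrl[y0 + 1][:, x0 + 1] * wx
    return top * (1 - wy) + bot * wy


def bilinear_sample(img, grid):
    """Sample a 2-D image (h, w >= 2) at normalized (x, y) coordinates."""
    h, w = img.shape
    x = np.clip((grid[..., 0] + 1) * 0.5 * (w - 1), 0, w - 1)
    y = np.clip((grid[..., 1] + 1) * 0.5 * (h - 1), 0, h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    wx = x - x0
    wy = y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x0 + 1] * wx
    bot = img[y0 + 1, x0] * (1 - wx) + img[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bot * wy


def rectify(img, ctrl):
    """Warp img with a dense sampling grid derived from the coarse grid ctrl."""
    dense = upsample_grid(ctrl, *img.shape)
    return bilinear_sample(img, dense)


# Usage: an identity grid leaves the image unchanged; in a trained model the
# grid would instead be predicted from the input (assumption, not shown here).
rng = np.random.default_rng(0)
img = rng.random((8, 32))
out = rectify(img, identity_grid(4, 8))
```

Because every step is a differentiable interpolation, a module of this kind could in principle be trained from word labels alone through the recognition loss, which is consistent with the weakly supervised setting described above.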