Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links
This work addresses robust scene text detection and rectification, a problem relevant to applications such as document analysis and image understanding. The contribution is incremental, however, since it builds on existing keypoint-based approaches.
The paper tackles the challenge of detecting and rectifying arbitrarily shaped scene texts. It introduces a mask-guided multi-task network that uses text keypoints and links to specify text shapes and boundaries, and it reports superior detection and rectification performance compared with state-of-the-art methods on multiple public datasets.
Detecting and recognizing scene texts of arbitrary shapes remains a grand challenge because of the very rich variation in text line orientations, lengths, curvatures, and so on. This paper presents a mask-guided multi-task network that reliably detects and rectifies scene texts of arbitrary shapes. Three types of keypoints are detected, which accurately specify the centre line, and hence the shape, of each text instance. In addition, four types of keypoint links are detected: the horizontal links associate the detected keypoints belonging to the same text instance, and the vertical links predict, for each keypoint, a pair of landmark points on the upper and lower text boundaries, respectively. Scene texts are then located by linking up the associated landmark points into localization polygon boxes, and rectified by transforming those polygon boxes with a thin-plate spline (TPS). Extensive experiments on several public datasets show that the keypoint-based representation is tolerant to variation in text orientation, length, and curvature, and that the method achieves superior scene text detection and rectification performance compared with state-of-the-art methods.
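The localization and rectification steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and several details are assumptions not stated in the abstract: keypoints arrive already ordered along the centre line (the grouping the horizontal links provide), each vertical link is encoded as an offset vector from its keypoint, and the rectified canvas places the upper landmarks on its top edge and the lower landmarks on its bottom edge at even spacing. All function names (`landmarks_from_links`, `polygon_from_landmarks`, `fit_tps`, `rectify_grid`) are hypothetical, and the final bilinear sampling of image pixels is omitted.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def landmarks_from_links(keypoints, up_offsets, down_offsets):
    """Hypothetical encoding: each vertical link is an offset from its keypoint
    to the upper or lower boundary landmark."""
    upper = [(kx + ux, ky + uy) for (kx, ky), (ux, uy) in zip(keypoints, up_offsets)]
    lower = [(kx + dx, ky + dy) for (kx, ky), (dx, dy) in zip(keypoints, down_offsets)]
    return upper, lower


def polygon_from_landmarks(upper, lower) -> List[Point]:
    """Close the text boundary: upper landmarks in reading order,
    then lower landmarks in reverse order."""
    return list(upper) + list(reversed(lower))


def _tps_kernel(r2: float) -> float:
    # Thin-plate radial basis U = r^2 log(r^2), taken as 0 at r = 0.
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def _solve(a, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_tps(ctrl: List[Point], targets: List[Point]) -> Callable[[Point], Point]:
    """Fit a 2-D thin-plate spline f with f(ctrl[i]) == targets[i]."""
    n = len(ctrl)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(ctrl):
        for j, (xj, yj) in enumerate(ctrl):
            a[i][j] = _tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        a[i][n:n + 3] = [1.0, xi, yi]          # affine part P
        for k, v in enumerate((1.0, xi, yi)):
            a[n + k][i] = v                    # side condition P^T w = 0
    # Solve once for the x outputs and once for the y outputs.
    coeffs = [_solve(a, [t[d] for t in targets] + [0.0, 0.0, 0.0]) for d in range(2)]

    def f(p: Point) -> Point:
        px, py = p
        u = [_tps_kernel((px - cx) ** 2 + (py - cy) ** 2) for cx, cy in ctrl]
        out = []
        for c in coeffs:
            w, (a0, a1, a2) = c[:n], c[n:]
            out.append(a0 + a1 * px + a2 * py + sum(wi * ui for wi, ui in zip(w, u)))
        return (out[0], out[1])

    return f


def rectify_grid(upper, lower, out_w: int, out_h: int):
    """Backward map: for every pixel of an out_w x out_h rectified canvas,
    return the source-image coordinate to sample from (needs >= 2 landmark pairs)."""
    n = len(upper)
    xs = [i * (out_w - 1) / (n - 1) for i in range(n)]
    ctrl = [(x, 0.0) for x in xs] + [(x, float(out_h - 1)) for x in xs]
    f = fit_tps(ctrl, list(upper) + list(lower))
    return [[f((float(x), float(y))) for x in range(out_w)] for y in range(out_h)]
```

The grid maps each rectified pixel back to where it came from in the source image, which avoids holes in the output once the image is sampled there. Because a thin-plate spline reproduces affine maps exactly, a straight, axis-aligned text line yields a plain translation or scaling grid, while a curved line bends the grid to follow its landmarks.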