Detecting Text in the Wild with Deep Character Embedding Network
This work addresses the challenge of detecting text in real-world scenes, where text is often irregularly shaped. The problem matters for applications such as document analysis and scene understanding, and the work is presented as a novel approach rather than an incremental improvement.
The paper tackles the detection of irregularly shaped text in natural images, such as curved or perspectively distorted text. It proposes a deep character embedding network (CENet) that predicts character bounding boxes together with their embedding vectors, so that text detection reduces to clustering characters in the embedding space. The method achieves state-of-the-art or comparable performance on ICDAR13, ICDAR15, MSRA-TD500, and Total-Text, with a substantial improvement on the irregular-text dataset Total-Text.
Most text detection methods assume that text is horizontal or multi-oriented and therefore define quadrangles as the basic detection unit. However, text in the wild is usually perspectively distorted or curved, which cannot be easily handled by existing approaches. In this paper, we propose a deep character embedding network (CENet) that simultaneously predicts the bounding boxes of characters and their embedding vectors, thus making text detection a simple clustering task in the character embedding space. The proposed method does not rely on the strong assumption that text forms a straight line, which gives it flexibility on arbitrarily curved or perspectively distorted text. For the character detection task, a dense prediction subnetwork is designed to obtain the confidence scores and bounding boxes of characters. For the character embedding task, a subnetwork is trained with a contrastive loss to project detected characters into the embedding space. The two tasks share a backbone CNN from which multi-scale feature maps are extracted. The final text regions are obtained by thresholding the character confidence and the embedding distance between character pairs. We evaluated our method on ICDAR13, ICDAR15, MSRA-TD500, and Total-Text. The proposed method achieves state-of-the-art or comparable performance on all these datasets and shows a substantial improvement on the irregular-text dataset, i.e., Total-Text.
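The final grouping step described above (thresholding character confidence, then linking character pairs whose embeddings are close) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cluster_characters`, the threshold values, the use of Euclidean distance, and the comparison of all character pairs are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def cluster_characters(scores, embeddings, score_thresh=0.5, dist_thresh=1.0):
    """Group detected characters into text instances (illustrative sketch).

    scores:     per-character confidence, shape (N,)
    embeddings: per-character embedding vectors, shape (N, D)
    Both thresholds are hypothetical values, not taken from the paper.
    Returns a sorted list of groups, each a list of character indices.
    """
    scores = np.asarray(scores, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)

    # Step 1: discard low-confidence character detections.
    keep = [int(i) for i in np.flatnonzero(scores >= score_thresh)]

    # Step 2: union-find over the surviving characters.
    parent = {i: i for i in keep}

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 3: link every pair whose embedding distance is below the threshold.
    # Euclidean distance is assumed here.
    for pos, a in enumerate(keep):
        for b in keep[pos + 1:]:
            if np.linalg.norm(embeddings[a] - embeddings[b]) < dist_thresh:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    parent[root_b] = root_a

    # Step 4: collect the connected components as text instances.
    groups = {}
    for i in keep:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: five detected characters with 2-D embeddings.
scores = [0.9, 0.8, 0.2, 0.95, 0.7]
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.05, 0.0], [5.0, 5.0], [5.2, 5.0]]
groups = cluster_characters(scores, embeddings)
```

In this toy input, character 2 falls below the confidence threshold and is dropped, while the remaining characters form two groups according to embedding proximity. Because the links are transitive, a long curved word can be recovered as a chain of nearby characters without any straight-line assumption.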