Character Proposal Network for Robust Text Extraction
This work improves text detection for applications like scene text recognition by proposing a more robust character proposal method, though it is incremental as it builds on existing FCN-based approaches.
The paper tackles the problem of generating character proposals for text extraction by addressing limitations of MSER, such as handling connected characters and non-uniform illumination, and achieves recall rates of 93.88%, 93.60%, and 96.46% on ICDAR 2013, SVT, and Chinese2k datasets with fewer than 1000 proposals.
Maximally stable extremal regions (MSER), a popular method for generating character proposals (candidates), has shown superior performance in scene text detection. However, its pixel-level operation limits its ability to handle some challenging cases (e.g., multiple connected characters, separated parts of a single character, and non-uniform illumination). To better tackle these cases, we design a character proposal network (CPN) that takes advantage of the high capacity and fast computation of fully convolutional networks (FCNs). Specifically, the network simultaneously predicts characterness scores and refines the corresponding locations. The characterness scores are used to rank proposals and reject non-character ones, while the refinement step aims to obtain more accurate locations. Furthermore, because different characters have different aspect ratios, we propose a multi-template strategy that designs a refiner for each aspect ratio. Extensive experiments show that our method achieves recall rates of 93.88%, 93.60%, and 96.46% on the ICDAR 2013, SVT, and Chinese2k datasets, respectively, using fewer than 1000 proposals, demonstrating the promising performance of our character proposal network.
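To make the ranking and recall measurement concrete, the following is a minimal sketch of how proposals might be ranked by characterness score, truncated to a fixed budget, and scored for recall against ground-truth character boxes. It is not the CPN implementation: the function names, the `(x1, y1, x2, y2)` box format, and the IoU threshold of 0.5 are all assumptions made for illustration, and the abstract does not state the matching criterion used.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k_proposals(boxes: List[Box], scores: List[float], k: int) -> List[Box]:
    """Rank proposals by characterness score (highest first) and keep k."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:k]]


def recall_at_k(
    ground_truth: List[Box],
    boxes: List[Box],
    scores: List[float],
    k: int,
    iou_thresh: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Fraction of ground-truth boxes matched by at least one top-k proposal."""
    if not ground_truth:
        return 0.0
    kept = top_k_proposals(boxes, scores, k)
    hits = sum(
        1 for g in ground_truth if any(iou(g, p) >= iou_thresh for p in kept)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    proposals = [(0, 0, 10, 10), (50, 50, 60, 60), (20, 20, 30, 30)]
    characterness = [0.9, 0.8, 0.1]
    # With a budget of 2, the low-scoring but correct third box is dropped.
    print(recall_at_k(gt, proposals, characterness, k=2))
    print(recall_at_k(gt, proposals, characterness, k=3))
```

In this toy example, a correct proposal with a low score is lost when the budget is tight, which is why the quality of the characterness score matters for reaching high recall with a limited number of proposals.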