CV · Mar 20, 2023

Weakly-Supervised Text Instance Segmentation

arXiv:2303.10848v2 · 6 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the high cost of pixel-level annotation in text segmentation for computer vision applications. It is an incremental advance, since it builds on existing weakly-supervised techniques.

The paper tackles text instance segmentation with a weakly-supervised method that bridges text recognition and segmentation, improving on weakly-supervised instance segmentation methods by 18.95% on ICDAR13-FST and 17.80% on TextSeg.

Text segmentation is a challenging vision task with many downstream applications. Current text segmentation methods require pixel-level annotations, which are costly in human labor and limit the scenarios in which the methods can be applied. In this paper, we make the first attempt at weakly-supervised text instance segmentation by bridging text recognition and text segmentation. The insight is that text recognition methods provide a precise attention location for each text instance, and this attention location can be fed to both a text adaptive refinement (TAR) head and a text segmentation head. Specifically, the proposed TAR head generates pseudo labels by performing two-stage iterative refinement on the attention location to fit the accurate boundaries of the corresponding text instance. Meanwhile, the text segmentation head takes the rough attention location and predicts segmentation masks, which are supervised by these pseudo labels. In addition, we design a mask-augmented contrastive learning scheme that treats our segmentation result as an augmented version of the input text image, improving the visual representation and further enhancing the performance of both recognition and segmentation. Experimental results demonstrate that the proposed method significantly outperforms weakly-supervised instance segmentation methods on the ICDAR13-FST (18.95% improvement) and TextSeg (17.80% improvement) benchmarks.
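The mask-augmented contrastive idea can be illustrated with a small sketch. The abstract does not say which contrastive loss is used, so the InfoNCE-style loss below is an assumption, and `apply_mask`, `embed`, and `info_nce` are hypothetical names introduced only for illustration. The stand-in encoder simply flattens and normalises pixels, whereas the paper would use a learned network. The sketch shows one thing only: the masked image acts as the positive view of its own source image, while masked views of other images in the batch act as negatives.

```python
import math

def apply_mask(image, mask):
    """Keep pixels where the (hypothetical) predicted mask is 1, zero the rest."""
    return [[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(image, mask)]

def embed(image):
    """Stand-in encoder (assumption): flatten the image and L2-normalise it."""
    flat = [p for row in image for p in row]
    norm = math.sqrt(sum(p * p for p in flat)) or 1.0
    return [p / norm for p in flat]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: positives[i] is the positive for anchors[i];
    every other entry of positives serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(anchor, pos)) / temperature
                  for pos in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy 2x2 "text images" and toy segmentation masks (made-up values).
images = [[[0.9, 0.1], [0.8, 0.2]],
          [[0.1, 0.9], [0.2, 0.7]]]
masks = [[[1, 0], [1, 0]],
         [[0, 1], [0, 1]]]

masked_views = [apply_mask(img, msk) for img, msk in zip(images, masks)]
anchors = [embed(img) for img in images]
positives = [embed(view) for view in masked_views]

loss = info_nce(anchors, positives)
# Pairing each image with the wrong masked view should give a higher loss.
mismatched_loss = info_nce(anchors, positives[::-1])
```

In this toy setup the loss is small when each image is paired with its own masked view and much larger when the pairs are swapped, which is the signal a contrastive objective of this kind would use to shape the representation.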

