Jingchao Liu

CV · Sep 11, 2020

ARM: A Confidence-Based Adversarial Reweighting Module for Coarse Semantic Segmentation

Jingchao Liu, Ye Du, Zehua Fu et al.

Coarsely-labeled semantic segmentation annotations are easy to obtain, but they risk losing edge details and introducing background pixels. Impeded by this inherent noise, existing coarse annotations are only used as a bonus for model pre-training. In this paper, we try to exploit their potential with a confidence-based reweighting strategy. Loss-based reweighting strategies usually rely on a high loss value to identify two completely different types of pixels, namely valuable pixels in noise-free annotations and mislabeled pixels in noisy annotations. This makes it impossible to mine valuable pixels and suppress mislabeled pixels at the same time. With the help of prediction confidence, however, we resolve this dilemma and perform both subtasks simultaneously with a single reweighting strategy. Furthermore, we generalize this strategy into an Adversarial Reweighting Module (ARM) and rigorously prove its convergence. Experiments on standard datasets show that ARM brings consistent improvements for both coarse and fine annotations. Specifically, built on top of DeepLabv3+, ARM improves the mIoU on the coarsely-labeled Cityscapes by a considerable margin and increases the mIoU on the ADE20K dataset to 47.50.
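The dilemma described in this abstract can be illustrated with a small sketch. It is not the paper's ARM formulation: the function name `pixel_losses_and_weights` and the weight `1 - max probability` are assumptions chosen only to show why a high loss alone cannot separate hard pixels from mislabeled ones, while prediction confidence can.

```python
import numpy as np

def pixel_losses_and_weights(probs, labels):
    """Illustrative only; not the weighting used in the ARM paper.

    probs:  (N, C) softmax outputs for N pixels.
    labels: (N,) integer class ids from the (possibly noisy) annotation.
    """
    n = probs.shape[0]
    # Per-pixel cross-entropy against the annotated label.
    p_label = probs[np.arange(n), labels]
    loss = -np.log(np.clip(p_label, 1e-12, 1.0))
    # Prediction confidence: the largest class probability.
    confidence = probs.max(axis=1)
    # Assumed heuristic: emphasize uncertain pixels, suppress pixels where
    # the model is confident (whether or not it agrees with the label).
    weight = 1.0 - confidence
    return loss, weight

probs = np.array([
    [0.55, 0.45],  # uncertain prediction, label 1: a hard, valuable pixel
    [0.98, 0.02],  # confident in class 0, label 1: likely mislabeled
    [0.02, 0.98],  # confident and agrees with label 1: an easy pixel
])
labels = np.array([1, 1, 1])
loss, weight = pixel_losses_and_weights(probs, labels)
```

In this toy input the likely mislabeled pixel has the highest loss, so a purely loss-based scheme would emphasize it most, whereas the confidence-based weight ranks the uncertain pixel above it.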

CV · Mar 28, 2019

Pyramid Mask Text Detector

Jingchao Liu, Xuebo Liu, Jie Sheng et al.

Scene text detection, an essential step of a scene text recognition system, aims to locate text instances in natural scene images automatically. Some recent attempts that build on Mask R-CNN formulate scene text detection as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) for scene text detection. Instead of the binary text mask generated by existing Mask R-CNN based methods, PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. To generate text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset.
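As a rough illustration of what a location-aware soft mask can look like, the sketch below builds a pyramid-shaped regression target for an axis-aligned box: 0 on the border, rising linearly to 1 at the center. The function name `pyramid_label` and the restriction to axis-aligned rectangles are assumptions made for brevity; the abstract does not specify the exact label construction, and the plane clustering step is not shown.

```python
import numpy as np

def pyramid_label(height, width):
    """Soft mask for an axis-aligned box (hypothetical simplification).

    Each pixel gets its normalized distance to the nearest box edge,
    scaled so the border is 0 and the center is 1.
    """
    ys = np.linspace(0.0, 1.0, height)
    xs = np.linspace(0.0, 1.0, width)
    # Distance to the nearest horizontal / vertical edge, scaled to [0, 1].
    v = 2.0 * np.minimum(ys, 1.0 - ys)
    u = 2.0 * np.minimum(xs, 1.0 - xs)
    # The pyramid surface is the lower of the two ramps at every pixel.
    return np.minimum(v[:, None], u[None, :])

mask = pyramid_label(5, 7)
```

Unlike a binary mask, every value here encodes where the pixel sits inside the box, which is the kind of extra information a 3D reinterpretation of the mask could exploit.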
