CV · Mar 28, 2023

Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection

arXiv:2303.15737v1 · 1 citation · h-index: 7
Originality: Highly original
AI Analysis

This work addresses scene text detection for computer vision applications, offering an incremental improvement by hybridizing existing methods.

The authors tackle the problem of detecting arbitrary-shaped text in images with the Deformable Kernel Expansion (DKE) model, which combines segmentation-based and contour-based approaches. On benchmarks including CTW1500 and Total-Text, DKE achieves a good tradeoff between accuracy and efficiency.

Scene text detection is a challenging computer vision task due to the high variation in text shapes and aspect ratios. In this work, we propose a scene text detector named Deformable Kernel Expansion (DKE), which incorporates the merits of both segmentation-based and contour-based detectors. DKE employs a segmentation module to segment the shrunken text region as the text kernel, then expands the text kernel contour to obtain the text boundary by regressing vertex-wise offsets. Generating the text kernel by segmentation enables DKE to inherit the arbitrary-shaped text region modeling capability of segmentation-based detectors. Regressing the kernel contour through a set of sampled vertices enables DKE to avoid complicated pixel-level post-processing and, like contour-based detectors, to better learn contour deformation. Moreover, we propose an Optimal Bipartite Graph Matching Loss (OBGML) that measures the matching error between the predicted contour and the ground truth, which efficiently minimizes the global contour matching distance. Extensive experiments on CTW1500, Total-Text, MSRA-TD500, and ICDAR2015 demonstrate that DKE achieves a good tradeoff between accuracy and efficiency in scene text detection.
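The two mechanisms in the abstract, expanding sampled kernel-contour vertices by regressed offsets and scoring the result with an optimal bipartite matching against the ground truth, can be illustrated with a small plain-Python sketch. The abstract does not specify the OBGML cost, normalization, or solver, so this sketch assumes equal vertex counts, a Euclidean vertex-to-vertex cost, averaging over matched pairs, and the Hungarian method for the minimum-cost assignment. The names `expand_kernel_contour`, `hungarian`, and `bipartite_matching_loss` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def expand_kernel_contour(kernel_vertices: Sequence[Point],
                          offsets: Sequence[Point]) -> List[Point]:
    """Move each sampled kernel-contour vertex by its regressed (dx, dy) offset."""
    if len(kernel_vertices) != len(offsets):
        raise ValueError("need exactly one offset per vertex")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(kernel_vertices, offsets)]


def hungarian(cost: List[List[float]]) -> List[int]:
    """O(n^3) Hungarian method on a square cost matrix.

    Returns assignment where assignment[i] is the column matched to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)      # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero-reduced-cost edge appears.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip matched/unmatched edges along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def bipartite_matching_loss(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean distance under the optimal one-to-one vertex assignment
    (a hypothetical stand-in for OBGML, not the paper's exact formula)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must have the same non-zero vertex count")
    cost = [[math.dist(a, b) for b in gt] for a in pred]
    match = hungarian(cost)
    return sum(cost[i][match[i]] for i in range(len(pred))) / len(pred)


# Usage: a square kernel expanded outward, compared with a GT whose vertices
# are listed in a different order. Index-wise distances would be non-zero,
# but the optimal matching pairs each vertex with its true counterpart.
kernel = [(1, 1), (3, 1), (3, 3), (1, 3)]
offsets = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
expanded = expand_kernel_contour(kernel, offsets)
gt = [(0, 4), (0, 0), (4, 0), (4, 4)]
print(expanded, bipartite_matching_loss(expanded, gt))
```

Matching by optimal assignment rather than by vertex index makes the loss insensitive to where the ground-truth vertex list starts, which is one plausible reason to prefer a global matching over a fixed correspondence. In a real training pipeline the distances would be computed on framework tensors so gradients flow through the matched pairs, and the assignment could come from an existing solver such as `scipy.optimize.linear_sum_assignment` instead of this hand-written one.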

