CV, AI, LG | Sep 13, 2022

Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks

arXiv:2209.05921v1 | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem in document image analysis applications such as OCR, offering an incremental improvement by adapting existing GAN methods to the compressed domain.

The paper tackled document image binarization in JPEG compressed images without full decompression by proposing a Dual Discriminator GAN model, achieving state-of-the-art performance on DIBCO datasets with robustness and efficiency in time and space.

Image binarization techniques are widely used to enhance noisy and/or degraded images for Document Image Analysis (DIA) applications such as word spotting, document retrieval, and OCR. Most existing techniques feed pixel images into Convolutional Neural Networks to accomplish document binarization, which may not produce effective results when working with compressed images that need to be processed without full decompression. Therefore, this research paper proposes document image binarization directly on the JPEG compressed stream of document images, employing Dual Discriminator Generative Adversarial Networks (DD-GANs). The two discriminator networks, Global and Local, work on different image ratios, and focal loss is used as the generator loss. The proposed model has been thoroughly tested on different versions of the DIBCO dataset, which pose challenges such as holes, erased or smudged ink, dust, and misplaced fibres. The model proved highly robust and efficient in both time and space complexity, and achieved state-of-the-art performance in the JPEG compressed domain.
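The abstract names focal loss as the generator loss but gives no formula or settings. As a rough illustration only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions taken from common usage of focal loss, not from this paper, and the sketch does not reproduce the paper's generator, discriminators, or JPEG-domain input handling.

```python
import math

def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flat sequences.

    probs   -- predicted probabilities that each pixel is foreground (class 1)
    targets -- ground-truth labels, each 0 or 1
    gamma, alpha -- hypothetical defaults; the paper's values are not stated here
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights pixels that are already well classified.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

# A confidently correct pixel contributes far less than a confidently wrong one.
easy = binary_focal_loss([0.9], [1])
hard = binary_focal_loss([0.1], [1])
```

The modulating factor is what makes this loss a plausible fit for binarization, where background pixels usually far outnumber ink pixels: easy background pixels are suppressed so that the harder, degraded strokes dominate the gradient.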
