MM, CV | Aug 17, 2018

First Steps Toward CNN based Source Classification of Document Images Shared Over Messaging App

arXiv:1808.05941v1 | 11 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses source identification for document images, which is useful in applications such as copyright infringement detection, but it is incremental: it builds on existing methods and contributes a new dataset.

The authors tackled the problem of identifying the source smartphone of printed text document images shared over messaging apps by introducing a new dataset and a CNN-based method, which performed as well as or better than the state-of-the-art system in all tested scenarios.

Knowledge of the source smartphone corresponding to a document image can be helpful in a variety of applications, including copyright infringement detection, ownership attribution, leak identification, and usage restriction. In this letter, we investigate a convolutional neural network-based approach to the source smartphone identification problem for printed text documents that have been captured by smartphone cameras and shared over a messaging platform. In the absence of any publicly available dataset addressing this problem, we introduce a new image dataset consisting of 315 images of documents printed in three different fonts, captured using 21 smartphones, and shared over WhatsApp. Experiments conducted on this dataset demonstrate that, in all scenarios, the proposed system performs as well as or better than the state-of-the-art system, which is based on handcrafted features and classification of letters extracted from document images. The new dataset and the code of the proposed system will be made publicly available along with this letter's publication; at present, they are submitted for review.

