CV, Apr 8, 2014

Entropy Computation of Document Images in Run-Length Compressed Domain

P. Nagabhushan, Mohammed Javed, B. B. Chaudhuri

arXiv:1404.2014v1, 14 citations

AI Analysis

This work targets efficiency in document processing for applications such as retrieval and word spotting. The contribution is incremental: it applies existing entropy measures to compressed data.

The paper avoids the need to decompress documents before analysis by computing entropy directly from run-length compressed document images. The results match the previously reported reference results 100%.

Compression of documents, images, audio and video has traditionally been practiced to increase the efficiency of data storage and transfer. However, in order to process the data or carry out any analytical computation, decompression has become an unavoidable prerequisite. In this research work, we have attempted to compute the entropy, which is an important document analytic, directly from the compressed documents. We use the Conventional Entropy Quantifier (CEQ) and Spatial Entropy Quantifiers (SEQ) for entropy computations [1]. The entropies obtained are useful in applications like establishing equivalence, word spotting and document retrieval. Experiments have been performed with all the data sets of [1], at character, word and line levels, taking the documents in the run-length compressed domain. The algorithms developed are computationally and space efficient, and the results obtained match 100% with the results reported in [1].
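To illustrate the general idea of working on run lengths without decompressing, here is a minimal sketch. It is not the paper's CEQ or SEQ, whose definitions are given in [1] and are not reproduced here. It computes the plain Shannon entropy of the black/white pixel distribution of a binary image. The row format (a list of alternating run lengths per row, starting with a background run that may be zero) and the function name `binary_entropy_from_runs` are assumptions made for this example.

```python
import math


def binary_entropy_from_runs(rows):
    """Shannon entropy (in bits) of the pixel distribution of a binary image.

    Assumed (hypothetical) input format: each row is a list of alternating
    run lengths, starting with a background run (which may be 0), then a
    foreground run, then background again, and so on.
    """
    background = 0
    foreground = 0
    for runs in rows:
        # Even positions hold background runs, odd positions foreground runs,
        # so pixel counts come from sums of runs; no pixel is ever decoded.
        background += sum(runs[0::2])
        foreground += sum(runs[1::2])

    total = background + foreground
    if total == 0:
        return 0.0

    entropy = 0.0
    for count in (background, foreground):
        if count:  # skip empty classes, since 0 * log(0) is taken as 0
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Three rows of width 4: "0011", "1111", "0000"
rows = [[2, 2], [0, 4], [4]]
print(binary_entropy_from_runs(rows))
```

The work done is proportional to the number of runs rather than the number of pixels, which is the kind of saving the compressed-domain approach aims for.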

