A fast compression-based similarity measure with applications to content-based image retrieval
This work addresses the efficiency of compression-based similarity measures, which matters to researchers and practitioners working with medium-to-large datasets, although the contribution appears incremental because it builds on existing techniques.
The paper tackles the computational cost of applying compression-based similarity measures to medium-to-large datasets. It proposes the Fast Compression Distance (FCD), which reduces complexity without degrading performance, and applies it to a content-based color image retrieval system that compares favorably with state-of-the-art methods.
Compression-based similarity measures are effectively employed in applications on diverse data types with an essentially parameter-free approach. Nevertheless, applying these techniques to medium-to-large datasets raises problems that have seldom been addressed. This paper proposes a similarity measure based on compression with dictionaries, the Fast Compression Distance (FCD), which reduces the complexity of these methods without degrading performance. On this basis, a content-based color image retrieval system is defined that is comparable to state-of-the-art methods based on invariant color features. The FCD also enables a better understanding of compression-based techniques, through experiments on datasets larger than those analyzed so far in the literature.
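The abstract does not state the FCD formula, so the following is only a minimal sketch of what a dictionary-based compression distance might look like, not the paper's definition. It assumes that each object is represented by the set of phrases an LZW-style parse would add to its dictionary, and that the distance is the fraction of phrases in one dictionary that are missing from the other. The function names `lzw_dictionary` and `dictionary_distance`, and the choice to operate on plain strings rather than encoded images, are illustrative assumptions.

```python
from __future__ import annotations


def lzw_dictionary(data: str) -> set[str]:
    """Return the multi-symbol phrases an LZW-style parse of `data` would learn.

    Assumption: single symbols form the initial dictionary and are not
    counted as learned phrases.
    """
    seen = set(data)          # initial dictionary: every single symbol
    phrases: set[str] = set() # phrases added while parsing
    current = ""
    for symbol in data:
        candidate = current + symbol
        if candidate in seen:
            # Keep extending the current match.
            current = candidate
        else:
            # New phrase: record it and restart from the current symbol.
            seen.add(candidate)
            phrases.add(candidate)
            current = symbol
    return phrases


def dictionary_distance(x: str, y: str) -> float:
    """Fraction of x's learned phrases that do not occur in y's dictionary.

    Hypothetical formulation: 0.0 means every phrase of x is shared with y,
    1.0 means none is. The measure is not symmetric in x and y.
    """
    dx = lzw_dictionary(x)
    dy = lzw_dictionary(y)
    if not dx:
        # Degenerate input with no learned phrases; treated as distance 0 here.
        return 0.0
    return (len(dx) - len(dx & dy)) / len(dx)


if __name__ == "__main__":
    print(dictionary_distance("abababab", "abababab"))
    print(dictionary_distance("abababab", "cdcdcdcd"))
```

Under these assumptions, each dictionary can be extracted once per object and stored, so comparing two objects reduces to a set intersection instead of a fresh compression of their concatenation. That is one plausible source of the complexity reduction the abstract describes.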