CV · AI · MM · Jan 16, 2024

End-to-End Optimized Image Compression with the Frequency-Oriented Transform

arXiv:2401.08194v1 · 7 citations · Mach Vis Appl
Originality: Incremental advance
AI Analysis

This work addresses the lack of interpretability in learning-based image compression, targeting applications that require scalable coding and semantic preservation. It represents an incremental improvement over existing deep learning methods.

The authors tackle the lack of interpretability in deep learning-based image compression by proposing an end-to-end model with a frequency-oriented transform. The model outperforms traditional codecs, including H.266/VVC, on the MS-SSIM metric and preserves semantic fidelity in visual analysis tasks.

Image compression is a significant challenge in the era of information explosion. Recent studies have shown that learning-based image compression methods outperform traditional codecs. However, these methods inherently lack interpretability. After analyzing how compression degradation varies across frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, which aligns with a human-interpretable concept. Under the non-overlapping hypothesis, the model supports scalable coding through selective transmission of arbitrary frequency components. Extensive experiments show that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (object detection and semantic segmentation) verify that the proposed method preserves semantic fidelity in addition to signal-level precision.
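A fixed, hand-crafted stand-in can illustrate the idea of splitting an image into non-overlapping frequency bands and decoding from only some of them. The sketch below is not the authors' learned transform. It assumes a plain 2-D FFT split into disjoint radial bands. The helper names `radial_band_masks`, `split_bands`, and `reconstruct`, and the cutoff values in `edges`, are hypothetical. Because the bands have disjoint spectral support, they sum exactly to the input. A decoder that receives only a subset of bands still gets a valid but coarser image, which mirrors scalable coding through selective transmission.

```python
import numpy as np


def radial_band_masks(shape, edges):
    """Hypothetical helper: boolean masks for disjoint radial frequency bands.

    `edges` are assumed cutoff radii in cycles/pixel (e.g. [0.1, 0.25]);
    they split the 2-D spectrum into len(edges) + 1 non-overlapping bands.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles/pixel
    radius = np.sqrt(fy ** 2 + fx ** 2)
    bounds = [0.0, *edges, np.inf]
    # Half-open intervals [lo, hi) put every frequency in exactly one band.
    return [(radius >= lo) & (radius < hi) for lo, hi in zip(bounds[:-1], bounds[1:])]


def split_bands(image, edges):
    """Decompose a grayscale image into spatial-domain frequency bands.

    Fixed FFT stand-in for illustration only, not the paper's learned transform.
    """
    spectrum = np.fft.fft2(image)
    masks = radial_band_masks(image.shape, edges)
    # Radially symmetric masks keep each masked spectrum Hermitian,
    # so each inverse FFT is real up to rounding error.
    return [np.real(np.fft.ifft2(spectrum * m)) for m in masks]


def reconstruct(bands, keep):
    """Scalable decoding: sum only the transmitted bands (indices in `keep`)."""
    return sum((bands[i] for i in keep), np.zeros_like(bands[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))  # synthetic stand-in for an image
    bands = split_bands(img, edges=[0.1, 0.25])
    for k in range(1, len(bands) + 1):
        approx = reconstruct(bands, range(k))
        mse = np.mean((img - approx) ** 2)
        print(f"bands 0..{k - 1}: MSE = {mse:.3e}")
```

Keeping only band 0 gives a low-pass base layer, and each additional band refines detail until the full set reproduces the input. In the paper, the band separation is learned end to end together with entropy estimation and frequency-aware fusion. Here nothing is learned or entropy coded; the split only demonstrates why disjoint bands make partial transmission well defined.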
