Just Noticeable Visual Redundancy Forecasting: A Deep Multimodal-driven Approach
This work addresses visual redundancy forecasting for multimedia systems by incorporating multimodal information, extending methods that rely on a single modality.
The paper tackles just noticeable difference (JND) modeling by proposing hmJND-Net, an end-to-end approach that integrates saliency, depth, and segmentation modalities, and reports superior performance over eight representative methods on eight benchmark datasets.
Just noticeable difference (JND) refers to the maximum visual change that human eyes cannot perceive, and it has a wide range of applications in multimedia systems. However, most existing JND approaches focus on only a single modality and rarely consider the complementary effects of multimodal information. In this article, we investigate JND modeling from an end-to-end homologous multimodal perspective and propose hmJND-Net. Specifically, we explore three important visually sensitive modalities: saliency, depth, and segmentation. To better utilize homologous multimodal information, we establish an effective fusion method via summation enhancement and subtractive offset, and align homologous multimodal features based on a self-attention driven encoder-decoder paradigm. Extensive experimental results on eight different benchmark datasets validate the superiority of our hmJND-Net over eight representative methods.
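
To make the JND definition above concrete, the following minimal sketch shows how a per-pixel JND map bounds an imperceptible perturbation: each pixel is shifted up or down by exactly its threshold, so no change exceeds the JND. This is an illustration only and not the hmJND-Net model; the function names `inject_jnd_noise` and `max_abs_change`, the tiny grayscale image, and the JND values are all hypothetical and chosen for the example.

```python
import random


def inject_jnd_noise(image, jnd_map, seed=0):
    """Shift each pixel by +/- its JND threshold, clipped to [0, 255].

    `image` and `jnd_map` are equally sized nested lists (rows of pixels).
    The JND map is assumed to be given; how it is predicted is out of scope.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    perturbed = []
    for img_row, jnd_row in zip(image, jnd_map):
        row = []
        for pixel, threshold in zip(img_row, jnd_row):
            sign = rng.choice((-1, 1))          # random direction per pixel
            value = pixel + sign * threshold    # change magnitude equals the JND
            row.append(min(255, max(0, value)))  # keep a valid 8-bit intensity
        perturbed.append(row)
    return perturbed


def max_abs_change(original, modified):
    """Largest absolute per-pixel difference between two images."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(original, modified)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical 2x3 grayscale image and JND thresholds (not from the paper).
image = [[120, 130, 140], [125, 135, 145]]
jnd_map = [[3, 3, 5], [2, 4, 6]]

perturbed = inject_jnd_noise(image, jnd_map)
largest_change = max_abs_change(image, perturbed)
```

Under this definition, a more accurate JND map allows larger perturbations (more removable redundancy) while the result stays visually indistinguishable from the original, which is why the quality of the predicted map matters for multimedia applications.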