CV · Aug 10, 2022

Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction

arXiv:2208.05220v21 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses domain adaptation for audio-visual saliency prediction, an incremental advance in video analysis.

The paper tackles domain discrepancy in audio-visual saliency prediction by proposing a dual domain-adversarial learning algorithm. In experiments on public benchmarks, the algorithm improves performance on target testing data.

Both visual and auditory information are valuable for determining the salient regions in videos. Deep convolutional neural networks (CNNs) show strong capacity for the audio-visual saliency prediction task. Due to factors such as shooting scenes and weather, there is often a moderate distribution discrepancy between source training data and target testing data. This domain discrepancy degrades the performance of CNN models on target testing data. This paper makes an early attempt to tackle the unsupervised domain adaptation problem for audio-visual saliency prediction. We propose a dual domain-adversarial learning algorithm to mitigate the domain discrepancy between source and target data. First, a dedicated domain discrimination branch is built to align the auditory feature distributions. Then, those auditory features are fused into the visual features through a cross-modal self-attention module. A second domain discrimination branch is devised to reduce the domain discrepancy of the visual features and of the audio-visual correlations implied by the fused audio-visual features. Experiments on public benchmarks demonstrate that our method can relieve the performance degradation caused by domain discrepancy.
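
The abstract does not say how the adversarial objective is optimized. As a rough illustration only, the sketch below assumes a gradient-reversal style update of the kind used in domain-adversarial neural networks: the domain discriminator descends its classification loss, while the feature extractor ascends the same loss so that source and target features become harder to tell apart. The scalar "feature extractor" and "discriminator", the function name `adversarial_step`, and the parameters `lr` and `lam` are all hypothetical and do not come from the paper.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def adversarial_step(w, u, x, domain, lr=0.1, lam=1.0):
    """One update of a toy scalar feature extractor (weight w) and a toy
    domain discriminator (weight u). Hypothetical names, not from the paper.

    domain is 0 for a source sample and 1 for a target sample.
    """
    feature = w * x                    # feature extractor output
    p_target = sigmoid(u * feature)    # discriminator's P(domain = target)
    # Binary cross-entropy loss of the domain discriminator.
    loss = -(domain * math.log(p_target)
             + (1 - domain) * math.log(1.0 - p_target))
    dlogit = p_target - domain         # dLoss/dlogit for sigmoid + BCE
    grad_u = dlogit * feature          # dLoss/du
    grad_w = dlogit * u * x            # dLoss/dw
    u_new = u - lr * grad_u            # discriminator descends the loss
    w_new = w + lr * lam * grad_w      # reversed gradient: extractor ascends it
    return w_new, u_new, loss


# Assumption: the same kind of step is applied once per discrimination branch,
# one on auditory features and one on fused audio-visual features.
w_audio, u_audio, loss_audio = adversarial_step(w=1.0, u=1.0, x=1.0, domain=1)
w_fused, u_fused, loss_fused = adversarial_step(w=0.5, u=-1.0, x=2.0, domain=0)
```

In a full model, each branch would operate on high-dimensional features, and the feature extractors would also be trained on a saliency prediction loss computed from labeled source data; the sketch omits that term to isolate the adversarial part.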

