Band-Attention Modulated RetNet for Face Forgery Detection
This work addresses face forgery detection, a domain-specific problem with potential applications in security and media verification. The contribution appears incremental, since it builds on existing transformer architectures and modifies them for efficiency.
The paper tackles the challenge of balancing global context capture with computational complexity in transformer-based face forgery detection by introducing BAR-Net, a lightweight network that achieves favorable performance and outperforms current state-of-the-art methods on several datasets.
Transformer networks are widely used in face forgery detection because they scale well to large datasets. Despite this success, transformers struggle to balance the capture of global context, which is crucial for revealing forgery clues, against computational complexity. To mitigate this issue, we introduce the Band-Attention modulated RetNet (BAR-Net), a lightweight network designed to process extensive visual contexts efficiently while avoiding catastrophic forgetting. Our approach lets the target token perceive global information by assigning different attention levels to tokens at varying distances. We implement self-attention along both spatial axes, which maintains spatial priors and eases the computational burden. Moreover, we present the adaptive frequency Band-Attention Modulation mechanism, which treats the entire Discrete Cosine Transform spectrogram as a series of frequency bands with learnable weights. Together, these components enable BAR-Net to achieve favorable performance on several face forgery datasets, outperforming current state-of-the-art methods.
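The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it (`band_modulate`, `band_index`, `decay_mask`, `axial_decay_attention`) is hypothetical. Several details are assumptions, since the abstract does not specify them: bands are formed by grouping DCT coefficients along diagonals u + v into equal-width groups, the distance weighting is a scalar decay gamma^|i - j| multiplied onto softmax attention, and queries, keys, and values are the raw features with no learned projections. In a trained model the band weights would be learned parameters; here they are plain numbers.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat

def band_index(n, num_bands):
    """Assumed banding: coefficient (u, v) goes to a band chosen by u + v."""
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.minimum((u + v) * num_bands // (2 * n - 1), num_bands - 1)

def band_modulate(image, band_weights):
    """Scale each frequency band of a square image by its weight."""
    n = image.shape[0]
    c = dct_matrix(n)
    spectrum = c @ image @ c.T                        # forward 2D DCT
    bands = band_index(n, len(band_weights))
    spectrum = spectrum * np.asarray(band_weights, dtype=float)[bands]
    return c.T @ spectrum @ c                         # inverse 2D DCT

def decay_mask(length, gamma):
    """Distance-based weights gamma**|i - j| between positions on one axis."""
    idx = np.arange(length)
    return gamma ** np.abs(idx[:, None] - idx[None, :])

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def axial_decay_attention(feats, gamma):
    """Attend along the width axis, then the height axis, of an (H, W, D) map.

    Assumption: Q = K = V = feats, and the decay mask is multiplied onto
    the softmax scores so that nearer tokens receive more weight.
    """
    h, w, d = feats.shape
    scores = softmax(feats @ feats.transpose(0, 2, 1) / np.sqrt(d))  # (H, W, W)
    out = (scores * decay_mask(w, gamma)) @ feats                    # (H, W, D)
    cols = out.transpose(1, 0, 2)                                    # (W, H, D)
    scores = softmax(cols @ cols.transpose(0, 2, 1) / np.sqrt(d))    # (W, H, H)
    cols = (scores * decay_mask(h, gamma)) @ cols
    return cols.transpose(1, 0, 2)                                   # (H, W, D)
```

Attending along one axis at a time means each token compares itself with H + W other tokens instead of H × W, which is the source of the reduced cost. Setting all band weights to 1 leaves the input unchanged, which gives a simple sanity check on the transform pair.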