MS-SCANet: A Multiscale Transformer-Based Architecture with Dual Attention for No-Reference Image Quality Assessment

Mayesha Maliha R. Mithila, Mylene C. Q. Farias

arXiv:2602.04032v1 | 7.95 citations | ICASSP

Originality: Incremental advance

AI Analysis

This work addresses image quality assessment for applications like photography and video streaming, but it is incremental, building on existing transformer and attention methods.

The authors tackle no-reference image quality assessment with MS-SCANet, a transformer-based architecture that combines multiscale processing with dual (spatial and channel) attention. They report state-of-the-art performance on datasets such as KonIQ-10k and LIVE, with stronger correlations to human scores.

We present the Multi-Scale Spatial Channel Attention Network (MS-SCANet), a transformer-based architecture designed for no-reference image quality assessment (IQA). MS-SCANet features a dual-branch structure that processes images at multiple scales, effectively capturing both fine and coarse details, an improvement over traditional single-scale methods. By integrating tailored spatial and channel attention mechanisms, our model emphasizes essential features while minimizing computational complexity. A key component of MS-SCANet is its cross-branch attention mechanism, which enhances the integration of features across different scales, addressing limitations in previous approaches. We also introduce two new consistency loss functions, Cross-Branch Consistency Loss and Adaptive Pooling Consistency Loss, which maintain spatial integrity during feature scaling, outperforming conventional linear and bilinear techniques. Extensive evaluations on datasets like KonIQ-10k, LIVE, LIVE Challenge, and CSIQ show that MS-SCANet consistently surpasses state-of-the-art methods, offering a robust framework with stronger correlations with subjective human scores.
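The "correlations with subjective human scores" mentioned above are, in IQA work, usually reported as the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between predicted scores and mean opinion scores. The abstract does not say which coefficients the authors use, so the following is only a minimal sketch of those two standard metrics, not the paper's evaluation code. The score lists are made-up illustrative values, and the nonlinear logistic mapping that some IQA papers apply before computing PLCC is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation (PLCC) between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model predictions and mean opinion scores (not from the paper).
predicted = [1.0, 2.0, 3.0, 4.0, 10.0]
mos = [1.1, 2.0, 2.9, 4.2, 5.0]

plcc = pearson(predicted, mos)
srocc = spearman(predicted, mos)
print(f"PLCC={plcc:.3f} SROCC={srocc:.3f}")
```

In this toy data the predictions order the images exactly as the opinion scores do, so SROCC is 1 while PLCC is lower because the relationship is not linear. This is why the two coefficients are typically reported together.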
