CV · May 4

Edge-Efficient Image Restoration: Transformer Distillation into State-Space Models

arXiv:2605.02794 · 25.6
Predicted impact: top 88% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

For practitioners deploying image restoration on mobile devices, this work offers a practical way to reduce latency without major quality loss, though the gains are incremental for denoising.

The paper proposes a hybrid image restoration framework combining transformer and state-space model blocks, using feature distillation and a multi-objective search to improve runtime efficiency on edge hardware. On a Snapdragon 8 Elite CPU, the method achieves 3.4x, 1.74x, and 1.17x speedups for deblurring, deraining, and denoising respectively, while maintaining competitive quality.
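Each reported speedup is the Restormer baseline latency divided by the hybrid's latency. The short Python check below uses only the latencies quoted in the abstract. The function names are illustrative, and the percentage latency reduction is a derived quantity rather than a figure the source reports.

```python
# Latencies on a Snapdragon 8 Elite CPU, in milliseconds, as quoted in the abstract.
BASELINE_MS = 10119.52  # Restormer baseline

ENS_LATENCY_MS = {
    "deblurring": 2973.0,
    "deraining": 5816.0,
    "denoising": 8666.0,
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup factor: how many times faster the candidate runs than the baseline."""
    return baseline_ms / candidate_ms


def latency_reduction(baseline_ms: float, candidate_ms: float) -> float:
    """Fraction of baseline latency removed (derived, not reported in the source)."""
    return 1.0 - candidate_ms / baseline_ms


if __name__ == "__main__":
    for task, ms in ENS_LATENCY_MS.items():
        print(
            f"{task:>10}: {speedup(BASELINE_MS, ms):.2f}x faster, "
            f"{100 * latency_reduction(BASELINE_MS, ms):.1f}% lower latency"
        )
```

Rounded to two decimals, the computed factors (3.40, 1.74, 1.17) match the reported 3.4x, 1.74x, and 1.17x.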

We propose a modular framework for hybrid image restoration that integrates transformer and state-space model (SSM) blocks with a focus on improving runtime efficiency on edge hardware. While transformers provide strong global modeling through self-attention, their attention kernels incur substantial latency on mobile devices, especially for high-resolution inputs. In contrast, SSMs such as Mamba offer linear-time sequence modeling with lower runtime overhead but may underperform on fine-grained restoration tasks. To balance accuracy and efficiency, we train lightweight SSM blocks as feature-distilled surrogates of transformer blocks and use them to construct hybrid U-Net-style architectures. To automatically discover effective block combinations, we introduce Efficient Network Search (ENS), a multi-objective search strategy that selects task-specific hybrid configurations from pre-aligned components. ENS optimizes restoration quality while penalizing transformer usage; this penalty serves as a lightweight proxy for latency and enables architecture discovery without repeated hardware profiling. On a Snapdragon 8 Elite CPU, the Restormer baseline requires 10119.52 ms for inference. In contrast, ENS-discovered hybrids significantly reduce runtime: ENS-Deblurring runs in 2973 ms (3.4x faster), ENS-Deraining in 5816 ms (1.74x faster), and ENS-Denoising in 8666 ms (1.17x faster), while maintaining competitive restoration quality.
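The abstract does not spell out the ENS procedure. The sketch below assumes the multi-objective goal is scalarized as restoration quality minus a fixed penalty `lam` per transformer block, and it treats the architecture as a small number of U-Net block slots. Each slot holds either a transformer block ("T") or its pre-aligned SSM surrogate ("S"). Because the surrogates are distilled to match the transformer blocks' features, every combination is assumed to be a valid drop-in network. Exhaustive enumeration stands in for whatever search strategy the paper actually uses. `ens_score`, `exhaustive_search`, `toy_quality`, and all PSNR numbers are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple

# One entry per block slot: "T" = transformer block, "S" = distilled SSM surrogate.
Config = Tuple[str, ...]


def ens_score(config: Config, quality: Callable[[Config], float], lam: float) -> float:
    """Hypothetical scalarized objective: quality minus a penalty per transformer block.

    The transformer count acts as a latency proxy, so no on-device profiling is needed.
    """
    return quality(config) - lam * config.count("T")


def exhaustive_search(
    num_slots: int, quality: Callable[[Config], float], lam: float
) -> Tuple[Optional[Config], float]:
    """Enumerate all 2**num_slots hybrids and keep the best-scoring one.

    Only practical for a handful of slots; a real search would prune or sample.
    """
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for cfg in product("TS", repeat=num_slots):
        score = ens_score(cfg, quality, lam)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical numbers for illustration only, not taken from the paper.
ALL_TRANSFORMER_PSNR = 32.0                     # dB, quality of the all-"T" network
SSM_PSNR_DROP = [0.02, 0.05, 0.30, 0.05, 0.02]  # dB lost when slot i becomes "S"


def toy_quality(cfg: Config) -> float:
    """Stand-in for validation PSNR of a hybrid, assuming per-slot drops add up."""
    return ALL_TRANSFORMER_PSNR - sum(
        drop for drop, block in zip(SSM_PSNR_DROP, cfg) if block == "S"
    )


if __name__ == "__main__":
    for lam in (0.01, 0.1, 1.0):
        cfg, score = exhaustive_search(len(SSM_PSNR_DROP), toy_quality, lam)
        print(f"lam={lam}: {''.join(cfg)} (score {score:.2f})")
```

With `lam=0.1`, only the slot whose surrogate costs more than the penalty (the hypothetical 0.30 dB bottleneck slot) keeps its transformer. Raising `lam` pushes the search toward all-SSM hybrids, and lowering it pushes toward the all-transformer baseline. A task-specific `quality` function would therefore yield different hybrids for deblurring, deraining, and denoising.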
