Lightweight Quad Bayer HybridEVS Demosaicing via State Space Augmented Cross-Attention
This work addresses efficient image demosaicing for event-camera sensors on mobile devices, though it appears incremental: it builds on existing demosaicing methods with targeted architectural improvements.
The paper tackles demosaicing for Quad Bayer HybridEVS cameras, which embed event pixels lacking color information in a Quad Bayer color filter array. It introduces TSANet, a lightweight two-stage network that handles event pixel inpainting and demosaicing separately. Averaged across seven datasets, TSANet outperforms the previous state-of-the-art DemosaicFormer in both PSNR and SSIM while reducing parameters by 1.86× and computation by 3.29×.
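To make the two-stage split concrete, here is a minimal pure-Python sketch of the data flow. It assumes a 4x4 Quad Bayer tile (each color fills a 2x2 block) and a hypothetical event-pixel layout (one event pixel per tile at offset (1, 1)); the real HybridEVS layout may differ. Both stages are classical stand-ins, not TSANet's learned networks: `stage1_inpaint` fills each event pixel from valid same-color neighbors, and `stage2_demosaic` interpolates the missing colors by local same-color averaging. All function and variable names are hypothetical.

```python
from typing import List, Optional, Tuple

Mosaic = List[List[float]]

# Assumed Quad Bayer 4x4 tile: each color fills a 2x2 block.
QUAD_BAYER = [
    ["R", "R", "G", "G"],
    ["R", "R", "G", "G"],
    ["G", "G", "B", "B"],
    ["G", "G", "B", "B"],
]


def cfa_color(y: int, x: int) -> str:
    """Filter color at (y, x) for the tiled Quad Bayer pattern."""
    return QUAD_BAYER[y % 4][x % 4]


def same_color_mean(raw: Mosaic, valid: List[List[bool]], y: int, x: int,
                    color: str, radius: int) -> Optional[float]:
    """Mean of valid `color` pixels in a (2r+1)x(2r+1) window, or None if none."""
    h, w = len(raw), len(raw[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            if valid[yy][xx] and cfa_color(yy, xx) == color:
                total += raw[yy][xx]
                count += 1
    return total / count if count else None


def stage1_inpaint(raw: Mosaic, event_mask: List[List[bool]], radius: int = 2) -> Mosaic:
    """Stage 1 stand-in: replace each event pixel with its same-color neighbor mean."""
    valid = [[not e for e in row] for row in event_mask]
    out = [row[:] for row in raw]
    for y in range(len(raw)):
        for x in range(len(raw[0])):
            if event_mask[y][x]:
                m = same_color_mean(raw, valid, y, x, cfa_color(y, x), radius)
                out[y][x] = m if m is not None else 0.0
    return out


def stage2_demosaic(mosaic: Mosaic, radius: int = 2) -> List[List[Tuple[float, float, float]]]:
    """Stage 2 stand-in: fill missing colors by local same-color averaging."""
    h, w = len(mosaic), len(mosaic[0])
    valid = [[True] * w for _ in range(h)]  # stage 1 already filled event pixels
    rgb = []
    for y in range(h):
        row = []
        for x in range(w):
            pixel = []
            for c in "RGB":
                if cfa_color(y, x) == c:
                    pixel.append(mosaic[y][x])  # measured directly by this pixel
                else:
                    m = same_color_mean(mosaic, valid, y, x, c, radius)
                    pixel.append(m if m is not None else 0.0)
            row.append((pixel[0], pixel[1], pixel[2]))
        rgb.append(row)
    return rgb


# Usage: flat-color 8x8 scene with hypothetical event pixels at (1, 1) of each tile.
H = W = 8
flat = {"R": 0.75, "G": 0.5, "B": 0.25}
raw = [[flat[cfa_color(y, x)] for x in range(W)] for y in range(H)]
events = [[y % 4 == 1 and x % 4 == 1 for x in range(W)] for y in range(H)]
for y in range(H):
    for x in range(W):
        if events[y][x]:
            raw[y][x] = 0.0  # event pixels carry no color intensity
rgb = stage2_demosaic(stage1_inpaint(raw, events))
print(rgb[1][1])  # (0.75, 0.5, 0.25)
```

Separating the stages means the demosaicing step never sees the zero-valued event pixels. This baseline only illustrates the interface between the two subtasks; it says nothing about how TSANet implements either one.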
Event cameras like the Hybrid Event-based Vision Sensor (HybridEVS) camera capture brightness changes as asynchronous "events" instead of frames, enabling advanced applications in mobile photography. However, combining a Quad Bayer Color Filter Array (CFA) sensor with event pixels that lack color information causes aliasing and artifacts during demosaicing, before any downstream application. Current methods struggle to address these issues, especially on resource-limited mobile devices. In response, we introduce \textbf{TSANet}, a lightweight \textbf{T}wo-stage network via \textbf{S}tate space augmented cross-\textbf{A}ttention, which handles event pixel inpainting and demosaicing separately, leveraging the benefits of dividing a complex task into manageable subtasks. Furthermore, we introduce a lightweight Cross-Swin State Block that uniquely utilizes positional priors for demosaicing and enhances global dependencies through a state space model with linear complexity. In summary, TSANet demonstrates excellent demosaicing performance on both simulated and real HybridEVS data while remaining lightweight: averaged across seven diverse datasets, it outperforms the previous state-of-the-art method DemosaicFormer in both PSNR and SSIM while reducing parameter and computation costs by $1.86\times$ and $3.29\times$, respectively. Our approach opens new possibilities for efficient image demosaicing on mobile devices. Code is available in the supplementary materials.
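The abstract credits the state space model with linear complexity. As a hedged illustration of why, the sketch below implements a generic diagonal linear SSM (zero-order-hold discretization, fixed parameters, no input-dependent selection) and runs it over a raster-scanned feature map. It is not the Cross-Swin State Block, whose internals the abstract does not specify; `discretize`, `ssm_scan`, and `image_to_sequence` are hypothetical names.

```python
import math
from typing import List, Tuple


def discretize(a: List[float], b: List[float], delta: float) -> Tuple[List[float], List[float]]:
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    a[i] (negative for stability) is the i-th diagonal entry of A, b[i] of B.
    Returns per-state (A_bar, B_bar) with A_bar = exp(delta*a), B_bar = (A_bar-1)/a * b.
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(ab - 1.0) / ai * bi for ab, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_scan(x: List[float], a_bar: List[float], b_bar: List[float],
             c: List[float], d: float = 0.0) -> List[float]:
    """Causal recurrence h_t = A_bar*h_{t-1} + B_bar*x_t, y_t = C.h_t + D*x_t.

    A single pass over the sequence: O(L * N) work for length L and state
    size N, versus O(L^2) pairwise scores in full self-attention.
    """
    h = [0.0] * len(a_bar)
    y = []
    for xt in x:
        h = [ab * hi + bb * xt for ab, hi, bb in zip(a_bar, h, b_bar)]
        y.append(sum(ci * hi for ci, hi in zip(c, h)) + d * xt)
    return y


def image_to_sequence(img: List[List[float]]) -> List[float]:
    """Raster-scan flattening of a 2D feature map into a 1D token sequence."""
    return [v for row in img for v in row]


# Usage: an impulse in the first pixel of a 2x2 map still reaches the last token.
seq = image_to_sequence([[1.0, 0.0], [0.0, 0.0]])
a_bar, b_bar = discretize(a=[-1.0, -0.1], b=[1.0, 1.0], delta=0.5)
out = ssm_scan(seq, a_bar, b_bar, c=[0.5, 0.5])
print(out[-1] > 0.0)  # True
```

Each token updates a fixed-size hidden state once, so cost grows linearly with the number of pixels, yet the decaying state still carries information from early tokens to late ones. Practical vision SSMs typically add input-dependent parameters and several scan directions on top of this basic recurrence.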