RAWMamba: Unified sRGB-to-RAW De-rendering With State Space Model
This work targets the deployment complexity that separate image and video sRGB-to-RAW pipelines impose on photographers and developers by offering a single unified model, though the contribution appears incremental because it builds on existing metadata-driven approaches.
The paper tackles the reconstruction of RAW data from sRGB images and videos with RAWMamba, a unified framework that harmonizes metadata requirements across the two domains and reports state-of-the-art RAW reconstruction performance.
Recent advancements in sRGB-to-RAW de-rendering have increasingly emphasized metadata-driven approaches, which reconstruct RAW data from sRGB images with the help of partial RAW information. In image-based de-rendering, metadata is commonly obtained through sampling, whereas in video tasks it is typically derived from the initial frame. These distinct metadata requirements necessitate specialized network architectures, and the resulting architectural incompatibilities increase deployment complexity. In this paper, we propose RAWMamba, a Mamba-based unified framework for sRGB-to-RAW de-rendering across both image and video domains. The core of RAWMamba is the Unified Metadata Embedding (UME) module, which harmonizes diverse metadata types into a unified representation. Specifically, a multi-perspective affinity modeling method is proposed to improve the extraction of reference information. In addition, we introduce the Local Tone-Aware Mamba (LTA-Mamba) module, which captures long-range dependencies to enable effective global propagation of metadata. Experimental results demonstrate that the proposed RAWMamba achieves state-of-the-art performance, yielding high-quality RAW data reconstruction.
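To make the "unified representation" idea concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the UME or LTA-Mamba implementation, whose internals the abstract does not describe. It assumes image-domain metadata is RAW sampled on a regular grid and video-domain metadata is the full RAW of the initial frame, and it expresses both as the same values-plus-mask tensor so that one network input layout serves both cases. The functions `image_metadata`, `video_metadata`, `unify`, and `linear_scan` are hypothetical names. `linear_scan` is a fixed-parameter linear recurrence used only to show how a state space scan can carry sparse metadata to every later position; it is not Mamba's input-dependent selective scan.

```python
import numpy as np


def image_metadata(raw: np.ndarray, stride: int):
    """Image domain (assumed scheme): keep RAW pixels on a regular sampling grid."""
    values = np.zeros_like(raw)
    mask = np.zeros(raw.shape[:2], dtype=raw.dtype)
    values[::stride, ::stride] = raw[::stride, ::stride]
    mask[::stride, ::stride] = 1.0
    return values, mask


def video_metadata(first_frame_raw: np.ndarray):
    """Video domain (assumed scheme): the complete RAW of the initial frame."""
    mask = np.ones(first_frame_raw.shape[:2], dtype=first_frame_raw.dtype)
    return first_frame_raw.copy(), mask


def unify(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Common (H, W, C + 1) layout: metadata values plus a validity-mask channel."""
    return np.concatenate([values, mask[..., None]], axis=-1)


def linear_scan(seq: np.ndarray, decay: float) -> np.ndarray:
    """Toy state space recurrence h_t = decay * h_{t-1} + x_t along axis 0.

    Unlike Mamba's selective scan, the decay here is fixed, not input-dependent.
    """
    out = np.empty_like(seq)
    h = np.zeros(seq.shape[1:], dtype=seq.dtype)
    for t, x_t in enumerate(seq):
        h = decay * h + x_t
        out[t] = h
    return out


rng = np.random.default_rng(0)
raw = rng.random((8, 8, 4)).astype(np.float32)  # hypothetical packed 4-channel RAW
img_meta = unify(*image_metadata(raw, stride=4))
vid_meta = unify(*video_metadata(raw))
print(img_meta.shape, vid_meta.shape)  # (8, 8, 5) (8, 8, 5)

# Scan the image metadata in raster order: the sample at (0, 0) leaves a
# decaying trace in the state at every later position, not just its neighbors.
propagated = linear_scan(img_meta.reshape(-1, img_meta.shape[-1]), decay=0.9)
```

Because both domains produce the same tensor shape, only the density of the mask channel differs, which is one simple way a single architecture could accept either metadata source.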