CV · AI · Mar 19

CAFlow: Adaptive-Depth Single-Step Flow Matching for Efficient Histopathology Super-Resolution

Elad Yoshai, Ariel D. Yoshai, Natan T. Shaked

arXiv:2603.18513 · 16.2 · h-index: 43

AI Analysis

This work addresses the cost of deploying generative super-resolution in digital pathology. It offers medical professionals a computationally efficient method with incremental gains in speed and quality.

The paper tackles the computational cost of generative super-resolution for gigapixel histopathology images. It introduces CAFlow, an adaptive-depth single-step flow-matching framework that routes each image tile to the shallowest network exit that still preserves quality. Adaptive routing reaches 31.72 dB PSNR with 33% compute savings and can reduce whole-slide inference from minutes to seconds.
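To make the routing idea concrete, here is a minimal sketch of a "shallowest acceptable exit" rule. It is not the authors' method: the paper uses a learned exit classifier, whereas this stand-in is an oracle that assumes the PSNR of every exit is already known for each tile. The function names, the tolerance parameter, and the two middle GFLOP values are hypothetical. Only the endpoints, 3.1 and 13.3 GFLOPs, come from the abstract.

```python
import numpy as np

# GFLOPs per exit. The endpoints (3.1 and 13.3) are from the abstract;
# the two middle values are hypothetical placeholders for illustration.
EXIT_GFLOPS = np.array([3.1, 6.5, 9.9, 13.3])

def route_tiles(exit_psnr, tolerance_db):
    """Oracle routing: per tile, pick the shallowest exit whose PSNR is
    within `tolerance_db` of the deepest exit.

    exit_psnr has shape (num_tiles, num_exits), ordered shallow to deep.
    """
    exit_psnr = np.asarray(exit_psnr, dtype=float)
    full_depth = exit_psnr[:, -1:]               # PSNR at the deepest exit
    acceptable = exit_psnr >= full_depth - tolerance_db
    # argmax returns the first True per row; the last column is always True.
    return acceptable.argmax(axis=1)

def mean_gflops(exits):
    """Average per-tile compute for a given routing decision."""
    return float(EXIT_GFLOPS[np.asarray(exits)].mean())

# Two toy tiles: one needs a deep exit, one is fine at the shallowest exit.
psnr = [[30.00, 31.00, 31.50, 31.60],
        [31.55, 31.58, 31.60, 31.60]]
exits = route_tiles(psnr, tolerance_db=0.15)
avg = mean_gflops(exits)
```

If the reported 33% savings is measured against the full-depth cost of 13.3 GFLOPs (an assumption), the average per-tile cost under adaptive routing would be roughly 13.3 × 0.67 ≈ 8.9 GFLOPs.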

In digital pathology, whole-slide images routinely exceed gigapixel resolution, making computationally intensive generative super-resolution (SR) impractical for routine deployment. We introduce CAFlow, an adaptive-depth single-step flow-matching framework that routes each image tile to the shallowest network exit that preserves reconstruction quality. CAFlow performs flow matching in pixel-unshuffled rearranged space, reducing spatial computation by 16x while enabling direct inference. We show that dedicating half of training to exact t=0 samples is essential for single-step quality (-1.5 dB without it). The backbone, FlowResNet (1.90M parameters), mixes convolution and window self-attention blocks across four early exits spanning 3.1 to 13.3 GFLOPs. A lightweight exit classifier (~6K parameters) achieves 33% compute savings at only 0.12 dB cost. On multi-organ histopathology x4 SR, adaptive routing achieves 31.72 dB PSNR versus 31.84 dB at full depth, while the shallowest exit exceeds bicubic by +1.9 dB at 2.8x less compute than SwinIR-light. The method generalizes to held-out colon tissue with minimal quality loss (-0.02 dB), and at x8 upscaling it outperforms all comparable-compute baselines while remaining competitive with the much larger SwinIR-Medium model. Downstream nuclei segmentation confirms preservation of clinically relevant structure. The model trains in under 5 hours on a single GPU, and adaptive routing can reduce whole-slide inference from minutes to seconds.
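The 16x reduction in spatial computation is what a pixel-unshuffle with factor 4 would give: each 4×4 block of pixels is folded into the channel dimension, leaving 4 × 4 = 16 times fewer spatial positions. The factor is inferred from the 16x figure and is not stated in the abstract. The NumPy sketch below illustrates the rearrangement and its exact inverse. It is an illustration only, not the authors' code, and the function names are hypothetical.

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Rearrange (C, H, W) into (C*r*r, H//r, W//r) without discarding values."""
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    x = x.reshape(c, h // r, r, w // r, r)   # split each axis into (block, offset)
    x = x.transpose(0, 2, 4, 1, 3)           # (C, r, r, H/r, W/r)
    return x.reshape(c * r * r, h // r, w // r)

def pixel_shuffle(y, r):
    """Inverse of pixel_unshuffle: (C*r*r, H, W) back to (C, H*r, W*r)."""
    c2, h, w = y.shape
    c = c2 // (r * r)
    y = y.reshape(c, r, r, h, w)
    y = y.transpose(0, 3, 1, 4, 2)           # (C, H, r, W, r)
    return y.reshape(c, h * r, w * r)

# Toy RGB tile; factor 4 is an assumption inferred from the 16x figure.
tile = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
packed = pixel_unshuffle(tile, 4)
spatial_reduction = (tile.shape[1] * tile.shape[2]) // (packed.shape[1] * packed.shape[2])
restored = pixel_shuffle(packed, 4)
```

Because the rearrangement is lossless, a network can run at the reduced spatial size and the output can be shuffled back to full resolution afterwards.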
