HC · CV · Jan 20, 2021

SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices

arXiv:2101.07996v1 · 14 citations
Originality: Incremental advance
AI Analysis

SplitSR enables near real-time super-resolution on mobile devices, addressing a bottleneck for applications such as camera apps and mobile health. The contribution is incremental: it improves the efficiency of existing deep learning methods rather than introducing a new approach to super-resolution.

The paper tackles the problem of deploying super-resolution on mobile devices by introducing SplitSR, a hybrid architecture with a novel lightweight residual block, achieving up to 5 times faster inference and higher accuracy than previous approaches on a low-end ARM CPU.

Super-resolution (SR) is a coveted image processing technique for mobile apps ranging from the basic camera apps to mobile health. Existing SR algorithms rely on deep learning models with significant memory requirements, so they have yet to be deployed on mobile devices and instead operate in the cloud to achieve feasible inference time. This shortcoming prevents existing SR methods from being used in applications that require near real-time latency. In this work, we demonstrate state-of-the-art latency and accuracy for on-device super-resolution using a novel hybrid architecture called SplitSR and a novel lightweight residual block called SplitSRBlock. The SplitSRBlock supports channel-splitting, allowing the residual blocks to retain spatial information while reducing the computation in the channel dimension. SplitSR has a hybrid design consisting of standard convolutional blocks and lightweight residual blocks, allowing people to tune SplitSR for their computational budget. We evaluate our system on a low-end ARM CPU, demonstrating both higher accuracy and up to 5 times faster inference than previous approaches. We then deploy our model onto a smartphone in an app called ZoomSR to demonstrate the first-ever instance of on-device, deep learning-based SR. We conducted a user study with 15 participants to have them assess the perceived quality of images that were post-processed by SplitSR. Relative to bilinear interpolation -- the existing standard for on-device SR -- participants showed a statistically significant preference when looking at both images (Z=-9.270, p<0.01) and text (Z=-6.486, p<0.01).
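The claim that channel-splitting reduces computation in the channel dimension can be illustrated with a rough cost count. The sketch below is not the authors' implementation: the function names, the `ratio` parameter, the 64x64 feature map, and the assumption that the unprocessed channels pass through untouched are all hypothetical choices made for illustration.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, same-padded k x k convolution."""
    return height * width * c_in * c_out * kernel * kernel


def split_conv_macs(height, width, channels, ratio, kernel=3):
    """MACs when only a `ratio` fraction of the channels is convolved.

    Assumption (not confirmed by the abstract): the remaining channels are
    passed through unchanged and concatenated back, so they add no
    multiply-accumulates.
    """
    active = int(channels * ratio)
    return conv_macs(height, width, active, active, kernel)


# Hypothetical layer: 64x64 feature map, 64 channels, 3x3 kernel.
full = conv_macs(64, 64, 64, 64)
split = split_conv_macs(64, 64, 64, ratio=0.25)
reduction = full / split
```

Under these assumptions the saving is quadratic in the split ratio, because both the input and the output channel counts shrink: convolving a quarter of the channels costs one sixteenth of the full convolution. The spatial resolution is unchanged, which is consistent with the abstract's point that spatial information is retained.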
