Progressive Transmission and Inference of Deep Learning Models
The work addresses a user-experience problem for mobile and web users on slow network connections by enabling faster approximate outputs, though the contribution is incremental: it adapts existing progressive techniques from image files to models.
The paper tackles the slow transmission of deep learning models from servers to user devices by proposing a progressive transmission framework that allows approximate inference during delivery. Its experiments show that the method is computationally efficient and preserves accuracy without increasing model size or total transmission time.
Modern image files are usually transmitted progressively, providing a preview before the entire image is downloaded, which improves the user experience over a slow network connection. In this paper, with a similar goal, we propose a progressive transmission framework for deep learning models, especially for the scenario where pre-trained deep learning models are transmitted from servers and executed on user devices (e.g., web browsers or mobile devices). Our progressive transmission allows inference with approximate models in the middle of file delivery and quickly provides acceptable intermediate outputs. On the server side, a deep learning model is divided and progressively transmitted to the user devices. Then, the divided pieces are progressively concatenated to construct approximate models on user devices. Experiments show that our method is computationally efficient without increasing the model size and total transmission time while preserving the model accuracy. We further demonstrate that our method can improve the user experience by providing the approximate models, especially over a slow connection.
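The divide-and-concatenate idea can be illustrated with a minimal sketch. The abstract does not say how the model is divided, so everything specific here is an assumption made for illustration: weights are quantized to 16 bits, split into four 4-bit groups sent most-significant bits first, and concatenated on the client as they arrive. The function names `split_weights` and `reconstruct` are hypothetical and are not taken from the paper.

```python
import numpy as np

def split_weights(w, n_bits=16, n_chunks=4):
    """Server side (assumed scheme): quantize w to n_bits, split into bit groups, MSB first."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2**n_bits - 1) or 1.0  # fall back to 1.0 for constant tensors
    q = np.round((w - lo) / scale).astype(np.uint32)
    step = n_bits // n_chunks  # bits carried by each chunk
    chunks = []
    for i in range(n_chunks):
        shift = n_bits - (i + 1) * step
        # Stored one value per byte for readability; a real transport would pack the bits.
        chunks.append(((q >> shift) & ((1 << step) - 1)).astype(np.uint8))
    return chunks, lo, scale, step

def reconstruct(received, lo, scale, step, n_bits=16):
    """Client side: concatenate the bit groups received so far into approximate weights."""
    q = np.zeros(received[0].shape, dtype=np.uint32)
    for i, c in enumerate(received):
        shift = n_bits - (i + 1) * step
        q |= c.astype(np.uint32) << shift
    missing = n_bits - len(received) * step
    if missing > 0:
        q += 1 << (missing - 1)  # centre the bits that have not arrived yet
    return lo + q.astype(np.float64) * scale

# Simulate delivery: the approximation error after each chunk arrives.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
chunks, lo, scale, step = split_weights(w)
errors = [
    float(np.abs(reconstruct(chunks[:k], lo, scale, step) - w).max())
    for k in range(1, len(chunks) + 1)
]
print(errors)
```

In this sketch the chunks together carry exactly the 16 bits of the quantized weights, so nothing extra is sent, and each arriving chunk refines the same tensor in place. Whether the paper's method works this way is not stated in the abstract.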