Combining Cloud and Mobile Computing for Machine Learning
This work addresses the limited memory and battery life of mobile devices running machine learning applications, offering an incremental improvement over existing cloud-only or mobile-only approaches.
The paper tackles the problem of running large machine learning models on mobile devices by proposing model segmentation to distribute computation between mobile devices and the cloud, reducing user wait time and optimizing cloud workloads.
Although the computing power of mobile devices is increasing, machine learning models are also growing in size. This trend creates problems for mobile devices because of limits on their memory capacity and battery life. While many services, like ChatGPT and Midjourney, run all inference in the cloud, we believe a flexible and fine-grained task distribution is more desirable. In this work, we consider model segmentation as a way to improve the user experience, dividing the computation between mobile devices and the cloud so that the compute-heavy portion of the model is offloaded while the data transfer required is minimized. We show that this division not only reduces the wait time for users but can also be fine-tuned to optimize the workloads of the cloud. To achieve this, we design a scheduler that collects information about network quality, client device capability, and job requirements, and uses it to make decisions that achieve consistent performance across a range of devices while reducing the work the cloud needs to perform.
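To make the segmentation idea concrete, the following is a minimal sketch of how a split point might be chosen. It is not the paper's scheduler. It assumes a purely sequential model, a simple additive latency estimate (device compute, plus upload of the intermediate activation, plus cloud compute), and it ignores the cost of returning the result and any battery constraints. The names `Layer`, `best_split`, and all parameters are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    flops: float         # estimated compute cost of this layer
    output_bytes: float  # size of the activation this layer produces


def best_split(layers, input_bytes, device_flops_per_s,
               cloud_flops_per_s, bandwidth_bytes_per_s):
    """Return (k, latency): run the first k layers on the device and
    the remaining layers in the cloud, minimizing estimated latency.

    k == 0 means cloud-only; k == len(layers) means device-only.
    All inputs are assumed estimates, not measured values.
    """
    n = len(layers)
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        device_time = sum(l.flops for l in layers[:k]) / device_flops_per_s
        cloud_time = sum(l.flops for l in layers[k:]) / cloud_flops_per_s
        if k == n:
            transfer_time = 0.0  # nothing is sent to the cloud
        else:
            # Upload the raw input (k == 0) or the last on-device activation.
            payload = input_bytes if k == 0 else layers[k - 1].output_bytes
            transfer_time = payload / bandwidth_bytes_per_s
        latency = device_time + transfer_time + cloud_time
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Usage with made-up numbers: a cheap first layer that shrinks the data,
# followed by two heavy layers.
layers = [Layer(1e9, 1e5), Layer(1e10, 1e6), Layer(1e10, 1e3)]
k, latency = best_split(layers, input_bytes=1e6,
                        device_flops_per_s=1e10,
                        cloud_flops_per_s=1e12,
                        bandwidth_bytes_per_s=1e6)
print(k)  # split index chosen for these assumed numbers
```

With these assumed numbers the split lands after the first layer: running it on the device shrinks the upload tenfold, and the two heavy layers go to the cloud. A faster link pushes the choice toward cloud-only and a very slow link toward device-only, which is the kind of trade-off a scheduler that tracks network quality and device capability has to make.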