Operating critical machine learning models in resource constrained regimes
This work addresses the problem of deploying deep learning models in clinics globally, where high resource costs are a barrier. However, the contribution is incremental, as it builds on existing efficiency methods such as quantization.
The paper investigates the trade-off between resource consumption and performance for machine learning models in critical settings like clinics, aiming to enable deployment in resource-constrained environments.
The accelerated development of machine learning methods, primarily deep learning, has driven the recent breakthroughs in medical image analysis and computer-aided intervention. The resource consumption of deep learning models, in terms of the amount of training data, compute, and energy costs, is known to be massive. These large resource costs can be barriers to deploying these models in clinics globally. To address this, there are concerted efforts within the machine learning community to introduce notions of resource efficiency, for instance, using quantisation to alleviate memory consumption. While most of these methods are shown to reduce resource utilisation, they could come at a cost in performance. In this work, we probe the trade-off between resource consumption and performance, specifically when dealing with models that are used in critical settings such as clinics.
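
To make the memory-versus-fidelity trade-off concrete, the following is a minimal sketch of symmetric 8-bit quantisation applied to a random weight vector. It is an illustration under stated assumptions and not the method evaluated in the paper: the function names `quantise_int8` and `dequantise`, the random weights, and the choice of a single per-tensor scale are all hypothetical.

```python
from array import array
import random


def quantise_int8(weights):
    """Symmetric uniform quantisation of float weights to signed 8-bit integers.

    Hypothetical helper for illustration: one scale for the whole tensor.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int8 range used here.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantise(q, scale):
    """Map the integer levels back to approximate float values."""
    return [v * scale for v in q]


random.seed(0)
# Stand-in for a layer's 32-bit float weights (assumed, not from the paper).
weights = array("f", (random.uniform(-1.0, 1.0) for _ in range(10_000)))

q, scale = quantise_int8(weights)
restored = dequantise(q, scale)

bytes_fp32 = weights.itemsize * len(weights)
bytes_int8 = q.itemsize * len(q)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 storage: {bytes_fp32} bytes")
print(f"int8 storage:    {bytes_int8} bytes")
print(f"largest absolute rounding error: {max_error:.6f}")
```

The integer copy needs a quarter of the storage, while each weight moves by at most half a quantisation step. Whether that perturbation is acceptable for a clinical model is the kind of question the trade-off study is concerned with.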