Leveraging Image Complexity in Macro-Level Neural Network Design for Medical Image Segmentation
This work addresses the practical challenge of deploying neural networks on affordable hardware for medical image segmentation, offering a data-driven approach to optimize design trade-offs, though it is incremental as it builds on existing encoder-decoder architectures.
The paper tackles the problem of balancing computational demands and segmentation performance in medical image segmentation by using image complexity to guide macro-level design choices like downsampling and network depth, finding that median frequency is the best complexity measure and that shallow networks on original images outperform deep networks on downsampled ones for high-complexity datasets.
Recent progress in encoder-decoder neural network architecture design has led to significant performance improvements in a wide range of medical image segmentation tasks. However, state-of-the-art networks for a given task may be too computationally demanding to run on affordable hardware, and thus users often resort to practical workarounds by modifying various macro-level design aspects. Two common examples are downsampling of the input images and reducing the network depth to meet computer memory constraints. In this paper we investigate the effects of these changes on segmentation performance and show that image complexity can be used as a guideline in choosing what is best for a given dataset. We consider four statistical measures to quantify image complexity and evaluate their suitability on ten different public datasets. For the purpose of our experiments we also propose two new encoder-decoder architectures representing shallow and deep networks that are more memory efficient than currently popular networks. Our results suggest that median frequency is the best complexity measure in deciding about an acceptable input downsampling factor and network depth. For high-complexity datasets, a shallow network running on the original images may yield better segmentation results than a deep network running on downsampled images, whereas the opposite may be the case for low-complexity images.