LG AI SE | Jun 15, 2023

Modularizing while Training: A New Paradigm for Modularizing DNN Models

Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao

arXiv:2306.09376v3 | 3.86 citations | h-index: 47 | Has Code

Originality: Highly original

AI Analysis

This work targets efficient DNN reuse for developers and researchers. Rather than incrementally improving post-training decomposition, it proposes a new paradigm in which modularization is part of training itself.

The paper tackles the high cost of DNN training and reuse by proposing modularizing-while-training (MwT), which builds modularization into the training process to reduce overhead and accuracy loss. Reported results: an accuracy loss of only 1.13 percentage points, a 74.31% reduction in kernel retention rate over the state-of-the-art approach, and a total time cost of 108 minutes for training and modularizing.

Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models, borrowing the idea of code reuse from software engineering. However, reusing an entire model can incur extra overhead or inherit weaknesses from undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, to enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and a loss of model accuracy. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. We have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline. The kernel retention rate of the modules generated by MwT is only 14.58%, a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half that of the baseline.
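
The abstract does not spell out how cohesion and coupling are computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes each training sample yields a kernel mask (a vector marking which convolution kernels the sample uses), and it measures cohesion as the average cosine similarity between masks of same-class samples and coupling as the average similarity between masks of different-class samples. The function names, the mask representation, and the weights `alpha` and `beta` are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cohesion_and_coupling(masks, labels):
    """Assumed definitions, for illustration only:
    cohesion = mean mask similarity over same-class sample pairs,
    coupling = mean mask similarity over different-class sample pairs."""
    same, diff = [], []
    for i, j in combinations(range(len(masks)), 2):
        sim = cosine(masks[i], masks[j])
        (same if labels[i] == labels[j] else diff).append(sim)
    cohesion = sum(same) / len(same) if same else 0.0
    coupling = sum(diff) / len(diff) if diff else 0.0
    return cohesion, coupling


def modular_loss(masks, labels, alpha=1.0, beta=1.0):
    """Hypothetical regularizer: reward high cohesion, penalize high coupling.
    In training it would be added to the usual classification loss."""
    cohesion, coupling = cohesion_and_coupling(masks, labels)
    return alpha * (1.0 - cohesion) + beta * coupling


# Two classes that use disjoint kernels: a well-modularized layer.
modular_masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
# Every sample uses every kernel: nothing to separate into modules.
entangled_masks = [[1, 1, 1, 1]] * 4
labels = [0, 0, 1, 1]

print(modular_loss(modular_masks, labels))
print(modular_loss(entangled_masks, labels))
```

Under these assumptions, the disjoint masks give a loss close to 0, while the fully shared masks give a loss close to 1, because coupling is then as high as cohesion. A real implementation would need differentiable (relaxed) masks so that the two terms can be minimized by gradient descent alongside the task loss.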
