CV | Mar 25, 2022

Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks

Jiacong Hu, Jing Gao, Jingwen Ye, Yang Gao, Xingen Wang, Zunlei Feng, Mingli Song

arXiv:2203.13453v21.45 citations | h-index: 71 | Has Code

Originality: Highly original

AI Analysis

This work addresses the high computational cost of training deep learning models, offering researchers and practitioners a novel approach to model reuse and creation.

The paper tackles the resource-intensive training of new deep learning models by proposing Model Disassembling and Assembling (MDA), a paradigm that extracts task-aware components from trained CNN classifiers and assembles them into new models without training. In the reported experiments, the disassembled components and the assembled models closely match or even surpass the baseline.

With the rapid development of deep learning, the increasing complexity and scale of parameters make training a new model increasingly resource-intensive. In this paper, we start from the classic convolutional neural network (CNN) and explore a paradigm that does not require training to obtain new models. Similar to the birth of CNN inspired by receptive fields in the biological visual system, we draw inspiration from the information subsystem pathways in the biological visual system and propose Model Disassembling and Assembling (MDA). During model disassembling, we introduce the concept of relative contribution and propose a component locating technique to extract task-aware components from trained CNN classifiers. For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task, utilizing the disassembled task-aware components. The entire process is akin to playing with LEGO bricks, enabling arbitrary assembly of new models, and providing a novel perspective for model creation and reuse. Extensive experiments showcase that task-aware components disassembled from CNN classifiers or new models assembled using these components closely match or even surpass the performance of the baseline, demonstrating its promising results for model reuse. Furthermore, MDA exhibits diverse potential applications, with comprehensive experiments exploring model decision route analysis, model compression, knowledge distillation, and more. The code is available at https://github.com/jiaconghu/Model-LEGO.
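The abstract names a "relative contribution" measure and a component locating step but does not define them here. The following is only a toy sketch of the general idea, not the authors' method: it assumes a single linear layer, scores each input channel by its share of one class logit, and keeps the channels above a threshold. The function names, the normalization by the largest contribution, and the threshold value are all hypothetical choices made for illustration.

```python
def relative_contributions(features, class_weights):
    """Score each channel's contribution to one class logit, relative to the largest.

    Assumption for illustration only: contribution = feature * weight,
    negative contributions are clipped to zero, and scores are divided
    by the largest contribution. The paper's definition may differ.
    """
    raw = [f * w for f, w in zip(features, class_weights)]
    peak = max(raw)
    if peak <= 0:
        # No channel supports this class, so nothing is selected.
        return [0.0 for _ in raw]
    return [max(r, 0.0) / peak for r in raw]


def locate_component(features, class_weights, threshold=0.3):
    """Return indices of channels whose relative contribution reaches the threshold."""
    scores = relative_contributions(features, class_weights)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical activations of four channels and the weights of one class.
features = [2.0, 0.5, 1.0, 0.0]
class_weights = [1.0, 2.0, -1.0, 3.0]

scores = relative_contributions(features, class_weights)
kept = locate_component(features, class_weights)
print(scores)  # [1.0, 0.5, 0.0, 0.0]
print(kept)    # [0, 1]
```

In this toy setting the kept channel indices play the role of a task-aware component for one class. The actual locating procedure, along with the alignment padding and parameter scaling used for assembling, is in the linked repository.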

