Knowledge transfer in deep block-modular neural networks
This work addresses the inefficiency of training deep neural networks from scratch for each task, offering machine learning practitioners a way to reuse networks they have already trained.
The paper addresses the problem that deep neural networks are highly specialized and are often trained from scratch for each new task. It proposes a block-modular architecture that reuses parts of an existing network without degrading performance on the original task. The results show that this approach can match or outperform networks trained from scratch while learning nearly 10 times fewer weights.
Although deep neural networks (DNNs) have demonstrated impressive results during the last decade, they remain highly specialized tools, which are trained -- often from scratch -- to solve each particular task. The human brain, in contrast, significantly re-uses existing capacities when learning to solve new tasks. In the current study we explore a block-modular architecture for DNNs, which allows parts of the existing network to be re-used to solve a new task without a decrease in performance when solving the original task. We show that networks with such architectures can outperform networks trained from scratch, or perform comparably, while having to learn nearly 10 times fewer weights than the networks trained from scratch.
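The idea of reusing an existing network through an added block can be illustrated with a minimal sketch. The abstract does not specify the wiring, so everything here is an assumption made for illustration: the layer sizes, the names (`base_hidden`, `new_hidden`, and so on), the absence of biases, and the choice to let the new output layer read both the frozen features and the new block's features. It is not the authors' implementation.

```python
import random

def make_layer(n_in, n_out, rng):
    """Return a weight matrix: n_out rows, each holding n_in small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def relu_layer(weights, x):
    """Apply a bias-free linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def count_weights(layers):
    """Count the scalar weights in a list of weight matrices."""
    return sum(len(row) for layer in layers for row in layer)

rng = random.Random(0)
# Hypothetical sizes, chosen only for illustration.
N_IN, BASE_HIDDEN, NEW_HIDDEN, N_OUT = 20, 50, 5, 3

# Original block: assumed already trained on the original task, then frozen.
base_hidden = make_layer(N_IN, BASE_HIDDEN, rng)
base_out = make_layer(BASE_HIDDEN, N_OUT, rng)

# Added block: a small hidden layer plus a new output layer that reads
# the frozen hidden features together with the new hidden features.
new_hidden = make_layer(N_IN, NEW_HIDDEN, rng)
new_out = make_layer(BASE_HIDDEN + NEW_HIDDEN, N_OUT, rng)

def original_task(x):
    """Forward pass for the original task; uses only the frozen weights."""
    return relu_layer(base_out, relu_layer(base_hidden, x))

def new_task(x):
    """Forward pass for the new task; reuses the frozen hidden features."""
    h_base = relu_layer(base_hidden, x)   # reused, never updated
    h_new = relu_layer(new_hidden, x)     # trainable
    return relu_layer(new_out, h_base + h_new)  # list concatenation

frozen = count_weights([base_hidden, base_out])
trainable = count_weights([new_hidden, new_out])
print("frozen weights:", frozen)
print("weights to learn for the new task:", trainable)
```

Because training for the new task would touch only `new_hidden` and `new_out` in this sketch, `original_task` returns the same outputs before and after, which is the property the abstract describes. The weight counts printed here depend entirely on the made-up sizes and say nothing about the ratio reported in the paper.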