LG, CV · Jul 23, 2021

Compositional Models: Multi-Task Learning and Knowledge Transfer with Modular Networks

Andrey Zhmoginov, Dina Bashkirova, Mark Sandler

arXiv:2107.10963v15.53 citations

Originality: Incremental advance

AI Analysis

This work addresses efficient knowledge transfer and modularity in neural networks, a topic relevant to machine learning researchers and practitioners. It is an incremental advance, since it builds on existing modular-network and ResNet concepts.

The paper tackles multi-task learning and knowledge transfer by proposing a modular network approach based on isometric ResNet blocks, which makes the computational blocks reusable and their order of computation adjustable. It achieves competitive results in multi-task learning, transfer learning, and domain adaptation, with applications such as improving ImageNet accuracy without increasing the number of parameters.

Conditional computation and modular networks have recently been proposed for multi-task learning and other problems as a way to decompose problem solving into multiple reusable computational blocks. We propose a new approach for learning modular networks based on the isometric version of ResNet, in which all residual blocks have the same configuration and the same number of parameters. This architectural choice allows adding, removing, and changing the order of residual blocks. In our method, the modules can be invoked repeatedly and allow knowledge transfer to novel tasks by adjusting the order of computation. This allows soft weight sharing between tasks with only a small increase in the number of parameters. We show that our method leads to interpretable self-organization of modules in the case of multi-task learning, transfer learning, and domain adaptation while achieving competitive results on those tasks. From a practical perspective, our approach makes it possible to: (a) reuse existing modules when learning a new task by adjusting the computation order, (b) perform unsupervised multi-source domain adaptation, illustrating that adaptation to unseen data can be achieved by only manipulating the order of pretrained modules, and (c) increase the accuracy of existing architectures on image classification tasks such as ImageNet, without any parameter increase, by reusing the same block multiple times.
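
The idea of a shared pool of identically shaped residual blocks, composed in a task-specific order, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes small dense blocks on plain vectors instead of convolutional ResNet blocks, and the per-task orders (`task_orders`) are fixed by hand here, whereas the abstract describes adjusting the order as part of learning. All function and variable names are hypothetical.

```python
import random

def make_block(dim, rng):
    """One 'isometric' block: every block has the same dim x dim weight shape."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

def apply_block(weights, x):
    """Residual update x + relu(W x); input and output sizes are identical."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return [v + h for v, h in zip(x, hidden)]

def run_program(blocks, order, x):
    """Apply blocks from the shared pool in the given order; indices may repeat."""
    for idx in order:
        x = apply_block(blocks[idx], x)
    return x

def count_params(blocks):
    """Total number of weights in the shared pool."""
    return sum(len(row) for weights in blocks for row in weights)

rng = random.Random(0)
DIM = 4
blocks = [make_block(DIM, rng) for _ in range(3)]  # shared pool of 3 modules

# Hypothetical per-task computation orders (assumed, not taken from the paper).
task_orders = {
    "task_a": [0, 1, 2],        # plain feed-forward order
    "task_b": [2, 0, 1],        # same modules, different order
    "task_c": [0, 1, 1, 2, 1],  # deeper computation by reusing block 1
}

x = [1.0, -0.5, 0.25, 2.0]
outputs = {name: run_program(blocks, order, x) for name, order in task_orders.items()}
total_params = count_params(blocks)  # independent of how long each order is
```

Because every block maps a vector of size `DIM` to a vector of the same size, any sequence of indices is a valid computation. That is why `task_c` can run five steps while the pool still holds only three blocks' worth of weights.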
