ML AI CL LG NE

May 23, 2017

Latent Multi-task Architecture Learning

Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard

arXiv:1705.08142v331.8176 citations · Has Code

Originality: Incremental advance

AI Analysis

This work automates the design of parameter-sharing architectures for multi-task learning, sparing practitioners a manual architecture search. It is an incremental advance, since it combines solutions that prior work proposed in isolation.

The paper tackles the problem of efficiently searching for optimal parameter-sharing architectures in multi-task learning by jointly addressing layer selection, sharing amount, and loss weighting, achieving up to 15% average error reduction over common approaches.

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)–(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
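The three choices (a)-(c) named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the mixing matrix `alpha`, the layer weights `beta`, and the loss weights are all hypothetical, and they are fixed arrays here, whereas a learned latent architecture would presumably train such quantities together with the network weights.

```python
import numpy as np

def mix_subspaces(h_a, h_b, alpha, n_subspaces=2):
    """(a)+(b): recombine subspaces of two task-specific hidden vectors.

    alpha is a hypothetical (2*n_subspaces, 2*n_subspaces) mixing matrix.
    The identity means no sharing; off-diagonal mass means more sharing.
    """
    # Split each hidden vector into equal subspaces, stacked as rows.
    parts = np.stack(np.split(h_a, n_subspaces) + np.split(h_b, n_subspaces))
    mixed = alpha @ parts  # each output row is a weighted sum of all input rows
    new_a = mixed[:n_subspaces].reshape(-1)
    new_b = mixed[n_subspaces:].reshape(-1)
    return new_a, new_b

def combine_layers(layer_outputs, beta):
    """(a): weight the outputs of different layers (soft layer selection)."""
    return np.tensordot(beta, np.stack(layer_outputs), axes=1)

def weighted_loss(task_losses, loss_weights):
    """(c): combine per-task losses with relative weights."""
    return float(np.dot(loss_weights, task_losses))

# Usage with toy values (assumed, not taken from the paper).
h_a = np.arange(4.0)        # hidden state of task A
h_b = np.arange(4.0, 8.0)   # hidden state of task B
no_sharing = np.eye(4)
a_out, b_out = mix_subspaces(h_a, h_b, no_sharing)
pooled = combine_layers([a_out, b_out], np.array([0.5, 0.5]))
total = weighted_loss([1.0, 2.0], [0.5, 0.25])
```

With the identity matrix the two networks stay independent, and a matrix with equal entries would make them fully shared, so a single continuous parameter set covers the range of sharing patterns that would otherwise need a discrete search.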
