Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
This work aims to improve generalization and identifiability in multi-task learning, though it appears incremental in that it builds on existing disentanglement and sparsity concepts.
The paper addresses the limited understanding of why disentangled representations help downstream tasks by showing that combining them with sparse base-predictors improves generalization in multi-task learning. It achieves competitive results on few-shot classification benchmarks while each task uses only a fraction of the learned representations.
Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.
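To make the link between sparsity and disentanglement concrete, here is a minimal sketch of a sparse base-predictor fitted on a fixed representation. It is not the paper's algorithm: plain Lasso regression solved by ISTA stands in for the group Lasso multiclass SVM, the representation is synthetic Gaussian data rather than a learned encoder, and the outer (representation-learning) level of the bi-level problem is omitted. All names (`lasso_ista`, `z`, `w_true`, `rotation`) are hypothetical. The sketch fits the same task twice, once on "disentangled" features where the task depends on two factors, and once on an orthogonally rotated ("entangled") copy of those features.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def lasso_ista(features, targets, lam, n_iters=500):
    """Minimize (1/2n)||Xw - y||^2 + lam * ||w||_1 with ISTA."""
    n, d = features.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(features, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = features.T @ (features @ w - targets) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


rng = np.random.default_rng(0)
n_samples, n_factors = 200, 10

# Assumption: synthetic stand-in for a disentangled representation.
z = rng.normal(size=(n_samples, n_factors))

# The task depends on only two of the ten factors.
w_true = np.zeros(n_factors)
w_true[[1, 4]] = [2.0, -1.5]
y = z @ w_true

# Sparse base-predictor on the disentangled features.
w_hat = lasso_ista(z, y, lam=0.1)
support = np.flatnonzero(np.abs(w_hat) > 1e-8)

# Same task on an entangled (orthogonally rotated) copy of the features.
rotation, _ = np.linalg.qr(rng.normal(size=(n_factors, n_factors)))
w_hat_entangled = lasso_ista(z @ rotation, y, lam=0.1)
support_entangled = np.flatnonzero(np.abs(w_hat_entangled) > 1e-8)

print("features used (disentangled):", support.size)
print("features used (entangled):   ", support_entangled.size)
```

On the disentangled features the predictor should select only the two relevant factors, whereas the rotated features generally require many more nonzero weights to express the same task. This is the intuition behind using base-predictor sparsity as a training signal for the representation, though the paper's identifiability result rests on conditions that this toy example does not check.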