LG · Feb 7, 2022

Adaptive Mixing of Auxiliary Losses in Supervised Learning

arXiv:2202.03250v3 · 7 citations
AI Analysis

This work addresses the challenge of effectively combining multiple loss functions in supervised learning settings such as knowledge distillation and rule-denoising. The contribution is incremental, since it builds on existing auxiliary-loss methods.

The paper tackles the problem of learning how to combine auxiliary losses in supervised learning settings such as knowledge distillation and rule-denoising. It proposes AMAL, which uses bi-level optimization to learn instance-level mixing weights and yields noticeable gains over competitive baselines in experiments.
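For reference, an instance-level bi-level objective of this kind can be written generically as below. This is a sketch under stated assumptions, not the paper's exact formulation: it assumes two losses per training example (a supervised loss $\ell_{\mathrm{sup}}$ on the label and an auxiliary loss $\ell_{\mathrm{aux}}$, e.g. distillation against a teacher), combined by a per-instance weight $\alpha_i \in [0,1]$; the symbols $\mathcal{V}$ (validation set), $n$ (number of training examples), and $f_\theta$ (the model) are illustrative notation.

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}\in[0,1]^{n}} \;& \frac{1}{|\mathcal{V}|}\sum_{(x,y)\in\mathcal{V}} \ell_{\mathrm{sup}}\big(f_{\theta^{*}(\boldsymbol{\alpha})}(x),\, y\big) \\
\text{s.t.}\;& \theta^{*}(\boldsymbol{\alpha}) = \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \Big[\alpha_i\,\ell_{\mathrm{sup}}\big(f_\theta(x_i),\, y_i\big) + (1-\alpha_i)\,\ell_{\mathrm{aux}}\big(f_\theta(x_i)\big)\Big]
\end{aligned}
$$

The outer problem scores a choice of weights only through the validation loss of the model trained with those weights, which is what makes the weights "learned" rather than tuned by hand.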

In several supervised learning scenarios, auxiliary losses are used to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic the outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions that may be noisy rule-based approximations to true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal instance-level mixing weights over the training data. We describe a meta-learning approach to solving this bi-level objective and show how it can be applied to different scenarios in supervised learning. Experiments across several knowledge distillation and rule-denoising domains show that AMAL provides noticeable gains over competitive baselines. We empirically analyze our method and share insights into the mechanisms through which it provides performance gains.
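The abstract does not spell out AMAL's update rule. The sketch below illustrates one common meta-learning way to attack such a bi-level objective: a one-step lookahead meta-gradient (in the style of learning-to-reweight methods) on per-instance weights `alpha`, which mix a hard-label cross-entropy with a distillation loss against a teacher's soft probability `q`. Everything here is an assumption for illustration and not the authors' implementation: the binary logistic-regression model, the synthetic data, the hypothetical noisy teacher, the function names (`train_grad`, `meta_grad_alpha`, `amal_like_step`), and the learning rates.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, t, eps=1e-12):
    # Mean binary cross-entropy between predictions p and (possibly soft) targets t.
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))


def train_grad(w, X, y, q, alpha):
    # Gradient of mean_i[alpha_i * BCE(p_i, y_i) + (1 - alpha_i) * BCE(p_i, q_i)].
    # Both terms are linear in the target, so the mix equals BCE against t = alpha*y + (1-alpha)*q.
    t = alpha * y + (1 - alpha) * q
    p = sigmoid(X @ w)
    return X.T @ (p - t) / len(y)


def val_loss_after_step(w, alpha, X, y, q, Xv, yv, lr):
    # Inner problem approximated by ONE gradient step, then scored on validation labels.
    w_new = w - lr * train_grad(w, X, y, q, alpha)
    return bce(sigmoid(Xv @ w_new), yv)


def meta_grad_alpha(w, alpha, X, y, q, Xv, yv, lr):
    # Closed-form gradient of val_loss_after_step with respect to each alpha_i.
    w_new = w - lr * train_grad(w, X, y, q, alpha)
    g_val = Xv.T @ (sigmoid(Xv @ w_new) - yv) / len(yv)  # dL_val / dw_new
    # d w_new / d alpha_i = (lr / n) * (y_i - q_i) * x_i
    return (lr / len(y)) * (y - q) * (X @ g_val)


def amal_like_step(w, alpha, X, y, q, Xv, yv, lr=0.5, meta_lr=10.0):
    # Outer step on the instance weights (projected back to [0, 1]), then an inner step on w.
    alpha = np.clip(alpha - meta_lr * meta_grad_alpha(w, alpha, X, y, q, Xv, yv, lr), 0.0, 1.0)
    w = w - lr * train_grad(w, X, y, q, alpha)
    return w, alpha


# Usage on synthetic data; q is a hypothetical noisy "teacher" probability per training example.
rng = np.random.default_rng(0)
n, m, d = 64, 32, 5
w_true = rng.normal(size=d)
X, Xv = rng.normal(size=(n, d)), rng.normal(size=(m, d))
y = (X @ w_true > 0).astype(float)
yv = (Xv @ w_true > 0).astype(float)
q = np.clip(sigmoid(X @ w_true) + rng.normal(0, 0.3, n), 0, 1)

w, alpha = np.zeros(d), np.full(n, 0.5)
for _ in range(100):
    w, alpha = amal_like_step(w, alpha, X, y, q, Xv, yv)
print("validation loss:", bce(sigmoid(Xv @ w), yv))
```

Because both loss terms are linear in the target, mixing them per instance is equivalent to training against the blended target `alpha_i*y_i + (1 - alpha_i)*q_i`, which keeps the meta-gradient in closed form here. Under this one-step approximation, weights that drift toward 0 mark instances where leaning on the teacher lowers validation loss; a real implementation for deep networks would typically obtain the same meta-gradient with automatic differentiation instead.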

Code Implementations: 1 repo
