LG, Oct 13, 2024

TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining

arXiv:2410.10006v1
1 citation
h-index: 6
Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This paper addresses the performance degradation that occurs when a model's pretraining domain and target domain differ, offering an incremental improvement to task-adaptive pretraining.

Task-adaptive pretraining often yields suboptimal results because the tradeoffs between its objectives are tuned by hand. The paper proposes TapWeight, which automatically reweights pretraining objectives based on downstream feedback. The authors report that it significantly surpasses baseline methods on molecular property prediction and natural language understanding tasks.

Large-scale general domain pretraining followed by downstream-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still lead to performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically involve continued pretraining on task-specific unlabeled datasets or introducing additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often determine the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language understanding tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight.
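The abstract does not spell out the multi-level optimization problem, so the following is only a toy sketch of the general idea of learning an objective weight from downstream feedback. It is not the paper's algorithm. It assumes a simplified two-level setup with a single scalar parameter `theta`, two hypothetical quadratic pretraining objectives, and a hypothetical quadratic downstream validation loss. All names (`pretrain`, `downstream_loss`, `learn_weight`) and constants are illustrative.

```python
# Toy two-level reweighting sketch (illustrative only, not TapWeight itself).
# Inner level: pretrain theta on a weighted sum of two objectives.
# Outer level: adjust the weight lam using the downstream validation loss.


def pretrain(lam, a=0.0, b=4.0):
    """Minimise lam*(theta-a)^2 + (1-lam)*(theta-b)^2 over theta.

    For these hypothetical quadratic objectives the minimiser has a
    closed form: the lam-weighted average of the two optima a and b.
    """
    return lam * a + (1.0 - lam) * b


def downstream_loss(theta, target=1.0):
    """Hypothetical downstream validation loss with optimum at `target`."""
    return (theta - target) ** 2


def learn_weight(steps=200, lr=0.02, a=0.0, b=4.0, target=1.0):
    """Gradient descent on lam, driven by the downstream loss."""
    lam = 0.5  # start from an even tradeoff between the two objectives
    for _ in range(steps):
        theta = pretrain(lam, a, b)
        # Chain rule: dV/dlam = dV/dtheta * dtheta/dlam,
        # where dV/dtheta = 2*(theta - target) and dtheta/dlam = a - b.
        grad = 2.0 * (theta - target) * (a - b)
        # Keep the weight inside [0, 1].
        lam = min(1.0, max(0.0, lam - lr * grad))
    return lam


if __name__ == "__main__":
    lam = learn_weight()
    theta = pretrain(lam)
    print(f"learned weight: {lam:.4f}")
    print(f"downstream loss: {downstream_loss(theta):.6f}")
```

In this toy setting the weight that places the pretrained `theta` at the downstream optimum is 0.75, and the loop converges to it without a manual sweep over tradeoff values. A real system would replace the closed-form inner solve with actual pretraining and finetuning steps, and would need an approximate way to differentiate through them.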
