LG, NA, ML | Jul 8, 2024

Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

arXiv:2407.06120v3 | 9 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses data selection for finetuning in machine learning, offering a provably efficient method that balances variance and bias, though it is incremental as it builds on classical variance minimization.

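The paper tackles data selection for finetuning by introducing Sketchy Moment Matching (SkMM), a scalable two-stage method. It first controls bias by using gradient sketching to find an informative low-dimensional subspace $\mathcal{S}$ of the finetuning parameter space, then reduces variance by matching moments between the original and selected datasets over $\mathcal{S}$. Selecting $n$ samples this way preserves the fast generalization rate $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension.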

We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
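The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes per-sample gradient features are already available as an $N \times r$ matrix `G`, uses a plain Gaussian random projection as the sketch, and swaps in a simple greedy heuristic for the moment-matching step (the abstract does not specify how the matching is optimized). The function names `sketch_gradients` and `greedy_moment_match` are hypothetical.

```python
import numpy as np


def sketch_gradients(G, m, seed=0):
    """Stage (i), assumed form: project N x r gradient features to m dims
    with a Gaussian random sketch."""
    rng = np.random.default_rng(seed)
    r = G.shape[1]
    omega = rng.standard_normal((r, m)) / np.sqrt(m)
    return G @ omega


def greedy_moment_match(Z, n):
    """Stage (ii), greedy stand-in: pick n distinct rows of Z whose
    second moment is close to the second moment of all rows."""
    N, m = Z.shape
    target = Z.T @ Z / N                      # full-data second moment
    outer = np.einsum("ni,nj->nij", Z, Z)     # per-sample outer products
    running = np.zeros((m, m))                # sum over selected samples
    available = np.ones(N, dtype=bool)
    selected = []
    for k in range(1, n + 1):
        # Second moment of the selection if each candidate were added next.
        candidate = (running[None, :, :] + outer) / k
        err = np.linalg.norm(candidate - target[None, :, :], axis=(1, 2))
        err[~available] = np.inf              # never pick a sample twice
        i = int(np.argmin(err))
        selected.append(i)
        available[i] = False
        running += outer[i]
    return np.array(selected)


# Usage on synthetic data (illustrative sizes only).
rng = np.random.default_rng(1)
G = rng.standard_normal((200, 50))   # stand-in for per-sample gradients
Z = sketch_gradients(G, m=8)
idx = greedy_moment_match(Z, n=20)
```

The greedy loop materializes an $N \times m \times m$ array, which is acceptable only because the sketch dimension $m$ is small; that is the practical point of sketching before matching moments.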

Code Implementations: 1 repo
