On the Difficulty of Learning a Meta-network for Training Data Selection
For practitioners who train neural networks on synthetic data, this work improves meta-learned data selection by addressing two problems: a poor gradient signal-to-noise ratio during optimization and a lack of informative features.
The paper identifies two obstacles in meta-learning for training-data selection (MTS): a poor gradient signal-to-noise ratio (GSNR) and a lack of informative features. It proposes increasing the batch size and introducing features that capture the positions of training data in their distributions and training dynamics, achieving average gains of 5.49% over training without selection and 2.89% over the strongest baseline across four benchmarks.
Synthetic data are increasingly used to train neural networks, yet distributional mismatch with real data limits their effectiveness when they are used indiscriminately. A common strategy is to learn data weights via bi-level optimization, which we refer to as Meta-learning for Training-data Selection (MTS). In practice, however, MTS often performs below expectation. We identify two obstacles to properly training MTS: a poor gradient signal-to-noise ratio (GSNR), which causes optimization difficulties, and a lack of informative features that correlate with data quality. We present a mathematical analysis of MTS that reveals the dynamics of normalized data weights and the relation between disparate data quality and poor GSNR. The analysis suggests a simple yet effective solution: increasing the batch size. Further, we propose a set of informative features that capture the positions of training data in their distributions and training dynamics. Experiments across four benchmarks show consistent improvements, achieving average gains of 5.49% over training without selection and 2.89% over the strongest baseline.
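To illustrate why a larger batch can help when GSNR is poor, the following is a minimal sketch rather than the paper's method. It assumes GSNR is defined as the squared mean of a gradient estimate divided by its variance, and it replaces real meta-gradients with hypothetical scalar per-sample gradients drawn from a Gaussian with a small mean and a large variance. The function name `minibatch_gsnr` and all numeric settings are illustrative assumptions. Under these assumptions, averaging over B independent samples divides the variance by B, so the GSNR of the mini-batch gradient grows roughly linearly with B.

```python
import numpy as np

def minibatch_gsnr(per_sample_grads, batch_size, num_batches=2000, seed=0):
    """Empirical GSNR (mean^2 / variance) of the mini-batch mean gradient.

    Assumption: this mean^2 / variance definition is a common one and may
    differ from the exact definition used in the paper.
    """
    rng = np.random.default_rng(seed)
    # Sample num_batches mini-batches with replacement and average each one.
    idx = rng.integers(0, len(per_sample_grads), size=(num_batches, batch_size))
    batch_means = per_sample_grads[idx].mean(axis=1)
    return batch_means.mean() ** 2 / batch_means.var()

# Hypothetical per-sample gradients: weak shared signal, strong noise.
rng = np.random.default_rng(1)
grads = rng.normal(loc=0.1, scale=1.0, size=10_000)

# Estimate GSNR for a few batch sizes; it should increase with batch size.
gsnr_by_batch = {b: minibatch_gsnr(grads, b) for b in (8, 64, 512)}
for b, g in gsnr_by_batch.items():
    print(f"batch size {b:4d}: empirical GSNR = {g:.3f}")
```

The sketch only captures the variance-reduction effect of averaging independent samples; it does not model the bi-level optimization or the effect of disparate data quality analyzed in the paper.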