Shift is Good: Mismatched Data Mixing Improves Test Performance
This work addresses the challenge of optimizing data mixing strategies for improved generalization in machine learning, offering insights for practitioners dealing with distribution shifts.
The paper tackles the problem of training and testing on mixture distributions with different proportions, showing that distribution shift can improve test performance even without transfer between components. It identifies optimal training proportions and quantifies the benefits across various scenarios.
We consider training and testing on mixture distributions whose training and test proportions differ. We show that in many settings, and in some sense generically, distribution shift can be beneficial: test performance can improve due to mismatched training proportions, even if the components are unrelated and there is no transfer between them. In a variety of scenarios, we identify the optimal training proportions and the extent to which such distribution shift can be beneficial. We show that the same analysis also applies to a compositional setting in which the distribution of component "skills" differs between training and test.
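To make the claim concrete, the following is a minimal numerical sketch under an assumed toy model that is not taken from the paper: each component is learned independently (no transfer), and its error after seeing m samples is assumed to decay as m^(-alpha). With n training samples split by proportions p and test proportions q, the test loss is then the sum over components of q_i * (p_i * n)^(-alpha). Under this assumption, a Lagrange-multiplier calculation gives the minimizer p_i proportional to q_i^(1/(1+alpha)), which differs from q whenever q is not uniform. The function names, the power-law form, and the parameter values below are all illustrative choices.

```python
def mixture_test_loss(p, q, n, alpha):
    """Test loss under the ASSUMED power-law model: sum_i q_i * (p_i * n)^(-alpha).

    p: training proportions, q: test proportions, n: total training samples.
    Components are treated as unrelated, so component i only benefits from
    its own p_i * n samples.
    """
    return sum(qi * (pi * n) ** (-alpha) for pi, qi in zip(p, q))


def optimal_proportions(q, alpha):
    """Minimizer of the assumed loss over the simplex: p_i proportional to q_i^(1/(1+alpha))."""
    weights = [qi ** (1.0 / (1.0 + alpha)) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values (assumptions): two components, skewed test mixture.
q = [0.9, 0.1]
n = 10_000
alpha = 0.5

p_star = optimal_proportions(q, alpha)
matched = mixture_test_loss(q, q, n, alpha)       # train with p = q
shifted = mixture_test_loss(p_star, q, n, alpha)  # train with mismatched p*

print("optimal training proportions:", p_star)
print("loss with matched proportions:", matched)
print("loss with shifted proportions:", shifted)
```

In this toy model the optimal training mixture gives the rare component more weight than its test share, and the resulting test loss is lower than with matched proportions. The size of the gain depends entirely on the assumed error curve, so the numbers should be read as an illustration of the mechanism rather than as results from the paper.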