Control+Shift: Generating Controllable Distribution Shifts
This work addresses a challenge relevant to both researchers and practitioners: assessing the robustness of AI models. Its contribution is incremental, however, because it builds on existing generative models for dataset creation.
The paper tackles the problem of evaluating model robustness to distribution shifts by proposing a method that generates realistic datasets with varying shift intensities. It finds that model performance degrades consistently as shift intensity increases, even when the shift is nearly imperceptible to humans and even when data augmentation is used.
We propose a new method for generating realistic datasets with distribution shifts using any decoder-based generative model. Our approach systematically creates datasets with varying intensities of distribution shift, facilitating a comprehensive analysis of model performance degradation. We then use these generated datasets to evaluate the performance of various commonly used networks and observe a consistent decline in performance with increasing shift intensity, even when the shift is almost imperceptible to the human eye. This degradation persists even when data augmentations are used. We also find that enlarging the training dataset beyond a certain point has no effect on robustness and that stronger inductive biases increase robustness.
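The generation step can be pictured with the minimal sketch below. It assumes, purely for illustration, that a shift is induced by translating the latent prior along a fixed direction before decoding; the abstract does not specify the actual mechanism. The names `make_toy_decoder`, `sample_shifted_latents`, and `generate_shifted_datasets` are hypothetical, and the toy decoder is a fixed random map standing in for a trained generative model.

```python
import numpy as np


def make_toy_decoder(latent_dim, data_dim, seed=0):
    """Hypothetical stand-in for a trained decoder: a fixed random nonlinear map."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((latent_dim, data_dim))

    def decode(z):
        # Map latents of shape (n, latent_dim) to "data" of shape (n, data_dim).
        return np.tanh(z @ weights)

    return decode


def sample_shifted_latents(n, latent_dim, intensity, direction, rng):
    """Draw latents from N(intensity * direction, I).

    Assumption: the shift is a translation of the latent prior.
    intensity = 0 recovers the unshifted standard normal prior.
    """
    z = rng.standard_normal((n, latent_dim))
    return z + intensity * direction


def generate_shifted_datasets(decode, latent_dim, intensities, n=1000, seed=0):
    """Decode one dataset per shift intensity, all sharing one shift direction."""
    rng = np.random.default_rng(seed)
    direction = rng.standard_normal(latent_dim)
    direction /= np.linalg.norm(direction)  # unit-length shift direction
    return {
        s: decode(sample_shifted_latents(n, latent_dim, s, direction, rng))
        for s in intensities
    }


if __name__ == "__main__":
    decode = make_toy_decoder(latent_dim=8, data_dim=16)
    datasets = generate_shifted_datasets(decode, 8, [0.0, 0.5, 1.0, 2.0])
    for intensity, samples in datasets.items():
        print(intensity, samples.shape)
```

Under these assumptions, each generated set could be passed to a fixed, already trained model to trace its accuracy as a function of shift intensity.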