GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language
This work addresses the need for end users to understand distribution shifts to improve AI deployment, though it appears incremental as it builds on existing techniques for monitoring shifts.
The paper tackles the problem of explaining dataset-level distribution shifts between two image datasets in natural language, proposing GSCLIP, a training-free framework that combines a hybrid generator group and an efficient selector, and verifies its effectiveness on natural data shifts.
Helping end users comprehend abstract distribution shifts can greatly facilitate AI deployment. Motivated by this, we propose a novel task, dataset explanation. Given two image datasets, dataset explanation aims to automatically point out their dataset-level distribution shifts in natural language. Current techniques for monitoring distribution shifts provide inadequate information for understanding datasets with the goal of improving data quality. Therefore, we introduce GSCLIP, a training-free framework to solve the dataset explanation task. In GSCLIP, we propose the selector as the first quantitative evaluation method to identify explanations that properly summarize dataset shifts. Furthermore, we leverage this selector to demonstrate the superiority of a generator based on language model generation. Systematic evaluation on natural data shift verifies that GSCLIP, a combined system of a hybrid generator group and an efficient selector, is not only easy to use but also powerful for dataset explanation at scale.
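To make the generator-plus-selector idea concrete, the following is a minimal sketch of how a selector might rank candidate explanations. It is not the paper's method: the abstract does not specify the scoring rule, so the sketch assumes that images and candidate sentences have already been embedded in a shared vector space (as a CLIP-style model would provide) and scores each sentence by the gap in mean cosine similarity between the two datasets. The function names, the toy 2-D vectors, and the candidate sentences are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(images: Sequence[Vector], text: Vector) -> float:
    """Average cosine similarity between a set of image embeddings and one text embedding."""
    return sum(cosine(img, text) for img in images) / len(images)


def rank_explanations(
    dataset_a: Sequence[Vector],
    dataset_b: Sequence[Vector],
    candidates: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    """Rank candidate sentences by how differently they match the two datasets.

    ASSUMPTION: the score (mean similarity to A minus mean similarity to B)
    is an illustrative stand-in, not the selector defined in the paper.
    """
    scored = [
        (sentence, mean_similarity(dataset_a, emb) - mean_similarity(dataset_b, emb))
        for sentence, emb in candidates.items()
    ]
    # A large absolute gap means the sentence describes one dataset far better than the other.
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy embeddings standing in for real image and text embeddings.
dataset_a = [(1.0, 0.0), (0.9, 0.1)]
dataset_b = [(0.0, 1.0), (0.1, 0.9)]
candidates = {
    "photos of animals": (1.0, 1.0),     # matches both datasets equally
    "photos taken in snow": (1.0, 0.0),  # matches dataset A only
}

ranked = rank_explanations(dataset_a, dataset_b, candidates)
for sentence, score in ranked:
    print(f"{score:+.3f}  {sentence}")
```

In this toy setup, the sentence aligned with only one dataset ranks first, while the sentence that matches both datasets equally receives a score near zero. A generator group would supply the candidate sentences; here they are hard-coded for illustration.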