ODIN: On-demand Data Formulation to Mitigate Dataset Lock-in
The paper addresses dataset lock-in for AI practitioners by offering a way to learn unseen knowledge beyond the training data, though the contribution appears incremental because it builds on existing generative models.
The paper tackles the problem of dataset constraints in zero-shot learning by proposing ODIN, an approach that generates on-demand datasets using generative AI models, and demonstrates its potential through evaluations on model accuracy and data diversity.
ODIN is an innovative approach that addresses the problem of dataset constraints by integrating generative AI models. Traditional zero-shot learning methods are constrained by the training dataset. To overcome this limitation, ODIN mitigates dataset constraints by generating on-demand datasets based on user requirements. ODIN consists of three main modules: a prompt generator, a text-to-image generator, and an image post-processor. To generate high-quality prompts and images, we adopted a large language model (e.g., ChatGPT) and a text-to-image diffusion model (e.g., Stable Diffusion), respectively. We evaluated ODIN on various datasets in terms of model accuracy and data diversity to demonstrate its potential, and conducted post-experiments for further investigation. Overall, ODIN is a feasible approach that enables AI to learn unseen knowledge beyond the training dataset.
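The three-module structure described above can be sketched as a small pipeline. This is only an illustrative outline, not the authors' implementation: the names `Sample`, `build_dataset`, `stub_prompts`, `stub_render`, and `stub_keep` are hypothetical, the stubs stand in for the large language model and the diffusion model, and the post-processor is assumed here to be a simple keep/discard filter, since the abstract does not say what post-processing involves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    label: str    # class name requested by the user
    prompt: str   # text prompt that produced the image
    image: bytes  # placeholder for image data


def build_dataset(
    labels: List[str],
    per_label: int,
    make_prompts: Callable[[str, int], List[str]],
    render: Callable[[str], bytes],
    keep: Callable[[bytes], bool],
) -> List[Sample]:
    """Chain prompt generation, image generation, and post-processing."""
    dataset = []
    for label in labels:
        # Module 1 (assumed interface): produce several prompts per label.
        for prompt in make_prompts(label, per_label):
            # Module 2 (assumed interface): turn one prompt into one image.
            image = render(prompt)
            # Module 3 (assumed behavior): drop images that fail a check.
            if keep(image):
                dataset.append(Sample(label, prompt, image))
    return dataset


# Hypothetical stand-ins so the sketch runs without any model access.
def stub_prompts(label: str, n: int) -> List[str]:
    return [f"a photo of a {label}, variation {i}" for i in range(n)]


def stub_render(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def stub_keep(image: bytes) -> bool:
    return len(image) > 0


if __name__ == "__main__":
    data = build_dataset(["cat", "dog"], 2, stub_prompts, stub_render, stub_keep)
    for sample in data:
        print(sample.label, "|", sample.prompt)
```

Passing the three modules as callables keeps the sketch independent of any particular model: in a real setting, `make_prompts` would wrap a call to a language model and `render` a call to a text-to-image model, with the rest of the loop unchanged.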