LGAI, Mar 13, 2023

ODIN: On-demand Data Formulation to Mitigate Dataset Lock-in

arXiv:2303.06832v2, h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses dataset lock-in for AI practitioners by offering a method to learn unseen knowledge beyond the training data, though it appears incremental because it builds on existing generative models.

The paper tackles the problem of dataset constraints in zero-shot learning by proposing ODIN, an approach that generates on-demand datasets using generative AI models, and demonstrates its potential through evaluations on model accuracy and data diversity.

ODIN is an innovative approach that addresses the problem of dataset constraints by integrating generative AI models. Traditional zero-shot learning methods are constrained by the training dataset. To overcome this limitation, ODIN mitigates the dataset constraints by generating on-demand datasets based on user requirements. ODIN consists of three main modules: a prompt generator, a text-to-image generator, and an image post-processor. To generate high-quality prompts and images, we adopted a large language model (e.g., ChatGPT) and a text-to-image diffusion model (e.g., Stable Diffusion), respectively. We evaluated ODIN on various datasets in terms of model accuracy and data diversity to demonstrate its potential, and conducted post-experiments for further investigation. Overall, ODIN is a feasible approach that enables AI to learn unseen knowledge beyond the training dataset.
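
The three-module structure described above (prompt generator, text-to-image generator, image post-processor) can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`Sample`, `template_prompts`, `fake_text_to_image`, `drop_duplicates`, `build_dataset`) is hypothetical, the prompt generator is a fixed template standing in for a large language model, the image generator is a hash function standing in for a diffusion model, and duplicate removal is an assumed example of post-processing, since the abstract does not say what the post-processor does.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    label: str
    prompt: str
    image: bytes  # placeholder for real pixel data


def template_prompts(label: str, n: int) -> List[str]:
    """Hypothetical stand-in for the LLM-based prompt generator."""
    contexts = ["in a forest", "on a street", "indoors", "at night"]
    return [f"a photo of a {label} {contexts[i % len(contexts)]}" for i in range(n)]


def fake_text_to_image(prompt: str) -> bytes:
    """Hypothetical stand-in for a text-to-image diffusion model.

    Returns deterministic bytes derived from the prompt so the sketch
    runs without any model weights.
    """
    return hashlib.sha256(prompt.encode("utf-8")).digest()


def drop_duplicates(samples: List[Sample]) -> List[Sample]:
    """Assumed post-processing step: discard samples whose image repeats."""
    seen = set()
    kept = []
    for sample in samples:
        if sample.image not in seen:
            seen.add(sample.image)
            kept.append(sample)
    return kept


def build_dataset(
    labels: List[str],
    per_label: int,
    prompt_gen: Callable[[str, int], List[str]] = template_prompts,
    text_to_image: Callable[[str], bytes] = fake_text_to_image,
    post_process: Callable[[List[Sample]], List[Sample]] = drop_duplicates,
) -> List[Sample]:
    """Chain the three modules: prompts -> images -> post-processing."""
    samples = []
    for label in labels:
        for prompt in prompt_gen(label, per_label):
            samples.append(Sample(label, prompt, text_to_image(prompt)))
    return post_process(samples)


if __name__ == "__main__":
    dataset = build_dataset(["axolotl", "okapi"], per_label=6)
    for sample in dataset:
        print(sample.label, "|", sample.prompt)
```

Because each module is passed in as a callable, a real language model or diffusion model could replace the stand-ins without changing `build_dataset`. That is a design choice of this sketch, not something the abstract specifies.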

