Developing AI Agents with Simulated Data: Why, what, and how?
This work provides a foundational overview for researchers and practitioners facing data limitations in AI. Its contribution is incremental, however, because it synthesizes existing concepts into a framework.
The chapter addresses the problem of insufficient data volume and quality for subsymbolic AI by introducing simulation-based synthetic data generation and presenting a reference framework for describing, designing, and analyzing digital twin-based AI simulation solutions.
As insufficient data volume and quality remain the key impediments to the adoption of modern subsymbolic AI, techniques of synthetic data generation are in high demand. Simulation offers an apt, systematic approach to generating diverse synthetic data. This chapter introduces the reader to the key concepts, benefits, and challenges of simulation-based synthetic data generation for AI training purposes, and to a reference framework to describe, design, and analyze digital twin-based AI simulation solutions.