Sep 3, 2024

Synthetic Data Generation and Automated Multidimensional Data Labeling for AI/ML in General and Circular Coordinates

arXiv:2409.02079v1, 1 citation, h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity for AI/ML developers but appears incremental, as it builds on existing visualization techniques.

The paper tackles the challenge of insufficient training data for AI/ML models by proposing a unified algorithm for synthetic data generation and automated labeling, based on multidimensional visualizations in General Line Coordinates. It demonstrates results on real data in case studies that evaluate the impact on classifiers.

An insufficient amount of available training data is a critical challenge for both the development and deployment of artificial intelligence and machine learning (AI/ML) models. This paper proposes a unified approach to synthetic data generation (SDG) and automated data labeling (ADL) in the form of a single SDG-ADL algorithm. SDG-ADL uses multidimensional (n-D) representations of data visualized losslessly with General Line Coordinates (GLCs), relying on the reversibility of GLCs to visualize n-D data in multiple GLCs. The paper demonstrates the new Circular Coordinates in Static and Dynamic forms, used together with Parallel Coordinates and Shifted Paired Coordinates, since each GLC exposes different data properties, such as interattribute n-D distributions and outliers. The approach is implemented as interactive software in the Dynamic Coordinates Visualization system (DCVis). Results on real data are demonstrated in case studies that evaluate the impact on classifiers.
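To make the "lossless" and "reversible" claims concrete, the following is a minimal sketch of one possible circular layout, not the paper's own construction. It assumes that the circle is split into n equal arcs, one per attribute, and that each min-max normalized value is placed along its own arc. The function names `to_circular` and `from_circular` and the `mins`/`maxs` parameters are hypothetical and do not come from DCVis.

```python
import math

def to_circular(point, mins, maxs):
    """Map an n-D point to n angles (radians), one per attribute.

    Assumption: attribute i owns the arc [i * arc, (i + 1) * arc],
    and its normalized value in [0, 1] sets the position on that arc.
    """
    n = len(point)
    arc = 2 * math.pi / n
    angles = []
    for i, (x, lo, hi) in enumerate(zip(point, mins, maxs)):
        t = (x - lo) / (hi - lo)  # min-max normalization to [0, 1]
        angles.append((i + t) * arc)
    return angles

def from_circular(angles, mins, maxs):
    """Invert to_circular: recover the original n-D point from its angles."""
    n = len(angles)
    arc = 2 * math.pi / n
    return [
        lo + (a / arc - i) * (hi - lo)
        for i, (a, lo, hi) in enumerate(zip(angles, mins, maxs))
    ]

def to_xy(angles):
    """Unit-circle vertices of the polyline that would be drawn for the point."""
    return [(math.cos(a), math.sin(a)) for a in angles]
```

Because the inverse mapping recovers every attribute value up to floating-point error, a point picked or generated in the visualization could in principle be converted back to an n-D record, which is the property the abstract attributes to GLCs.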

