LG · AI · May 11

LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection

Abhishek Moturu, Anna Goldenberg, Babak Taati

arXiv:2605.11231 · 51.2

Predicted impact: top 49% in LG · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners using synthetic data augmentation, LiBaGS provides a generator-agnostic selection method that improves model accuracy by targeting informative and realistic samples.

LiBaGS introduces a lightweight method for selecting synthetic training data that fills missing parts of the training distribution relevant to the downstream task, improving accuracy over classical oversampling and other selection criteria.

Synthetic data is useful only when the added samples fill missing parts of the training distribution that matter for the downstream task. We introduce LiBaGS, a lightweight, generator-agnostic method for targeted synthetic training data selection. LiBaGS scores candidate synthetic samples by combining decision-boundary proximity, predictive uncertainty, real-data density, and support validity, so that selected samples are both informative and likely to remain on the real data manifold. We then use a boundary-gap allocation rule that targets sparse but realistic decision-boundary neighborhoods, rather than simply adding more data or selecting only the most uncertain candidates. LiBaGS also learns when enough synthetic samples have been added through a marginal-value stopping rule, assigns softer labels near ambiguous boundaries, and uses a diversity objective to avoid redundant near-duplicate selections. Experiments show that LiBaGS improves accuracy over classical oversampling, hard augmentation, uncertainty and density ablations, and targeted-generation selection criteria.
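
The abstract names the ingredients of the candidate score but gives no formulas, so the following is only a minimal sketch of how such a score and a diversity-aware selection might be put together. Everything in it is an assumption made for illustration, not the authors' implementation: the function names `boundary_gap_scores` and `select_diverse`, the top-two probability margin as a proxy for boundary proximity, normalized entropy as the uncertainty term, mean k-nearest-neighbor distance to real data as the sparsity term, a nearest-neighbor radius as a hard support-validity gate, the additive weighting, and the greedy minimum-distance filter. The stopping rule and soft labels are not covered.

```python
import numpy as np

def boundary_gap_scores(cand_X, cand_probs, real_X, k=3, weights=(1.0, 1.0, 1.0),
                        support_radius=None):
    """Hypothetical score: high for valid candidates near the decision boundary,
    with uncertain predictions, in sparse regions of the real data."""
    eps = 1e-12
    # Boundary proximity (assumed proxy): small margin between the top-2 class probabilities.
    top2 = np.sort(cand_probs, axis=1)[:, -2:]
    boundary = 1.0 - (top2[:, 1] - top2[:, 0])
    # Predictive uncertainty: entropy normalized to [0, 1].
    n_classes = cand_probs.shape[1]
    entropy = -np.sum(cand_probs * np.log(cand_probs + eps), axis=1) / np.log(n_classes)
    # Real-data density (assumed proxy): mean distance to the k nearest real points,
    # rescaled so that the sparsest candidate neighborhood gets a value close to 1.
    dists = np.linalg.norm(cand_X[:, None, :] - real_X[None, :, :], axis=2)
    knn = np.sort(dists, axis=1)[:, :k]
    mean_knn = knn.mean(axis=1)
    sparsity = mean_knn / (mean_knn.max() + eps)
    # Support validity (assumed gate): the nearest real point must lie within a radius.
    # By default the radius is the largest nearest-neighbor distance among real points.
    if support_radius is None:
        real_d = np.linalg.norm(real_X[:, None, :] - real_X[None, :, :], axis=2)
        np.fill_diagonal(real_d, np.inf)
        support_radius = real_d.min(axis=1).max()
    valid = (knn[:, 0] <= support_radius).astype(float)
    w_b, w_u, w_s = weights
    return valid * (w_b * boundary + w_u * entropy + w_s * sparsity)

def select_diverse(cand_X, scores, budget, min_dist):
    """Greedy selection by descending score, skipping near-duplicates of chosen samples."""
    chosen = []
    for i in np.argsort(-scores, kind="stable"):
        if len(chosen) >= budget or scores[i] <= 0:
            break
        if all(np.linalg.norm(cand_X[i] - cand_X[j]) >= min_dist for j in chosen):
            chosen.append(int(i))
    return chosen

# Toy usage: four real points on a unit square and four synthetic candidates.
real_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
cand_X = np.array([[0.5, 0.5], [0.5, 0.51], [0.0, 0.05], [5.0, 5.0]])
cand_probs = np.array([[0.5, 0.5], [0.52, 0.48], [0.95, 0.05], [0.5, 0.5]])
scores = boundary_gap_scores(cand_X, cand_probs, real_X)
picked = select_diverse(cand_X, scores, budget=2, min_dist=0.1)
```

In this toy setup the far-away candidate is gated out by the validity check even though its prediction is maximally uncertain, and the second candidate is skipped as a near-duplicate of the first. That is the behavior the abstract contrasts with selecting only the most uncertain samples.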
