CV, AI, LG | Jun 11, 2025

ScoreMix: Synthetic Data Generation by Score Composition in Diffusion Models Improves Recognition

arXiv:2506.10226v2 | h-index: 6
Originality: Incremental advance
AI Analysis

The work addresses the need for domain-specific data augmentation where policy or legal constraints restrict the use of external models or datasets. It is an incremental advance, since it builds on existing diffusion model techniques.

The paper tackles the problem of generating synthetic data for recognition tasks without relying on external models or datasets. It finds that ScoreMix, a method based on score composition in diffusion models, improves accuracy by up to 7 percentage points across 8 public face recognition benchmarks.

Synthetic data generation is increasingly used in machine learning for training and data augmentation. Yet, current strategies often rely on external foundation models or datasets, whose usage is restricted in many scenarios due to policy or legal constraints. We propose ScoreMix, a self-contained synthetic generation method to produce hard synthetic samples for recognition tasks by leveraging the score compositionality of diffusion models. The approach mixes class-conditioned scores along reverse diffusion trajectories, yielding domain-specific data augmentation without external resources. We systematically study class-selection strategies and find that mixing classes distant in the discriminator's embedding space yields larger gains, providing up to 3% additional average improvement, compared to selection based on proximity. Interestingly, we observe that condition and embedding spaces are largely uncorrelated under standard alignment metrics, and the generator's condition space has a negligible effect on downstream performance. Across 8 public face recognition benchmarks, ScoreMix improves accuracy by up to 7 percentage points, without hyperparameter search, highlighting both robustness and practicality. Our method provides a simple yet effective way to maximize discriminator performance using only the available dataset, without reliance on third-party resources. Paper website: https://parsa-ra.github.io/scoremix/.
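The two ideas in the abstract, pairing classes that are far apart in the discriminator's embedding space and mixing their class-conditioned scores during reverse diffusion, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the mixing rule, so the convex combination with a fixed `weight` is an assumption, as are the cosine distance and the names `farthest_class_pairs`, `mixed_noise_prediction`, and `toy_eps_model`. The toy model stands in for a trained class-conditional denoiser.

```python
import numpy as np


def farthest_class_pairs(class_embeddings):
    """For each class, return the index of the class farthest away
    in cosine distance (assumed metric, not taken from the paper)."""
    unit = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    cos_dist = 1.0 - unit @ unit.T
    np.fill_diagonal(cos_dist, -np.inf)  # never pair a class with itself
    return np.argmax(cos_dist, axis=1)


def mixed_noise_prediction(eps_model, x_t, t, class_a, class_b, weight=0.5):
    """Convex combination of two class-conditioned noise predictions.

    At a fixed timestep the score is a rescaled noise prediction, so
    mixing noise predictions linearly also mixes the scores linearly.
    The mixing rule itself is an assumption made for illustration.
    """
    eps_a = eps_model(x_t, t, class_a)
    eps_b = eps_model(x_t, t, class_b)
    return weight * eps_a + (1.0 - weight) * eps_b


# Hypothetical stand-in for a trained class-conditional denoiser:
# it "predicts" the offset between the sample and a class centre.
CLASS_CENTRES = np.array([[1.0, 1.0], [3.0, 3.0], [-2.0, 0.0]])


def toy_eps_model(x_t, t, class_id):
    return x_t - CLASS_CENTRES[class_id]


# Hypothetical mean embeddings of three classes from the discriminator.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
partners = farthest_class_pairs(embeddings)  # array([2, 2, 0])

x_t = np.zeros(2)
eps_mix = mixed_noise_prediction(toy_eps_model, x_t, t=10, class_a=0, class_b=1)
```

In a full sampler, `eps_mix` would replace the single-class noise prediction at each reverse step, and `partners` would decide which class is mixed with which.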

