CV | Sep 17, 2025

EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics

arXiv:2509.13858v1 | 1 citation | h-index: 3 | Has Code
Originality: Highly original
AI Analysis

This work addresses the limitation of traditional dataset distillation methods that neglect high-level semantics, offering a novel approach for researchers and practitioners in machine learning seeking more efficient training data.

The paper tackles the problem of dataset distillation by incorporating implicit textual semantics from images to improve the quality of synthesized datasets, resulting in enhanced model performance as confirmed by extensive experiments.

Dataset distillation aims to synthesize a compact dataset from the original large-scale one, enabling highly efficient learning while preserving competitive model performance. However, traditional techniques primarily capture low-level visual features, neglecting the high-level semantic and structural information inherent in images. In this paper, we propose EDITS, a novel framework that exploits the implicit textual semantics within the image data to achieve enhanced distillation. First, external texts generated by a Vision Language Model (VLM) are fused with image features through a Global Semantic Query module, forming the prior clustered buffer. Local Semantic Awareness then selects representative samples from the buffer to construct image and text prototypes, with the latter produced by guiding a Large Language Model (LLM) with a meticulously crafted prompt. Finally, a Dual Prototype Guidance strategy generates the final synthetic dataset through a diffusion model. Extensive experiments confirm the effectiveness of our method. Source code is available at: https://github.com/einsteinxia/EDITS.
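To make the buffer-and-prototype idea in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes image and text embeddings are already available as arrays, and it swaps in plain concatenation for the Global Semantic Query fusion and k-means plus nearest-to-centroid selection for the clustered buffer and Local Semantic Awareness steps. The function names `fuse_features`, `kmeans`, and `select_representatives` are hypothetical, and the LLM prompting and diffusion stages are omitted entirely.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse_features(image_feats, text_feats):
    """Assumed fusion: concatenate normalized image and text embeddings.

    The paper fuses them through a Global Semantic Query module; plain
    concatenation is only a stand-in for illustration.
    """
    return np.concatenate(
        [l2_normalize(image_feats), l2_normalize(text_feats)], axis=1
    )


def kmeans(x, k, n_iter=20, seed=0):
    """Basic k-means, used here as a stand-in for building a clustered buffer."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute assignments so labels match the final centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids


def select_representatives(x, labels, centroids, per_cluster=1):
    """Pick the samples closest to each centroid (assumed selection rule)."""
    selected = {}
    for j in range(len(centroids)):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        d = np.linalg.norm(x[idx] - centroids[j], axis=1)
        selected[j] = idx[np.argsort(d)[:per_cluster]]
    return selected


# Usage with random placeholder embeddings (not real VLM or image features).
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(30, 8))
text_feats = rng.normal(size=(30, 4))
fused = fuse_features(image_feats, text_feats)
labels, centroids = kmeans(fused, k=3)
representatives = select_representatives(fused, labels, centroids, per_cluster=2)
```

In this sketch, `representatives` maps each cluster index to the indices of its closest samples. In the described method, such samples would then feed the construction of image and text prototypes that guide the diffusion model.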

Code Implementations: 1 repo