CV | Sep 17, 2025

EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics

Qianxin Xia, Jiawei Du, Guoming Lu, Zhiyong Shu, Jielei Wang

arXiv:2509.13858v1 | 1 citation | h-index: 3 | Has Code

Originality: Highly original

AI Analysis

This work addresses the limitation of traditional dataset distillation methods that neglect high-level semantics, offering a novel approach for researchers and practitioners in machine learning seeking more efficient training data.

The paper improves dataset distillation by incorporating implicit textual semantics extracted from images into the synthesis of the distilled dataset. The authors report that extensive experiments confirm the resulting gains in model performance.

Dataset distillation aims to synthesize a compact dataset from the original large-scale one, enabling highly efficient learning while preserving competitive model performance. However, traditional techniques primarily capture low-level visual features, neglecting the high-level semantic and structural information inherent in images. In this paper, we propose EDITS, a novel framework that exploits the implicit textual semantics within the image data to achieve enhanced distillation. First, external texts generated by a Vision Language Model (VLM) are fused with image features through a Global Semantic Query module, forming the prior clustered buffer. Local Semantic Awareness then selects representative samples from the buffer to construct image and text prototypes, with the latter produced by guiding a Large Language Model (LLM) with a meticulously crafted prompt. Ultimately, a Dual Prototype Guidance strategy generates the final synthetic dataset through a diffusion model. Extensive experiments confirm the effectiveness of our method. Source code is available at: https://github.com/einsteinxia/EDITS.
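The first two stages described in the abstract (fusing text and image features into a clustered buffer, then selecting representative samples per cluster) can be illustrated with a rough sketch. The abstract does not specify the fusion operator, the clustering algorithm, or the selection criterion, so everything below is an assumption: weighted concatenation stands in for the Global Semantic Query module, plain k-means for the clustering, and nearest-to-centroid selection for Local Semantic Awareness. The function names `fuse`, `kmeans`, and `select_representatives` and the parameters `alpha` and `per_cluster` are hypothetical and do not come from the EDITS code base. The LLM-generated text prototypes and the diffusion-based generation step are omitted.

```python
import numpy as np


def fuse(image_feats, text_feats, alpha=0.5):
    """Blend image and text embeddings (assumed fusion: weighted concatenation).

    Each modality is L2-normalized first so neither dominates by scale.
    `alpha` is a hypothetical weighting parameter, not taken from the paper.
    """
    eps = 1e-12  # guards against division by zero for all-zero rows
    img = image_feats / (np.linalg.norm(image_feats, axis=1, keepdims=True) + eps)
    txt = text_feats / (np.linalg.norm(text_feats, axis=1, keepdims=True) + eps)
    return np.concatenate([alpha * img, (1.0 - alpha) * txt], axis=1)


def assign(x, centers):
    """Return the index of the nearest center for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means, used here as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return assign(x, centers), centers


def select_representatives(x, labels, centers, per_cluster=2):
    """Pick the samples closest to each cluster center (assumed criterion)."""
    picks = {}
    for j in range(len(centers)):
        idx = np.flatnonzero(labels == j)
        dist = np.linalg.norm(x[idx] - centers[j], axis=1)
        picks[j] = idx[np.argsort(dist)[:per_cluster]]
    return picks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(60, 16))  # placeholder image embeddings
    text_feats = rng.normal(size=(60, 8))    # placeholder VLM caption embeddings
    fused = fuse(image_feats, text_feats)
    labels, centers = kmeans(fused, k=4)
    for cluster, idx in select_representatives(fused, labels, centers).items():
        print(cluster, idx.tolist())
```

In a real pipeline the placeholder arrays would be replaced by embeddings from an image encoder and from VLM-generated captions, and the selected indices would feed the prototype construction step that the abstract describes.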
