CV · AI · LG · May 28, 2025

DAM: Domain-Aware Module for Multi-Domain Dataset Condensation

arXiv:2505.22387v1 · 3 citations · h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses the computational and storage burdens of training deep learning models on multi-domain datasets. It represents an incremental advance in dataset condensation.

The paper tackles the problem of dataset condensation for multi-domain datasets by introducing a Domain-Aware Module (DAM) that embeds domain features into synthetic images, resulting in improved in-domain, out-of-domain, and cross-architecture performance over baseline methods.

Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern datasets, which are increasingly composed of heterogeneous images spanning multiple domains. In this paper, we extend DC and introduce Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. To this end, we propose the Domain-Aware Module (DAM), a training-time module that embeds domain-related features into each synthetic image via learnable spatial masks. As explicit domain labels are mostly unavailable in real-world datasets, we employ frequency-based pseudo-domain labeling, which leverages low-frequency amplitude statistics. DAM is active only during the condensation process, so the number of images per class (IPC) stays the same as in prior methods. Experiments show that DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline dataset condensation methods.
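The frequency-based pseudo-domain labeling mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that low-frequency amplitude statistics are used, so everything below is an assumption rather than the authors' procedure: the function names `low_freq_amplitude` and `pseudo_domain_labels`, the `radius` and `num_domains` parameters, the use of the mean as the statistic, and the equal-sized split of the sorted scores are all hypothetical choices made for illustration.

```python
import numpy as np


def low_freq_amplitude(images, radius=4):
    """Mean FFT amplitude in a small window around the zero frequency.

    images: float array of shape (N, H, W); grayscale is assumed for simplicity.
    radius: half-width of the low-frequency window (hypothetical parameter).
    """
    # 2D FFT per image, shifted so the zero frequency sits at (H // 2, W // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(images, axes=(-2, -1)), axes=(-2, -1))
    amplitude = np.abs(spectrum)
    h, w = images.shape[-2:]
    cy, cx = h // 2, w // 2
    window = amplitude[:, cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return window.mean(axis=(1, 2))


def pseudo_domain_labels(images, num_domains=3, radius=4):
    """Assign pseudo-domain labels by ranking images on their low-frequency score.

    Assumption: images are sorted by score and split into num_domains groups of
    (nearly) equal size. The paper may use a different grouping rule.
    """
    scores = low_freq_amplitude(images, radius)
    order = np.argsort(scores, kind="stable")
    labels = np.empty(len(images), dtype=np.int64)
    for domain, chunk in enumerate(np.array_split(order, num_domains)):
        labels[chunk] = domain
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 12 random images with different brightness scales.
    images = rng.random((12, 32, 32)) * np.linspace(0.5, 2.0, 12)[:, None, None]
    print(pseudo_domain_labels(images, num_domains=3))
```

Low-frequency amplitude mostly captures global appearance such as brightness and color style rather than object shape, which is one plausible reason it can stand in for a domain label when none is provided.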

