Contrastive Learning-Enhanced Trajectory Matching for Small-Scale Dataset Distillation
This work addresses the challenge of deploying machine learning models on edge devices or in rapid prototyping scenarios by improving dataset distillation for small-scale synthetic data, representing an incremental advancement over existing techniques.
The paper tackles dataset distillation for resource-constrained environments by integrating contrastive learning into Trajectory Matching methods to preserve semantic richness under extreme sample scarcity, yielding notable performance improvements and enhanced visual fidelity in the synthetic datasets.
Deploying machine learning models in resource-constrained environments, such as edge devices or rapid prototyping scenarios, increasingly requires distilling large datasets into much smaller yet informative synthetic datasets. Current dataset distillation techniques, particularly Trajectory Matching methods, optimize synthetic data so that a model's training trajectory on the synthetic samples mirrors its trajectory on real data. Although effective for medium-scale synthetic datasets, these methods fail to preserve semantic richness under extreme sample scarcity. To address this limitation, we propose a novel dataset distillation method that integrates contrastive learning into image synthesis. By explicitly maximizing instance-level feature discrimination, our approach produces more informative and diverse synthetic samples even when the synthetic dataset size is tightly constrained. Experimental results demonstrate that incorporating contrastive learning substantially improves the performance of models trained on very small-scale synthetic datasets, with notable gains over existing distillation techniques when synthetic data is extremely limited. The integration also guides more effective feature representation and improves the visual fidelity of the synthesized images.
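To make the two ingredients concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes the normalized parameter-distance form of the trajectory matching objective commonly used in the literature and an InfoNCE-style loss as one possible way to encourage instance-level feature discrimination. The function names, the temperature, and the weighting term `weight` are hypothetical and chosen only for illustration; the abstract does not specify the exact losses or how they are combined.

```python
import math


def trajectory_matching_loss(student_end, expert_start, expert_end):
    """Squared distance between the student's final parameters and the
    expert's target parameters, normalized by how far the expert moved.
    All arguments are flat lists of floats (assumed representation)."""
    num = sum((s - e) ** 2 for s, e in zip(student_end, expert_end))
    den = sum((a - e) ** 2 for a, e in zip(expert_start, expert_end))
    return num / den


def _normalize(v):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(anchors, positives, temperature=0.5):
    """Instance-level InfoNCE: anchor i should be similar to positive i
    and dissimilar to the positives of every other instance."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every positive, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def combined_loss(tm_loss, contrastive_loss, weight=0.1):
    """Hypothetical weighted sum of the two terms (weight is an assumption)."""
    return tm_loss + weight * contrastive_loss
```

In a real pipeline, the student parameters would come from unrolled training steps on the synthetic images, and gradients of the combined loss would flow back to the synthetic pixels through an automatic differentiation framework. The sketch above only shows how each scalar loss is computed.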