LGAICVNEMar 13, 2024

Data augmentation with automated machine learning: approaches and performance comparison with classical data augmentation methods

arXiv:2403.08352v321 citationsh-index: 15Knowl Inf Syst
AI Analysis

This is an incremental survey that helps practitioners understand automated data augmentation methods across different data modalities.

This survey paper compares automated machine learning (AutoML) approaches for data augmentation against classical methods, finding that AutoML-based techniques currently outperform state-of-the-art conventional approaches.

Data augmentation is arguably the most important regularization technique commonly used to improve generalization performance of machine learning models. It primarily involves the application of appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because of the time-consuming trial and error procedures for creating and testing different candidate augmentations and their hyperparameters manually. State-of-the-art approaches are increasingly relying on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration and data synthesis techniques. The focus of this work is on image data augmentation methods. Nonetheless, we cover other data modalities, especially in cases where the specific data augmentations techniques being discussed are more suitable for these other modalities. For instance, since automated data integration methods are more suitable for tabular data, we cover tabular data in the discussion of data integration methods. The work also presents extensive discussion of techniques for accomplishing each of the major subtasks of the image data augmentation process: search space design, hyperparameter optimization and model evaluation. Finally, we carried out an extensive comparison and analysis of the performance of automated data augmentation techniques and state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes