AS AI SD SP

Apr 4, 2025

Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification

Francesca Ronchini, Ho-Hsiang Wu, Wei-Cheng Lin, Fabio Antonacci

arXiv:2504.03329v12.31 citations

h-index: 6

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Originality: Incremental advance

AI Analysis

This work addresses data augmentation for sound classification tasks, but it is incremental as it builds on existing TTA models and prompt techniques.

The paper aims to improve sound classification by designing effective prompt strategies for generating realistic datasets with Text-To-Audio (TTA) models. It finds that task-specific prompts outperform basic ones, and that merging datasets generated by different models improves classification more than simply increasing the dataset size.

This paper investigates the design of effective prompt strategies for generating realistic datasets using Text-To-Audio (TTA) models. We also analyze different techniques for efficiently combining these datasets to enhance their utility in sound classification tasks. By evaluating two sound classification datasets with two TTA models, we apply a range of prompt strategies. Our findings reveal that task-specific prompt strategies significantly outperform basic prompt approaches in data generation. Furthermore, merging datasets generated using different TTA models proves to enhance classification performance more effectively than merely increasing the training dataset size. Overall, our results underscore the advantages of these methods as effective data augmentation techniques using synthetic data.
