SD AI AS | Sep 29, 2024

PALM: Few-Shot Prompt Learning for Audio Language Models

Asif Hanif, Maha Tufail Agro, Mohammad Areeb Qazi, Hanan Aldarmaki

arXiv:2409.19806v1 | 24.825 citations | h-index: 8 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses few-shot learning efficiency for audio recognition tasks. The advance is incremental, since it adapts existing prompt learning techniques from vision-language models to audio-language models.

The paper tackles the sensitivity of zero-shot audio recognition in Audio-Language Models to hand-crafted text prompts. It proposes PALM, a method that optimizes the feature space of the text encoder branch, and reports performance on par with or better than three baselines on 11 audio datasets while being computationally less demanding.

Audio-Language Models (ALMs) have recently achieved remarkable success in zero-shot audio recognition tasks, which match features of audio waveforms with class-specific text prompt features, inspired by advancements in Vision-Language Models (VLMs). Given the sensitivity of zero-shot performance to the choice of hand-crafted text prompts, many prompt learning techniques have been developed for VLMs. We explore the efficacy of these approaches in ALMs and propose a novel method, Prompt Learning in Audio Language Models (PALM), which optimizes the feature space of the text encoder branch. Unlike existing methods that work in the input space, our approach results in greater training efficiency. We demonstrate the effectiveness of our approach on 11 audio recognition datasets, encompassing a variety of speech-processing tasks, and compare the results with three baselines in a few-shot learning setup. Our method is either on par with or outperforms other approaches while being computationally less demanding. Code is available at https://asif-hanif.github.io/palm/
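The abstract describes two ideas: zero-shot recognition by matching audio features against class-specific text prompt features, and learning in the feature space of the text branch rather than in the input (token) space. The toy sketch below illustrates both under loud assumptions. It is not the authors' implementation: the additive per-class offset, the function names (`logits`, `predict`, `fit_offsets`), the `scale` value, and the 2-dimensional feature vectors are all hypothetical stand-ins, and real encoders would supply the features. Text features are also left un-normalized after the offset is added, a simplification that keeps the gradient short.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def logits(audio_feat, text_feats, offsets, scale=10.0):
    """Similarity of one audio feature to every class text feature.

    Each class feature is the frozen text feature plus a learnable
    offset (assumed form; with zero offsets this is plain zero-shot).
    """
    a = normalize(audio_feat)
    return [scale * dot(a, [t + d for t, d in zip(tf, off)])
            for tf, off in zip(text_feats, offsets)]

def predict(audio_feat, text_feats, offsets):
    z = logits(audio_feat, text_feats, offsets)
    return max(range(len(z)), key=z.__getitem__)

def fit_offsets(text_feats, shots, steps=200, lr=0.01, scale=10.0):
    """Few-shot training: only the offsets are updated.

    Uses the cross-entropy gradient d(loss)/d(offset_c) =
    scale * (p_c - y_c) * a, so no encoder is ever back-propagated through.
    """
    offsets = [[0.0] * len(t) for t in text_feats]
    for _ in range(steps):
        for audio_feat, label in shots:
            a = normalize(audio_feat)
            p = softmax(logits(audio_feat, text_feats, offsets, scale))
            for c, off in enumerate(offsets):
                g = scale * (p[c] - (1.0 if c == label else 0.0))
                for k in range(len(off)):
                    off[k] -= lr * g * a[k]
    return offsets

# Made-up features: two classes, two labelled few-shot examples.
text_feats = [[1.0, 0.0], [0.0, 1.0]]        # frozen "prompt" features
shots = [([0.6, 0.5], 1), ([1.0, 0.1], 0)]   # (audio feature, label)
zero = [[0.0, 0.0] for _ in text_feats]      # zero offsets = zero-shot

zero_shot_pred = predict(shots[0][0], text_feats, zero)
learned = fit_offsets(text_feats, shots)
few_shot_pred = predict(shots[0][0], text_feats, learned)
```

In this toy setup the first example is misclassified by the fixed text features and classified correctly once the offsets are fitted. Because the learnable parameters sit after the text encoder, each update touches only a handful of vectors, which is one plausible reading of why a feature-space method could be cheaper to train than one that learns input tokens.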

