AS AI MM SD | May 12, 2022

Automated Audio Captioning: An Overview of Recent Progress and New Challenges

Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang

arXiv:2205.05949v2 | 15.259 citations | h-index: 66

Originality: Synthesis-oriented

AI Analysis

This review serves as an overview for researchers in audio processing and machine learning, highlighting advances and identifying gaps in the field.

This paper provides a comprehensive review of recent progress in automated audio captioning, summarizing various deep learning approaches, evaluation metrics, and datasets, while also discussing open challenges and future directions.

Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely available datasets in recent years. The problem has been addressed predominantly with deep learning techniques. Numerous approaches have been proposed, such as investigating different neural network architectures, exploiting auxiliary information such as keywords or sentence information to guide caption generation, and employing different training strategies, which have greatly facilitated the development of this field. In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets. We also discuss open challenges and envisage possible future research directions.
