ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024
The challenge addresses the mismatch between generated audio and human subjective experience for users of applications such as companion robots and marketing bots, but the contribution is incremental because the paper reports on a challenge rather than proposing a new method.
The paper tackles the limited ability of text-to-speech (TTS) technology to convey complex emotions and finely controlled content, which causes a discrepancy with human subjective perception in applications such as companion robots and marketing bots. It does so by organizing the ICAGC 2024 challenge, which aims to enhance the persuasiveness and acceptability of synthesized audio; 19 teams registered.
The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and finely controlled content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications such as companion robots for children and marketing bots. The core issue is the inconsistency between high-quality audio generation and the ultimate human subjective experience. This challenge therefore aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human-aligned, convincing, and inspirational audio generation. A total of 19 teams registered for the challenge, and this paper describes the competition and its results.