IR · CL · SD · AS · Aug 20, 2019

From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories

arXiv:1908.07590v1 · 5 citations
AI Analysis

This work addresses a domain-specific problem for radio story producers, offering an incremental improvement by hybridizing existing methods to enhance retrieval accuracy.

The paper tackles the problem of automatically adding sound effects to radio stories to reduce labor costs. It proposes a retrieval-based framework hybridized with a semantic inference model that makes retrieval more robust, and it also analyzes feature importance and introduces heuristic rules for trading off precision against recall.

Sound effects play an essential role in producing high-quality radio stories, but adding them is labor-intensive and costly. In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model. However, directly applying a tag-based retrieval model leads to a high false-positive rate because story contents are ambiguous. To solve this problem, we introduce a retrieval-based framework hybridized with a semantic inference model, which helps achieve robust retrieval results. Our model relies on carefully designed features extracted from the context of candidate triggers. We collect two story dubbing datasets through crowdsourcing to analyze how sound effects are added and to train and test our proposed methods. We further discuss the importance of each feature and introduce several heuristic rules for the trade-off between precision and recall. Combined with text-to-speech technology, our results point to a promising automatic pipeline for producing high-quality radio stories.
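The abstract describes a two-stage idea: a tag-based retrieval step that flags every word matching a sound tag as a candidate trigger, followed by a context-based inference step that filters out false positives, with a tunable cut-off that trades precision for recall. The Python sketch below illustrates only that general shape. It is not the authors' model: the sound library (`SOUND_TAGS`), the cue-word lists (`SOUND_CUES`, `NON_SOUND_CUES`), the context window, and the smoothed cue-ratio score are all hypothetical stand-ins for the paper's learned semantic inference model and its feature set.

```python
from dataclasses import dataclass

# Hypothetical sound-effect library: tag -> audio file (illustrative only).
SOUND_TAGS = {"thunder": "thunder.wav", "door": "door_creak.wav", "rain": "rain.wav"}

# Hypothetical cue words standing in for a learned semantic inference model.
SOUND_CUES = {"heard", "loud", "suddenly", "crashed", "knocked", "poured"}
NON_SOUND_CUES = {"like", "dreamed", "word", "picture", "painted"}


def normalize(token: str) -> str:
    """Lowercase a token and strip trailing punctuation."""
    return token.lower().strip(".,!?")


@dataclass
class Candidate:
    index: int          # token position of the candidate trigger
    tag: str            # matched sound tag
    context: list[str]  # normalized neighbouring tokens (trigger excluded)


def find_candidates(tokens: list[str], window: int = 3) -> list[Candidate]:
    """Stage 1, tag-based retrieval: every token matching a sound tag is a candidate."""
    candidates = []
    for i, tok in enumerate(tokens):
        key = normalize(tok)
        if key in SOUND_TAGS:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = [normalize(t) for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]
            candidates.append(Candidate(i, key, ctx))
    return candidates


def context_score(c: Candidate) -> float:
    """Stage 2 (toy): Laplace-smoothed share of cues that suggest an audible event.

    Returns 0.5 when the context contains no cue words at all.
    """
    pos = sum(t in SOUND_CUES for t in c.context)
    neg = sum(t in NON_SOUND_CUES for t in c.context)
    return (pos + 1) / (pos + neg + 2)


def retrieve(tokens: list[str], threshold: float) -> dict[int, str]:
    """Keep candidates whose context score reaches the threshold; map position -> sound file."""
    return {
        c.index: SOUND_TAGS[c.tag]
        for c in find_candidates(tokens)
        if context_score(c) >= threshold
    }


def precision_recall(predicted: set[int], gold: set[int]) -> tuple[float, float]:
    """Precision and recall of predicted trigger positions (empty sets count as 1.0)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall


if __name__ == "__main__":
    story = "Suddenly thunder crashed outside. She painted a picture of rain. Someone knocked on the door."
    tokens = story.split()
    gold = {1, 14}  # hand-labelled: thunder and door are audible; the rain is only in a painting
    for t in (0.0, 0.6, 0.7):
        picked = retrieve(tokens, t)
        p, r = precision_recall(set(picked), gold)
        print(f"threshold={t}: {sorted(picked.items())} precision={p:.2f} recall={r:.2f}")
```

With a threshold of 0.0 the sketch behaves like pure tag-based retrieval and also returns the painted "rain" (precision 0.67, recall 1.00). At 0.6 the context filter removes that false positive (precision 1.00, recall 1.00), and at 0.7 it becomes too strict and drops the door knock (precision 1.00, recall 0.50). A single score threshold is just one simple way to expose the precision-recall trade-off the abstract mentions; the paper's own heuristic rules are not reproduced here.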
