LG | AI | May 12, 2023

Using Deepfake Technologies for Word Emphasis Detection

arXiv:2305.07791v1124 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific problem in speech processing, with applications such as accessibility and language learning, but it is incremental: it adapts existing deepfake methods to a new task.

The paper tackles the challenge of automated emphasis detection in spoken language by using deepfake technology to generate emphasis-devoid speech from the same speaker, enabling easier isolation and detection of emphasis patterns.

In this work, we consider the task of automated emphasis detection for spoken language. This problem is challenging because emphasis is affected by the particularities of the subject's speech, for example the subject's accent, dialect, or voice. To address this task, we propose to use deepfake technology to produce emphasis-devoid speech for the same speaker. This requires extracting the text of the spoken voice and then using a voice sample from the same speaker to synthesize emphasis-devoid speech. By comparing the generated speech with the spoken voice, we are able to isolate patterns of emphasis, which are then relatively easy to detect.
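The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which acoustic features or decision rule the authors use, so everything here is an assumption made for illustration: the `WordProsody` record, the three features (duration, energy, pitch), the averaged relative deviation, and the `threshold` value are all hypothetical. The sketch also assumes that transcription, voice cloning, and word alignment have already been done, so the spoken and generated utterances arrive as word-aligned feature lists.

```python
from dataclasses import dataclass


@dataclass
class WordProsody:
    """Hypothetical per-word prosodic summary (not taken from the paper)."""
    word: str
    duration_s: float  # word duration in seconds
    energy: float      # mean energy over the word
    pitch_hz: float    # mean fundamental frequency over the word


def emphasis_scores(spoken, neutral):
    """Score each word by how far the real recording deviates from the
    emphasis-devoid synthetic version of the same speaker.

    Assumption: the score is the mean relative increase in duration,
    energy, and pitch. The paper may use a different comparison.
    """
    if len(spoken) != len(neutral):
        raise ValueError("utterances must be word-aligned")
    scores = []
    for s, n in zip(spoken, neutral):
        if s.word != n.word:
            raise ValueError(f"misaligned words: {s.word!r} vs {n.word!r}")
        deviations = [
            (s.duration_s - n.duration_s) / n.duration_s,
            (s.energy - n.energy) / n.energy,
            (s.pitch_hz - n.pitch_hz) / n.pitch_hz,
        ]
        scores.append((s.word, sum(deviations) / len(deviations)))
    return scores


def detect_emphasis(spoken, neutral, threshold=0.2):
    """Return the words whose deviation score exceeds the (assumed) threshold."""
    return [w for w, score in emphasis_scores(spoken, neutral) if score > threshold]


# Toy, hand-made numbers for "I NEVER said that"; not measured data.
spoken = [
    WordProsody("I", 0.10, 0.50, 120.0),
    WordProsody("never", 0.45, 0.90, 160.0),
    WordProsody("said", 0.20, 0.50, 118.0),
    WordProsody("that", 0.20, 0.52, 119.0),
]
neutral = [
    WordProsody("I", 0.10, 0.50, 120.0),
    WordProsody("never", 0.30, 0.60, 125.0),
    WordProsody("said", 0.20, 0.50, 120.0),
    WordProsody("that", 0.20, 0.50, 120.0),
]

emphasized = detect_emphasis(spoken, neutral)  # → ['never']
```

Because the synthetic reference comes from the same speaker's voice, speaker-specific traits such as baseline pitch largely cancel out in the relative deviations, which is the intuition the abstract gives for why the remaining differences are easy to detect.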
