HCCL Dec 13, 2017

A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks

arXiv:1712.04798v3 (1088 citations)
Originality Synthesis-oriented
AI Analysis

This work targets two problems facing speech processing researchers: manual phonetic segmentation is extremely time-consuming, and current automatic methods are not always accurate enough. Its contribution is incremental in that it provides a data collection rather than a new method.

The paper aims to improve automatic phonetic segmentation by building a multimodal corpus that records expert gaze and behavior during manual segmentation tasks, so that automatic methods can be modeled more closely on how humans segment.

Phonetic segmentation is the process of splitting speech into distinct phonetic units. Human experts routinely perform this task manually, using analysis software to examine auditory and visual cues, which is an extremely time-consuming process. Methods exist for automatic segmentation, but these are not always accurate enough. To improve automatic segmentation, we need to model it as closely on manual segmentation as possible. This corpus is an effort to capture human segmentation behavior by recording experts performing a segmentation task. We believe that this data will enable us to highlight the important aspects of manual segmentation, which can then be used in automatic segmentation to improve its accuracy.
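
To make "accurate enough" concrete, the sketch below shows one common way to compare automatic boundaries against manual ones: count how many manual boundaries have an automatic boundary within a small time tolerance. This is an illustration only, not a procedure taken from the paper. The function name `boundary_agreement`, the 20 ms tolerance, and the boundary times are all assumptions made for the example.

```python
from typing import Sequence


def boundary_agreement(
    manual: Sequence[float],
    automatic: Sequence[float],
    tolerance: float = 0.020,  # assumed tolerance in seconds, not from the paper
) -> float:
    """Return the fraction of manual boundaries that have at least one
    automatic boundary within `tolerance` seconds.

    Simplification: one automatic boundary may match several manual ones.
    """
    if not manual:
        raise ValueError("manual boundary list is empty")
    hits = sum(
        1
        for m in manual
        if any(abs(m - a) <= tolerance for a in automatic)
    )
    return hits / len(manual)


# Hypothetical boundary times in seconds, invented for illustration.
manual_boundaries = [0.10, 0.25, 0.40, 0.62]
automatic_boundaries = [0.11, 0.29, 0.41, 0.61]

score = boundary_agreement(manual_boundaries, automatic_boundaries)
print(f"agreement within 20 ms: {score:.2f}")
```

In this toy input, three of the four manual boundaries have an automatic boundary within 20 ms, while the boundary at 0.25 s is missed by 40 ms. A stricter variant would match boundaries one-to-one so that no automatic boundary is counted twice.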
