CV, LG, MM | Feb 19, 2019

Predicting tongue motion in unlabeled ultrasound videos using convolutional LSTM neural network

arXiv:1902.06927v1 | 27 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a speaker-dependent challenge in speech production research and reports incremental improvements in prediction accuracy for tongue motion analysis.

The study tackled the problem of predicting future tongue movements from past movements in unlabeled ultrasound videos using a convolutional LSTM neural network, achieving better performance than a 3DCNN in predicting the 9th frame from 8 preceding frames and demonstrating capability for more distant frames.

A challenge in speech production research is to predict future tongue movements based on a short period of past tongue movements. This study tackles the speaker-dependent tongue motion prediction problem in unlabeled ultrasound videos with convolutional long short-term memory (ConvLSTM) networks. The model has been tested on two different ultrasound corpora. ConvLSTM outperforms a 3-dimensional convolutional neural network (3DCNN) in predicting the 9th frame based on the 8 preceding frames, and also demonstrates good capacity to predict only the tongue contours in future frames. Further tests reveal that ConvLSTM can also learn to predict tongue movements in more distant frames beyond the immediately following frame. Our code is available at: https://github.com/shuiliwanwu/ConvLstm-ultrasound-videos.
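The setup in the abstract (8 preceding frames in, the 9th frame out) can be illustrated with a small sketch of how training pairs might be cut from an unlabeled video. Because the target is itself a frame of the video, no manual annotation is needed. This is not taken from the authors' repository: the function name `make_windows` and the parameters `n_input` and `horizon` are hypothetical, and integers stand in for ultrasound frames.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # stand-in for an ultrasound image (e.g. a 2-D array)


def make_windows(
    frames: Sequence[Frame], n_input: int = 8, horizon: int = 1
) -> List[Tuple[List[Frame], Frame]]:
    """Pair each run of `n_input` consecutive frames with the frame
    `horizon` steps after the end of that run.

    horizon=1 gives the "predict the 9th frame from 8" setting;
    larger values target more distant future frames.
    """
    if n_input < 1 or horizon < 1:
        raise ValueError("n_input and horizon must be positive")

    samples = []
    # Last start index for which both the inputs and the target exist.
    last_start = len(frames) - n_input - horizon
    for start in range(last_start + 1):
        inputs = list(frames[start:start + n_input])
        target = frames[start + n_input + horizon - 1]
        samples.append((inputs, target))
    return samples


# Toy "video" of 12 frames, with integers in place of images.
video = list(range(12))
next_frame = make_windows(video)           # 8 frames -> the 9th
distant = make_windows(video, horizon=3)   # 8 frames -> the 11th
```

With real data, each `(inputs, target)` pair would be stacked into arrays and fed to the sequence model; the pairing logic itself does not depend on whether that model is a ConvLSTM or a 3DCNN.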
