CL SD AS ML Nov 20, 2017

Speech recognition for medical conversations

Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan

arXiv:1711.07274v2 8.396 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of accurate speech recognition for medical conversations, which could aid in clinical documentation, but it is incremental as it applies existing methods to a new domain-specific dataset.

The researchers tackled the problem of transcribing doctor-patient conversations by collecting a large-scale dataset of 14,000 hours of clinical conversations and exploring CTC and LAS speech recognition models. They found that LAS was more resilient to noisy data, and that the models performed well on important medical utterances but made errors in casual conversation.

In this work, we explored building automatic speech recognition models for transcribing doctor-patient conversations. We collected a large-scale dataset of clinical conversations (14,000 hours), designed the task to represent the real-world scenario, and explored several alignment approaches to iteratively improve data quality. We explored both CTC and LAS systems for building speech recognition models. The LAS model was more resilient to noisy data, while CTC required more data cleanup. A detailed analysis is provided for understanding the performance on clinical tasks. Our analysis showed that the speech recognition models performed well on important medical utterances, while errors occurred in casual conversations. Overall, we believe the resulting models can provide reasonable quality in practice.
