The importance of fillers for text representations of speech transcripts
The work addresses a gap in SLU for applications such as dialogue systems, but the contribution is incremental: it applies existing embedding methods to one specific aspect of speech.
The paper tackles the problem that fillers are overlooked in spoken language understanding by representing them with deep contextualized embeddings, which improves spoken language modeling and downstream tasks such as predicting speaker stance and confidence.
While being an essential component of spoken language, fillers (e.g. "um" or "uh") often remain overlooked in Spoken Language Understanding (SLU) tasks. We explore the possibility of representing them with deep contextualised embeddings, showing improvements on modelling spoken language and two downstream tasks: predicting a speaker's stance and expressed confidence.