Sounding Like a Winner? Prosodic Differences in Post-Match Interviews
This is an incremental contribution to sports analytics and speech emotion detection; its focus on a single, narrow domain limits its broader impact.
This study classifies tennis match outcomes (win vs. loss) from post-match interviews using prosodic features and self-supervised learning (SSL) representations. It finds that SSL representations effectively differentiate outcomes and that prosodic cues such as pitch variability are strong indicators.
This study examines the prosodic characteristics associated with winning and losing in post-match tennis interviews. It also explores whether match outcomes can be classified solely from post-match interview recordings, using prosodic features and self-supervised learning (SSL) representations. The analysis combines prosodic elements such as pitch and intensity with representations from SSL models like Wav2Vec 2.0 and HuBERT to determine whether an athlete has won or lost their match. Traditional acoustic features and deep speech representations are extracted from the data, and machine learning classifiers are employed to distinguish between winning and losing players. Results indicate that SSL representations effectively differentiate between winning and losing outcomes, capturing subtle speech patterns linked to emotional states. At the same time, prosodic cues, such as pitch variability, remain strong indicators of victory.
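To make the notion of prosodic features concrete, the following is a minimal sketch of how frame-level pitch and intensity contours might be summarised into utterance-level descriptors, including a pitch-variability measure. It is not the authors' pipeline: the function names, the semitone reference of 100 Hz, the convention that an F0 of 0 marks an unvoiced frame, and the particular set of statistics are all assumptions chosen for illustration. The contours themselves are assumed to come from some external pitch and intensity tracker.

```python
import math
import statistics
from typing import Dict, Sequence


def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Convert a pitch value in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def prosodic_functionals(
    f0_hz: Sequence[float], intensity_db: Sequence[float]
) -> Dict[str, float]:
    """Summarise frame-level contours as utterance-level features.

    Assumption: frames with f0 <= 0 are unvoiced and are skipped
    when computing pitch statistics.
    """
    voiced = [hz_to_semitones(f) for f in f0_hz if f > 0]
    if len(voiced) < 2 or len(intensity_db) < 2:
        raise ValueError("need at least two voiced frames and two intensity frames")
    return {
        "f0_mean_st": statistics.fmean(voiced),
        "f0_sd_st": statistics.stdev(voiced),        # pitch variability
        "f0_range_st": max(voiced) - min(voiced),    # pitch span in semitones
        "intensity_mean_db": statistics.fmean(intensity_db),
        "intensity_sd_db": statistics.stdev(intensity_db),
        "voiced_fraction": len(voiced) / len(f0_hz),
    }
```

A feature dictionary of this kind, computed once per interview, could then be passed to any standard classifier alongside, or in comparison with, pooled SSL embeddings. Expressing pitch in semitones rather than Hz is a common design choice because it makes variability measures more comparable across speakers with different baseline pitch.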