Lara Gauder

HC
5papers
5citations
Novelty26%
AI Score31

5 Papers

SDSep 11, 2024
The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions

Lara Gauder, Pablo Riera, Andrea Slachevsky et al.

Automated speech analysis is a thriving approach to detect early markers of Alzheimer's disease (AD). Yet, recording conditions in most AD datasets are heterogeneous, with patients and controls often evaluated in different acoustic settings. While this is not a problem for analyses based on speech transcription or features obtained from manual alignment, it does cast serious doubts on the validity of acoustic features, which are strongly influenced by acquisition conditions. We examined this issue in the ADreSSo dataset, derived from the widely used Pitt corpus. We show that systems based on two acoustic features, MFCCs and Wav2vec 2.0 embeddings, can discriminate AD patients from controls with above-chance performance when using only the non-speech part of the audio signals. We replicated this finding in a separate dataset of Spanish speakers. Thus, in these datasets, the class can be partly predicted by recording conditions. Our results are a warning against the use of acoustic systems for identifying patients based on non-standardized recordings. We propose that acoustically heterogeneous datasets for dementia studies should be either (a) analyzed using only transcripts or other features derived from manual annotations, or (b) replaced by datasets collected with strictly controlled acoustic conditions.

31.0SDApr 29
A Toolkit for Detecting Spurious Correlations in Speech Datasets

Lara Gauder, Pablo Riera, Andrea Slachevsky et al.

We introduce a toolkit for uncovering spurious correlations between recording characteristics and target class in speech datasets. Spurious correlations may arise due to heterogeneous recording conditions, a common scenario for health-related datasets. When present both in the training and test data, these correlations result in an overestimation of the system performance -- a dangerous situation, specially in high-stakes application where systems are required to satisfy minimum performance requirements. Our toolkit implements a diagnostic method based on the detection of the target class using only the non-speech regions in the audio. Better than chance performance at this task indicates that information about the target class can be extracted from the non-speech regions, flagging the presence of spurious correlations. The toolkit is publicly available for research use.

HCFeb 9, 2021
A Study on the Manifestation of Trust in Speech

Lara Gauder, Leonardo Pepino, Pablo Riera et al.

Research has shown that trust is an essential aspect of human-computer interaction directly determining the degree to which the person is willing to use a system. An automatic prediction of the level of trust that a user has on a certain system could be used to attempt to correct potential distrust by having the system take relevant actions like, for example, apologizing or explaining its decisions. In this work, we explore the feasibility of automatically detecting the level of trust that a user has on a virtual assistant (VA) based on their speech. We developed a novel protocol for collecting speech data from subjects induced to have different degrees of trust in the skills of a VA. The protocol consists of an interactive session where the subject is asked to respond to a series of factual questions with the help of a virtual assistant. In order to induce subjects to either trust or distrust the VA's skills, they are first informed that the VA was previously rated by other users as being either good or bad; subsequently, the VA answers the subjects' questions consistently to its alleged abilities. All interactions are speech-based, with subjects and VAs communicating verbally, which allows the recording of speech produced under different trust conditions. Using this protocol, we collected a speech corpus in Argentine Spanish. We show clear evidence that the protocol effectively succeeded in influencing subjects into the desired mental state of either trusting or distrusting the agent's skills, and present results of a perceptual study of the degree of trust performed by expert listeners. Finally, we found that the subject's speech can be used to detect which type of VA they were using, which could be considered a proxy for the user's trust toward the VA's abilities, with an accuracy up to 76%, compared to a random baseline of 50%.

ASJul 30, 2020
Detecting Distrust Towards the Skills of a Virtual Assistant Using Speech

Leonardo Pepino, Pablo Riera, Lara Gauder et al.

Research has shown that trust is an essential aspect of human-computer interaction directly determining the degree to which the person is willing to use the system. An automatic prediction of the level of trust that a user has on a certain system could be used to attempt to correct potential distrust by having the system take relevant actions like, for example, explaining its actions more thoroughly. In this work, we explore the feasibility of automatically detecting the level of trust that a user has on a virtual assistant (VA) based on their speech. We use a dataset collected for this purpose, containing human-computer speech interactions where subjects were asked to answer various factual questions with the help of a virtual assistant, which they were led to believe was either very reliable or unreliable. We find that the subject's speech can be used to detect which type of VA they were using, which could be considered a proxy for the user's trust toward the VA's abilities, with an accuracy up to 76\%, compared to a random baseline of 50\%. These results are obtained using features that have been previously found useful for detecting speech directed to infants and non-native speakers.

HCJun 10, 2020
Trust-UBA: A Corpus for the Study of the Manifestation of Trust in Speech

Lara Gauder, Pablo Riera, Leonardo Pepino et al.

This paper describes a novel protocol for collecting speech data from subjects induced to have different degrees of trust in the skills of a conversational agent. The protocol consists of an interactive session where the subject is asked to respond to a series of factual questions with the help of a virtual assistant. In order to induce subjects to either trust or distrust the agent's skills, they are first informed that it was previously rated by other users as being either good or bad; subsequently, the agent answers the subjects' questions consistently to its alleged abilities. All interactions are speech-based, with subjects and agents communicating verbally, which allows the recording of speech produced under different trust conditions. We collected a speech corpus in Argentine Spanish using this protocol, which we are currently using to study the feasibility of predicting the degree of trust from speech. We find clear evidence that the protocol effectively succeeded in influencing subjects into the desired mental state of either trusting or distrusting the agent's skills, and present preliminary results of a perceptual study of the degree of trust performed by expert listeners. The collected speech dataset will be made publicly available once ready.