HC Apr 12, 2021
Speaking of Trust -- Speech as a Measure of Trust
Ella Velner, Khiet P. Truong, Vanessa Evers
Since trust measures in human-robot interaction are often subjective or not possible to implement in real time, we propose to use speech cues (on what, when and how the user talks) as an objective real-time measure of trust. This could be implemented in the robot to calibrate towards appropriate trust. However, we would like to open the discussion on how to deal with the ethical implications surrounding this trust measure.
RO Jul 23, 2020
Usability of a Robot's Realistic Facial Expressions and Peripherals in Autistic Children's Therapy
Jamy Li, Daniel Davison, Bob Schadenberg et al.
Robot-assisted therapy is an emerging form of therapy for autistic children, although designing effective robot behaviors remains a challenge for implementing such therapy. A series of usability tests assessed trends in the effectiveness of modelling a robot's facial expressions on realistic facial expressions and of adding peripherals enabling child-led control of emotion learning activities with autistic children. Nineteen autistic children interacted with a small humanoid robot and an adult therapist in several emotion-learning activities that featured realistic facial expressions modelled on either a pre-existing database or live facial mirroring, and that used peripherals (tablets or tangible 'squishies') to enable child-led activities. Both types of realistic facial expressions by the robot were less effective than exaggerated expressions, with the mirroring being unintuitive for children. The tablet was usable but required more feedback and lower latency, while the tactile tangibles were engaging aids.
RO Aug 21, 2017
Planning Based System for Child-Robot Interaction in Dynamic Play Environments
Vicky Charisi, Bram Ridder, Jaebok Kim et al.
This paper describes the initial steps towards the design of a robotic system that intends to perform actions autonomously in a naturalistic play environment. At the same time, it aims for social human-robot interaction (HRI), focusing on children. We draw on existing theories of child development and on dimensional models of emotions to explore the design of a dynamic interaction framework for natural child-robot interaction. In this dynamic setting, the social HRI is defined by the ability of the system to take into consideration the socio-emotional state of the user and to plan accordingly by selecting appropriate strategies for execution. The robot needs a temporal planning system, which combines features of task-oriented actions and principles of social human-robot interaction. We present initial results of an empirical study for the evaluation of the proposed framework in the context of a collaborative sorting game.
CL Aug 14, 2017
Learning spectro-temporal features with 3D CNNs for speech emotion recognition
Jaebok Kim, Khiet P. Truong, Gwenn Englebienne et al.
In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) in order to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM), our proposed 3D CNNs simultaneously extract short-term and long-term spectral features with a moderate number of parameters. We evaluated our proposed and other state-of-the-art methods in a speaker-independent manner using aggregated corpora that give a large and diverse set of speakers. We found that 1) shallow temporal and moderately deep spectral kernels of a homogeneous architecture are optimal for the task; and 2) our 3D CNNs are more effective for spectro-temporal feature learning compared to other methods. Finally, we visualised the feature space obtained with our proposed method using t-distributed stochastic neighbour embedding (t-SNE) and could observe distinct clusters of emotions.
CL Aug 13, 2017
Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning
Jaebok Kim, Gwenn Englebienne, Khiet P. Truong et al.
One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. speakers and tasks). In order to improve the generalisation capabilities of the emotion models, we propose to use Multi-Task Learning (MTL) and use gender and naturalness as auxiliary tasks in deep neural networks. This method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". In comparison to Single-Task Learning (STL) based state-of-the-art methods, we found that our proposed MTL method improved performance significantly. In particular, models using both gender and naturalness achieved larger gains than those using either gender or naturalness separately. This benefit was also found in the high-level representations of the feature space obtained from our proposed method, where discriminative emotional clusters could be observed.
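As an illustration of the multi-task idea in the abstract above, the following is a minimal sketch of how a main emotion loss might be combined with auxiliary gender and naturalness losses. It is not the authors' implementation: the weighted-sum form, the function names, the class counts and the weight values are all assumptions made for illustration, and the abstract does not state how the task losses were combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(outputs, targets, aux_weights):
    """Main-task loss plus a weighted sum of auxiliary-task losses.

    outputs / targets: dicts keyed by task name ("emotion" is the main task).
    aux_weights: hypothetical per-task weights for the auxiliary tasks.
    """
    loss = cross_entropy(outputs["emotion"], targets["emotion"])
    for task, weight in aux_weights.items():
        loss += weight * cross_entropy(outputs[task], targets[task])
    return loss


# Example with made-up logits: four emotion classes, two classes per auxiliary task.
outputs = {
    "emotion": [0.0, 0.0, 0.0, 0.0],
    "gender": [0.0, 0.0],
    "naturalness": [0.0, 0.0],
}
targets = {"emotion": 2, "gender": 0, "naturalness": 1}
example_loss = multitask_loss(outputs, targets, {"gender": 0.5, "naturalness": 0.5})
```

In a network, the three sets of logits would typically come from separate output heads on a shared representation, so that gradients from the auxiliary tasks also shape the shared layers; setting an auxiliary weight to zero recovers the single-task case.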