CL · Dec 21, 2022 · Code
Automatic Emotion Modelling in Written Stories
Lukas Christ, Shahin Amiriparian, Manuel Milling et al.
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.
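The Concordance Correlation Coefficient used as the evaluation metric above is, in its standard (Lin's) form, $\rho_c = \frac{2 s_{xy}}{s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2}$. Unlike Pearson's correlation, it also penalises constant offsets and scale differences between the prediction and the gold standard. The following is a minimal sketch of that textbook definition using population variances. The function name is illustrative, and the authors' own evaluation code (e.g. how scores are aggregated across stories) may differ.

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC between a gold-standard signal and a predicted signal (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    # Population (biased) variances and covariance, as in Lin's definition.
    var_true, var_pred = y_true.var(), y_pred.var()
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    # The squared mean difference in the denominator penalises systematic offsets.
    return 2.0 * covariance / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```

For example, a prediction that tracks a valence signal `[1, 2, 3]` perfectly but is shifted up by 1 still has a Pearson correlation of 1. Its CCC, however, drops to 4/7.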
SD · Aug 12, 2024
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos et al.
Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination in real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we make use of the sample-wise performance measure as an indication of sample importance. In experiments, we evaluate our training paradigm on four representative applications: ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications cover speech and non-speech tasks involving semantic and non-semantic features as well as transient and global information. The experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios (SNRs), for a wide range of computer audition tasks in everyday-life noisy environments.
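The abstract does not specify how sample-wise performance is turned into an importance weight. The sketch below shows one possible scheme and is an assumption, not the authors' method. A softmax over per-sample downstream-task losses gives harder samples larger weights, rescaled so the weights average to 1. These weights then scale a per-sample enhancement loss. Both helper names and the `temperature` parameter are hypothetical, and NumPy stands in for a deep-learning framework to show only the arithmetic.

```python
import numpy as np

def importance_weights(task_losses, temperature=1.0):
    """Hypothetical weighting: samples with higher downstream loss get larger weights.

    Softmax over per-sample task losses, rescaled so the weights average to 1.
    """
    losses = np.asarray(task_losses, dtype=float)
    scaled = losses / temperature
    scaled = scaled - scaled.max()  # subtract max for numerical stability
    weights = np.exp(scaled)
    weights /= weights.sum()
    return weights * len(losses)

def weighted_enhancement_loss(enhanced, clean, task_losses, temperature=1.0):
    """Per-sample MSE of the enhancement output, weighted by (assumed) sample importance."""
    enhanced = np.asarray(enhanced, dtype=float)
    clean = np.asarray(clean, dtype=float)
    per_sample_mse = np.mean((enhanced - clean) ** 2, axis=1)
    w = importance_weights(task_losses, temperature)
    return float(np.mean(w * per_sample_mse))
```

When all samples are equally hard, the weights are all 1 and the loss reduces to the ordinary mean MSE. A lower `temperature` would concentrate the optimisation more strongly on difficult samples.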
HC · Oct 11, 2020 · Code
Towards Somaesthetics Inspired Games: Exploring the Influence of a Mirror Effect on Self-Presentation in a Public Setting
Fiona Guerin, Alice Rey, Enis Caliskan et al.
We report on an initial user study that explores how players of an augmented mirror game self-style or self-present themselves when they are allowed to see themselves in the mirror compared to when they do not see themselves. To this end, we customized an open source fruit slicing game into an interactive installation for an architecture museum and conducted a field study with 36 visitors. Based on an analysis of video recordings of participants, we identified, for example, significant differences in how often participants smiled. Ultimately, presenting a self-image to gamers in a social setting resulted in behavior change, which we argue could be utilized carefully from a Somaesthetics perspective as an experience design feature in future games.
HC · Feb 26, 2025
Static Vs. Agentic Game Master AI for Facilitating Solo Role-Playing Experiences
Nicolai Hejlesen Jørgensen, Sarmilan Tharmabalan, Ilhan Aslan et al.
This paper presents a game master AI for single-player role-playing games. The AI is designed to deliver interactive text-based narratives and experiences typically associated with multiplayer tabletop games like Dungeons & Dragons. We report on the design process and the series of experiments to improve the functionality and experience design, resulting in two functional versions of the system. While v1 of our system uses simplified prompt engineering, v2 leverages a multi-agent architecture and the ReAct framework to include reasoning and action. A comparative evaluation demonstrates that v2 as an agentic system maintains play while significantly improving modularity and game experience, including immersion and curiosity. Our findings contribute to the evolution of AI-driven interactive fiction, highlighting new avenues for enhancing solo role-playing experiences.
SD · Aug 4, 2025
Detecting COPD Through Speech Analysis: A Dataset of Danish Speech and Machine Learning Approach
Cuno Sankey-Olsen, Rasmus Hvass Olesen, Tobias Oliver Eberhard et al.
Chronic Obstructive Pulmonary Disease (COPD) is a serious and debilitating disease affecting millions around the world. Its early detection using non-invasive means could enable preventive interventions that improve quality of life and patient outcomes, with speech recently shown to be a valuable biomarker. Yet, its validity across different linguistic groups remains to be seen. To that end, audio data were collected from 96 Danish participants conducting three speech tasks (reading, coughing, sustained vowels). Half of the participants were diagnosed with different levels of COPD and the other half formed a healthy control group. Subsequently, we investigated different baseline models using openSMILE features and learnt x-vector embeddings. We obtained a best accuracy of 67% using openSMILE features and logistic regression. Our findings support the potential of speech-based analysis as a non-invasive, remote, and scalable screening tool as part of future COPD healthcare solutions.
CL · Jun 4, 2024
Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning
Lukas Christ, Shahin Amiriparian, Manuel Milling et al.
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. To predict the resulting emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of $.8221$ for valence and $.7125$ for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.
SD · Mar 29, 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Zijiang Yang, Xin Jing, Andreas Triantafyllopoulos et al.
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, as speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences of differing length. Sequence-to-sequence modelling has recently emerged as a competitive paradigm for models that can overcome those challenges. In an attempt to stimulate further research in this promising new direction, recent sequence-to-sequence EVC papers were systematically investigated and reviewed from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state of the art. Finally, we discuss existing challenges of sequence-to-sequence EVC.
SD · Apr 20, 2021
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Shahin Amiriparian, Artem Sokolov, Ilhan Aslan et al.
Text encodings from automatic speech recognition (ASR) transcripts and audio representations have long shown promise in speech emotion recognition (SER). Yet, it is challenging to explain the effect of each information stream on SER systems. Further clarification is also needed on the impact of ASR's word error rate (WER) on linguistic emotion recognition, both on its own and when fused with acoustic information, in the age of deep ASR systems. To tackle these issues, we create transcripts from the original speech by applying three modern ASR systems: an end-to-end model trained with a recurrent neural network-transducer loss, a model with a connectionist temporal classification loss, and a wav2vec framework for self-supervised learning. Afterwards, we use pre-trained textual models to extract text representations from the ASR outputs and the gold standard. For extraction and learning of acoustic speech features, we utilise openSMILE, openXBoW, DeepSpectrum, and auDeep. Finally, we conduct decision-level fusion on both information streams, acoustics and linguistics. Using the best development configuration, we achieve state-of-the-art unweighted average recall values of $73.6\,\%$ and $73.8\,\%$ on the speaker-independent development and test partitions of IEMOCAP, respectively.
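Decision-level fusion and unweighted average recall (UAR) are both mentioned above, but neither is spelled out. The sketch below shows one common variant of each. The fusion step is a weighted average of the per-class posteriors from the acoustic and linguistic streams; this averaging rule and the `acoustic_weight` parameter are assumptions, and the paper's exact fusion rule may differ. UAR is the mean of per-class recalls, so minority emotion classes count as much as frequent ones. Function names are hypothetical.

```python
import numpy as np

def late_fusion(acoustic_probs, linguistic_probs, acoustic_weight=0.5):
    """Hypothetical decision-level fusion: weighted average of per-class posteriors."""
    a = np.asarray(acoustic_probs, dtype=float)
    l = np.asarray(linguistic_probs, dtype=float)
    return acoustic_weight * a + (1.0 - acoustic_weight) * l

def unweighted_average_recall(y_true, y_pred, num_classes):
    """Mean of per-class recalls, so every emotion class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():  # skip classes absent from the reference labels
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))
```

A classifier that always predicts the majority class in labels `[0, 0, 0, 1]` reaches 75% accuracy but only 50% UAR. This is why UAR is usually preferred on class-imbalanced emotion corpora.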
HC · Mar 27, 2021
Towards Tool-Support for Interactive-Machine Learning Applications in the Android Ecosystem
Muhammad Mehran Sunny, Moritz Berghofer, Ilhan Aslan
Consumer applications are becoming increasingly smart, and most of them have to run on device ecosystems. Potential benefits include, for example, enabling cross-device interaction and seamless user experiences. Machine learning models are essential for today's high-performance smart solutions. However, these models are often developed separately by AI engineers for one specific device and do not consider the challenges and potentials associated with the device ecosystem in which their models have to run. We believe that there is a need for tool-support for AI engineers to address the challenges of implementing, testing, and deploying machine learning models for the next generation of smart interactive consumer applications. This paper presents preliminary results of a series of inquiries, including interviews with AI engineers and experiments for an interactive machine learning use case with a smartwatch and a smartphone. We identified themes through the interviews and hands-on experience with our use case. We also propose features that will serve as tool-support for AI engineers, such as data collection from sensors and easy testing of the resource consumption of pre-processing code running on the target device.
HC · Oct 10, 2020
Drawing with AI -- Exploring Collaborative Inking Experiences Based on Mid-air Pointing and Reinforcement Learning
Franziska Geiger, Michelle Martin, Monika Pichlmair et al.
Digitalization is changing the nature of the tools and materials used in artistic practice in professional and non-professional settings. For example, it is now common for even children to express their ideas and explore their creativity by drawing on tablets as digital canvases. Many software-based tools resemble traditional ones, such as various forms of virtual brushes and erasers. Unlike traditional materials, however, software-based tools and digital canvases can potentially be augmented with artificial intelligence. Curious about how it would feel to interact with a digital canvas that, unlike a traditional canvas, is dynamic, responsive, and potentially able to continuously adapt to its user's input, we developed a drawing application and conducted a qualitative study with 14 users. In this paper, we describe details of our design process. This process led us to use a k-armed bandit as a simple form of reinforcement learning, together with a LeapMotion sensor, to allow people from all walks of life, old and young, to draw on pervasive displays, small and large, positioned near or far.
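A k-armed bandit repeatedly chooses one of k actions and updates its estimate of each action's value from the reward it observes. The abstract names the technique but not its exploration strategy. The sketch below therefore assumes epsilon-greedy selection with incremental sample-average estimates, a standard textbook variant. What the arms and rewards represent in the drawing application (e.g. candidate canvas behaviours and some signal derived from the user's input) is likewise an assumption, and the class name is hypothetical.

```python
import random

class EpsilonGreedyBandit:
    """Hypothetical k-armed bandit with epsilon-greedy selection and sample-average estimates."""

    def __init__(self, k, epsilon=0.1, seed=None):
        self.k = k
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * k    # how often each arm has been pulled
        self.values = [0.0] * k  # running mean reward per arm

    def select_arm(self):
        # Explore a uniformly random arm with probability epsilon...
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.k)
        # ...otherwise exploit the arm with the highest current estimate.
        return max(range(self.k), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: Q <- Q + (r - Q) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

A typical loop calls `select_arm()`, applies the chosen behaviour, derives a reward from the user's response, and passes it to `update()`. The small, fixed `epsilon` keeps the canvas occasionally trying alternatives, so it can keep adapting as the user's input changes.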
HC · May 5, 2020
Resonating Experiences of Self and Others enabled by a Tangible Somaesthetic Design
Ilhan Aslan, Andreas Seiderer, Chi Tai Dang et al.
Digitalization is penetrating every aspect of everyday life, including the beating of the human heart, which can easily be sensed by wearable sensors and displayed for others to see, feel, and potentially "bodily resonate" with. Previous work studying human interactions and interaction designs with physiological data, such as a heart's pulse rate, has argued that feeding such data back to users may, for example, support their mindfulness and self-awareness during various everyday activities and ultimately support their wellbeing. Inspired by Somaesthetics as a discipline, which focuses on an appreciation of the living body's role in all our experiences, we designed and explored mobile tangible heart beat displays, which enable rich forms of bodily experiencing oneself and others in social proximity. In this paper, we first report on the design process of tangible heart displays and then present results of a field study with 30 pairs of participants. Participants were asked to use the tangible heart displays while watching movies together. They then reported their experience in three different heart display conditions (i.e., displaying their own heart beat, displaying their partner's heart beat, and watching a movie without a heart display). We found, for example, that participants reported significant effects on sensory immersion when they felt their own heart beats compared to the condition without any heart beat display, and that feeling their partner's heart beats had significant effects on social experience. We draw on resonance theory to discuss the results, highlighting how ubiquitous technology could potentially utilize physiological data to provide resonance in a modern society facing social acceleration.