CV Apr 13, 2022
Mitigating Bias in Facial Analysis Systems by Incorporating Label Diversity
Camila Kolling, Victor Araujo, Adriano Veloso et al.
Facial analysis models are increasingly applied in real-world applications that have significant impact on people's lives. However, as the literature has shown, models that automatically classify facial attributes might exhibit algorithmic discrimination behavior with respect to protected groups, potentially posing negative impacts on individuals and society. It is therefore critical to develop techniques that can mitigate unintended biases in facial classifiers. Hence, in this work, we introduce a novel learning method that combines both subjective human-based labels and objective annotations based on mathematical definitions of facial traits. Specifically, we generate new objective annotations from two large-scale human-annotated datasets, each capturing a different perspective of the analyzed facial trait. We then propose an ensemble learning method, which combines individual models trained on different types of annotations. We provide an in-depth analysis of the annotation procedure as well as the distributions of the datasets. Moreover, we empirically demonstrate that, by incorporating label diversity, our method successfully mitigates unintended biases, while maintaining significant accuracy on the downstream tasks.
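As a rough illustration of combining models trained on different annotation types, the sketch below averages per-class probabilities (soft voting). The abstract does not state the combination rule, so soft voting is an assumption here, and `soft_vote`, `p_subjective`, and `p_objective` are hypothetical names with made-up values.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-class probabilities from several models."""
    if not prob_lists:
        raise ValueError("need at least one model output")
    n_models = len(prob_lists)
    if weights is None:
        # Assumption: every model contributes equally.
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_lists))
        for k in range(n_classes)
    ]


# Hypothetical outputs for one face image and a binary facial attribute.
p_subjective = [0.8, 0.2]  # model trained on human-assigned labels
p_objective = [0.4, 0.6]   # model trained on measurement-based labels

combined = soft_vote([p_subjective, p_objective])
predicted_class = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted_class)
```

Other combination rules (majority voting, a learned meta-classifier) would fit the same interface; only the body of `soft_vote` would change.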
IV Sep 6, 2024
Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques
Davide Clode da Silva, Marina Musse Bernardes, Nathalia Giacomini Ceretta et al.
Machine learning has significantly advanced healthcare by aiding in disease prevention and treatment identification. However, accessing patient data can be challenging due to privacy concerns and strict regulations. Generating synthetic, realistic data offers a potential solution for overcoming these limitations, and recent studies suggest that fine-tuning foundation models can produce such data effectively. In this study, we explore the potential of foundation models for generating realistic medical images, particularly chest x-rays, and assess how their performance improves with fine-tuning. We propose using a Latent Diffusion Model, starting with a pre-trained foundation model and refining it through various configurations. Additionally, we performed experiments with input from a medical professional to assess the realism of the images produced by each trained model.
CV Dec 4, 2023
Can we truly transfer an actor's genuine happiness to avatars? An investigation into virtual, real, posed and spontaneous faces
Vitor Miguel Xavier Peres, Greice Pinho Dal Molin, Soraia Raupp Musse
"A look is worth a thousand words" is a popular phrase. But why is a simple look enough to portray our feelings about something or someone? Behind this question are the theoretical foundations of the field of psychology regarding social cognition and the studies of psychologist Paul Ekman. Facial expressions, as a form of non-verbal communication, are the primary way to transmit emotions between human beings. The set of movements and expressions of facial muscles that convey an individual's emotional state to observers is a target of study in many areas. Our research aims to evaluate Ekman's action units (AUs) in datasets of real human faces, posed and spontaneous, and virtual human faces resulting from transferring real faces into Computer Graphics (CG) faces. In addition, we conducted a case study with specific movie characters, such as SheHulk and Genius. We intend to find differences and similarities in facial expressions between real and CG datasets, posed and spontaneous faces, and also to consider the actors' genders in the videos. This investigation can help several areas of knowledge, whether using real or virtual human beings, in education, health, entertainment, games, security, and even legal matters. Our results indicate that AU intensities are greater for posed than for spontaneous datasets, regardless of gender. Furthermore, there is a smoothing of intensity of up to 80 percent for AU6 and 45 percent for AU12 when a real face is transformed into CG.
CV Apr 2
True to Tone? Quantifying Skin Tone Fidelity and Bias in Photographic-to-Virtual Human Pipelines
Gabriel Ferri Schneider, Erick Menezes, Rafael Mecenas et al.
Accurate reproduction of facial skin tone is essential for realism, identity preservation, and fairness in Virtual Human (VH) rendering. However, most accessible avatar creation pipelines rely on photographic inputs that lack colorimetric calibration, which can introduce inconsistencies and bias. We propose a fully automatic and scalable methodology to systematically evaluate skin tone fidelity across the VH generation pipeline. Our approach defines a full workflow that integrates skin color and illumination extraction, texture recolorization, real-time rendering, and quantitative color analysis. Using facial images from the Chicago Face Database (CFD), we compare skin tone extraction strategies based on cheek-region sampling, following the literature, and multidimensional masking derived from full-face analysis. Additionally, we test both strategies with lighting isolation, using the pre-trained TRUST framework, employed without any training or optimization within our pipeline. Extracted skin tones are applied to MetaHuman textures and rendered under multiple lighting configurations. Skin tone consistency is evaluated objectively in the CIELAB color space using the $\Delta E$ metric and the Individual Typology Angle (ITA). The proposed methodology operates without manual intervention and, with the exception of pre-trained illumination compensation modules, the pipeline does not include learning or training stages, enabling low computational cost and large-scale evaluation. Using this framework, we generate and analyze approximately 19,848 rendered instances. Our results show phenotype-dependent behavior of extraction strategies and consistently higher colorimetric errors for darker skin tones.
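The two CIELAB measures named in this abstract can be sketched in a few lines. The abstract does not say which color-difference formula is used, so the simplest one (CIE76, a Euclidean distance in L*a*b*) is assumed here; the ITA follows its usual definition, arctan((L* - 50) / b*) expressed in degrees. The function names and the sample colors are hypothetical.

```python
import math


def delta_e_cie76(lab_a, lab_b):
    """Euclidean distance between two (L*, a*, b*) triples (CIE76 Delta E)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab_a, lab_b)))


def ita_degrees(l_star, b_star):
    """Individual Typology Angle in degrees.

    atan2 equals arctan((L* - 50) / b*) whenever b* > 0 and avoids a
    division by zero when b* is exactly 0.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


# Hypothetical values: a tone extracted from a photo vs. the rendered avatar.
photo_lab = (65.0, 15.0, 18.0)
render_lab = (62.0, 17.0, 14.0)

print(f"Delta E (CIE76): {delta_e_cie76(photo_lab, render_lab):.2f}")
print(f"ITA photo:  {ita_degrees(photo_lab[0], photo_lab[2]):.1f} deg")
print(f"ITA render: {ita_degrees(render_lab[0], render_lab[2]):.1f} deg")
```

A larger Delta E means the rendered tone drifted further from the photographed one, while a change in ITA indicates a shift along the light-to-dark skin tone scale.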
HC Jul 3, 2025
Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue
Paulo Ricardo Knob, Leonardo Scholler, Juliano Rigatti et al.
Conversational agents have made significant progress since ELIZA, expanding their role across various domains, including healthcare, education, and customer service. As these agents become increasingly integrated into daily human interactions, the need for emotional intelligence, particularly empathetic listening, becomes increasingly essential. In this study, we explore how Large Language Models (LLMs) respond when tasked with generating emotionally rich interactions. Starting from a small dataset manually crafted by an expert to reflect empathic behavior, we extended the conversations using two LLMs: ChatGPT and Gemini. We analyzed the emotional progression of the dialogues using both sentiment analysis (via VADER) and expert assessments. While the generated conversations often mirrored the intended emotional structure, human evaluation revealed important differences in the perceived empathy and coherence of the responses. These findings suggest that emotion modeling in dialogues requires not only structural alignment in the expressed emotions but also qualitative depth, highlighting the importance of combining automated and human-centered methods in the development of emotionally competent agents.
CV Dec 11, 2023
Detecting Events in Crowds Through Changes in Geometrical Dimensions of Pedestrians
Matheus Schreiner Homrich da Silva, Paulo Brossard de Souza Pinto Neto, Rodolfo Migon Favaretto et al.
Security is an important topic in our contemporary world, and the ability to automate the detection of any events of interest that can take place in a crowd is of great interest to the population. We hypothesize that the detection of events in videos is correlated with significant changes in pedestrian behaviors. In this paper, we examine three different scenarios of crowd behavior, containing both the cases where an event triggers a change in the behavior of the crowd and two video sequences where the crowd and its motion remain mostly unchanged. With both the videos and the tracking of the individual pedestrians (performed in a pre-processing phase), we use GeoMind, software we developed to extract significant data about the scene, in particular the geometrical features, personalities, and emotions of each person. We then examine the output, seeking a significant change in the way each person acts as a function of time that could be used as a basis to identify events or to model realistic crowd actions. When applied to the games area, our method can use the detected events to find patterns that can then be used in agent simulation. Results indicate that our hypothesis seems valid, in the sense that the visually observed events could be automatically detected using GeoMind.
HC Sep 30, 2021
Is my agent good enough? Evaluating Embodied Conversational Agents with Long and Short-term interactions
Juliane B. S. dos Santos, Paulo Ricardo Knob, Victor Putrich Scherer et al.
The use of digital resources has been increasing in every area of today's society, whether for business or for ludic purposes. Despite this ever-increasing use of technologies as interfaces in all fields, the importance of users' perception in this context seems to be overlooked. This work presents a case study on the evaluation of Embodied Conversational Agents (ECAs). We propose a Long-Term Interaction (LTI) to evaluate our conversational agent's effectiveness through user perception and compare it with Short-Term Interactions (STIs), performed by three users. Results show that many different aspects of users' perception of the chosen ECA (i.e., Arthur) could be evaluated in our case study, and in particular that LTI and STI are both important for a better understanding of the ECA's impact on UX.
HC Apr 27, 2021
Arthur: a new ECA that uses Memory to improve Communication
Paulo Knob, Willian S. Dias, Natanael Kuniechick et al.
This article proposes an embodied conversational agent named Arthur. In addition to being able to talk to a person (using text and voice), he is also able to recognize the person he is talking to and detect his/her expressed emotion through facial expressions. Arthur uses these skills to improve communication with the user, also using his artificial memory, which stores and retrieves data about events and facts, based on a human memory model. We conducted some experiments to collect quantitative and qualitative information, which show that our model provides a consistent impact on users.
CV Apr 27, 2021
Detecting Personality and Emotion Traits in Crowds from Video Sequences
Rodolfo Migon Favaretto, Paulo Knob, Soraia Raupp Musse et al.
This paper presents a methodology to detect personality and basic emotion characteristics of crowds in video sequences. First, individuals are detected and tracked; then groups are recognized and characterized. Such information is then mapped to OCEAN dimensions, used to find out personality and emotion in videos, based on OCC emotion models. Although it is a clear challenge to validate our results with real-life experiments, we evaluate our method against the available literature regarding OCEAN values of different countries and also the emergent personal distance among people. Hence, this analysis also refers to the cultural differences of each country. Our results indicate that this model generates coherent information when compared to data provided in the available literature, as shown in qualitative and quantitative results.
CV Aug 18, 2019
A Software to Detect OCC Emotion, Big-Five Personality and Hofstede Cultural Dimensions of Pedestrians from Video Sequences
Rodolfo Migon Favaretto, Victor Araujo, Soraia Raupp Musse et al.
This paper presents a video analysis application to detect personality, emotion and cultural aspects of pedestrians in video sequences, along with a visualizer of features. The proposed model considers a series of characteristics of the pedestrians and the crowd, such as number and size of groups, distances, and speeds, among others, and maps these characteristics to personalities, emotions and cultural aspects, considering the Cultural Dimensions of Hofstede (HCD), the Big-Five Personality Model (OCEAN) and the OCC Emotional Model. The main hypothesis is that there is a relationship between so-called intrinsic human variables (such as emotion) and the way people behave in space and time. The software was tested on a set of videos from different countries, and results seem promising for identifying these three different levels of psychological traits in the filmed sequences. In addition, the data of the people present in the videos can be seen in a crowd viewer.
GR Apr 24, 2019
How much do you perceive this? An analysis on perceptions of geometric features, personalities and emotions in virtual humans (Extended Version)
Victor Araujo, Rodolfo Migon Favaretto, Paulo Knob et al.
This work aims to evaluate people's perception of geometric features, personalities and emotions in virtual humans. For this, we use as a basis a dataset containing the tracking files of pedestrians captured from spontaneous videos and visualize them as identical virtual humans. The goal is to focus on their behavior without being distracted by other features. In addition to tracking files containing their positions, the dataset also contains pedestrian emotions and personalities detected using Computer Vision and Pattern Recognition techniques. We proceed with our analysis in order to answer the question of whether subjects can perceive geometric features such as distances and speeds, as well as emotions and personalities, in video sequences when pedestrians are represented by virtual humans. Regarding the participants, 73 people volunteered for the experiment. The analysis was divided into two parts: i) evaluation of the perception of geometric characteristics, such as density, angular variation, distances and speeds, and ii) evaluation of personality and emotion perceptions. Results indicate that, even without explaining to the participants the concepts of each personality or emotion and how they were calculated (considering geometric characteristics), in most cases participants perceived the personality and emotion expressed by the virtual agents, in accordance with the available ground truth.
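To make the geometric characteristics mentioned above more concrete, the sketch below derives a mean speed and a mean angular variation from one pedestrian track. The abstract does not define these quantities or the tracking file format, so everything here is an assumption: a track is taken to be a list of (x, y) positions sampled at a fixed interval `dt`, and angular variation is taken to be the mean absolute change in heading between consecutive steps. All names are hypothetical.

```python
import math


def mean_speed(track, dt):
    """Path length divided by elapsed time, for positions sampled every dt seconds."""
    if len(track) < 2:
        raise ValueError("need at least two positions")
    length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    return length / (dt * (len(track) - 1))


def mean_angular_variation(track):
    """Mean absolute change in heading (radians) between consecutive steps."""
    headings = [
        math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in zip(track, track[1:])
    ]
    changes = []
    for h1, h2 in zip(headings, headings[1:]):
        d = h2 - h1
        # Wrap the difference into [-pi, pi] before taking its magnitude.
        changes.append(abs(math.atan2(math.sin(d), math.cos(d))))
    return sum(changes) / len(changes) if changes else 0.0


# Hypothetical track in meters, sampled once per second: walk east, then turn north.
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(mean_speed(track, dt=1.0))
print(math.degrees(mean_angular_variation(track)))
```

Distances between pedestrians and local density could be computed in the same style from several tracks at a shared time step.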
CV Apr 10, 2019
Predicting Future Pedestrian Motion in Video Sequences using Crowd Simulation
Cliceres dal Bianco, Soraia Raupp Musse
While human and group analysis has become an important area in recent decades, some current and relevant applications involve estimating the future motion of pedestrians in real video sequences. This paper presents a method to estimate the motion of real pedestrians over the next seconds using crowd simulation. Our method is based on physics and heuristics and uses BioCrowds as the crowd simulation methodology to estimate future positions of people in video sequences. Results show that our method for estimation works well even for complex videos where events can happen. The maximum achieved average error is $2.72$ cm when estimating the future motion of 32 pedestrians more than 2 seconds in advance. This paper discusses this and other results.
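An average error of the kind reported above can be read as the mean distance between where each pedestrian was predicted to be and where the video later shows them. The abstract does not give the exact error definition, so the sketch below assumes a mean Euclidean distance over pedestrians at a single future time; `mean_position_error` and the sample coordinates are hypothetical.

```python
import math


def mean_position_error(predicted, observed):
    """Mean Euclidean distance between paired predicted and observed positions."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty position lists of equal length")
    total = sum(math.dist(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)


# Hypothetical positions (in cm) of three pedestrians at the prediction horizon.
predicted = [(100.0, 200.0), (340.0, 120.0), (55.0, 80.0)]
observed = [(103.0, 204.0), (340.0, 120.0), (55.0, 81.0)]

print(f"{mean_position_error(predicted, observed):.2f} cm")
```

Averaging over time steps as well as pedestrians would be an equally plausible reading and only changes how the position pairs are collected.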
CV Mar 5, 2019
Using Big Five Personality Model to Detect Cultural Aspects in Crowds
Rodolfo Migon Favaretto, Leandro Dihl, Soraia Raupp Musse et al.
The use of information technology in the study of human behavior is a subject of great scientific interest. Cultural and personality aspects are factors that influence how people interact with one another in a crowd. This paper presents a methodology to detect cultural characteristics of crowds in video sequences. Based on filmed sequences, pedestrians are detected, tracked and characterized. Such information is then used to find out cultural differences in those videos, based on the Big Five personality model. Regarding the cultural differences of each country, results indicate that this model generates coherent information when compared to data provided in the literature.