Belén Saldías-Fuentes

h-index: 5

4 papers

1,009 citations

Novelty: 46%

AI Score: 47

Ranked #32,119 of 194,257 authors (top 17%); #6,630 in CL (top 22%)

4 Papers

7.4 · CY · Apr 22

Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles

Irti Haq, Belén Saldías

As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical. However, it remains unclear whether performance disparities arise from the explicitly stated identity itself or from the way identity is signaled. In real-world interactions, users' identity is often conveyed implicitly through a complex combination of various socio-linguistic factors. This study disentangles these signals by employing a factorial design with over 24,000 responses from two open-weight LLMs (Gemma-3-12B and Qwen-3-VL-8B), comparing prompts with explicitly announced user profiles against implicit dialect signals (e.g., AAVE, Singlish) across various sensitive domains. Our results uncover a unique paradox in LLM safety where users achieve "better" performance by sounding like a demographic than by stating they belong to it. Explicit identity prompts activate aggressive safety filters, increasing refusal rates and reducing semantic similarity compared to our reference text for Black users. In contrast, implicit dialect cues trigger a powerful "dialect jailbreak," reducing refusal probability to near zero while simultaneously achieving a greater level of semantic similarity to the reference texts compared to Standard American English prompts. However, this "dialect jailbreak" introduces a critical safety trade-off regarding content sanitization. We find that current safety alignment techniques are brittle and over-indexed on explicit keywords, creating a bifurcated user experience where "standard" users receive cautious, sanitized information while dialect speakers navigate a less sanitized, more raw, and potentially more hostile information landscape. This highlights a fundamental tension in alignment, between equity and linguistic diversity, and underscores the need for safety mechanisms that generalize beyond explicit cues.

31.1 · CL · May 26, 2020 · Code

Exploring aspects of similarity between spoken personal narratives by disentangling them into narrative clause types

Belen Saldias, Deb Roy

Sharing personal narratives is a fundamental aspect of human social behavior as it helps share our life experiences. We can tell stories and rely on our background to understand their context, similarities, and differences. A substantial effort has been made towards developing storytelling machines or inferring characters' features. However, we don't usually find models that compare narratives. This task is remarkably challenging for machines since they, as sometimes we do, lack an understanding of what similarity means. To address this challenge, we first introduce a corpus of real-world spoken personal narratives comprising 10,296 narrative clauses from 594 video transcripts. Second, we ask non-narrative experts to annotate those clauses under Labov's sociolinguistic model of personal narratives (i.e., action, orientation, and evaluation clause types) and train a classifier that reaches 84.7% F-score for the highest-agreed clauses. Finally, we match stories and explore whether people implicitly rely on Labov's framework to compare narratives. We show that actions followed by the narrator's evaluation of these are the aspects non-experts consider the most. Our approach is intended to help inform machine learning methods aimed at studying or representing personal narratives.

9.3 · HC · Jul 26, 2019

Tweet Moodifier: Towards giving emotional awareness to Twitter users

Belen Saldias, Rosalind W. Picard

Emotional contagion in online social networks has been of great interest over the past years. Previous studies have focused mainly on finding evidence of affect contagion in homophilic atmospheres. However, these studies have overlooked users' awareness of the sentiments they share and consume online. In this paper, we present an experiment with Twitter users that aims to help them better understand which emotions they experience on this social network. We introduce Tweet Moodifier (T-Moodifier), a Google Chrome extension that enables Twitter users to filter and make explicit (through colored visual marks) the emotional content in their News Feed. We compare behavioral changes between 55 participants and 5089 of their public "friends." The comparison period spans from two weeks before installing T-Moodifier to one week thereafter. The results suggest that the use of T-Moodifier might help Twitter users increase their emotional awareness: T-Moodifier users who had access to emotional statistics about their posts produced a significantly higher percentage of neutral content. This behavioral change suggests that people could behave differently while using real-time mechanisms that increase their affect reflection. Also, post-experience, those who completed both pre- and post-surveys could assert more confidently the main emotions they shared and perceived on Twitter. This shows T-Moodifier's potential to effectively make users reflect on their News Feed.

1.0 · LG · Jan 2, 2019 · Code

A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification

Belen Saldias, Pavlos Protopapas, Karim Pichara

Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios. Labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of query. Our shorter queries only require "yes" or "no" responses. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach, using Monte Carlo Sampling and Black Box Variational Inference, which provides the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time-series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available.
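
To make the yes/no query idea concrete, here is a minimal sketch of how "yes"/"no" answers could update the class posterior of a single object. It is not the paper's model: it assumes the labeler confusion probabilities are already known, whereas the abstract describes inferring them jointly with the classes. The names `class_posterior`, `yes_prob`, and `answers` are hypothetical and chosen only for illustration.

```python
def class_posterior(prior, yes_prob, answers):
    """Posterior over the true class of one object from yes/no answers.

    prior:    list of K prior class probabilities.
    yes_prob: yes_prob[t][q] is the ASSUMED-KNOWN probability that a labeler
              answers "yes" when the true class is t and the question is
              "is this object of class q?".
    answers:  list of (q, said_yes) pairs, treated as independent given
              the true class.
    """
    unnormalized = []
    for t, p_t in enumerate(prior):
        likelihood = p_t
        for q, said_yes in answers:
            p_yes = yes_prob[t][q]
            likelihood *= p_yes if said_yes else (1.0 - p_yes)
        unnormalized.append(likelihood)

    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("answers have zero probability under every class")
    return [u / total for u in unnormalized]


# Three classes, uniform prior, labelers who say "yes" 90% of the time when
# the asked class is correct and 10% of the time otherwise (toy values).
prior = [1 / 3, 1 / 3, 1 / 3]
yes_prob = [[0.9 if t == q else 0.1 for q in range(3)] for t in range(3)]

# One labeler said "yes" to class 0, another said "no" to class 1.
answers = [(0, True), (1, False)]
posterior = class_posterior(prior, yes_prob, answers)
```

Under these toy values most of the posterior mass lands on class 0, while class 2 keeps more mass than class 1 because no answer rules it out directly. A full treatment would also place priors on `yes_prob` and infer it from the data.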