Daniel Chechelnitsky

CL
h-index: 49
4 papers
18 citations
Novelty: 36%
AI Score: 45

4 Papers

CL · Jun 4
Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios

Giuseppe Attanasio, Beatrice Savoldi, Daniel Chechelnitsky et al.

Speech translation (ST) is increasingly adopted in user applications, yet its evaluation largely focuses on decontextualized testbeds and holistic quality, rather than end users' communication needs. We introduce Ouvia, an evaluation framework for measuring user-perceived usability of speech translation outputs in real-world settings. Ouvia focuses on one-to-one communication: an English speaker needs to convey a request to a Portuguese speaker, and the message is automatically translated. Through a custom web app and multi-phase study design, we collect more than 1,750 such interactions in healthcare and everyday situations, mediated by four ST systems, involving speakers from three English dialects and two genders. We find that modern ST serves people only to a limited extent -- only around half of interactions are rated as usable -- with significant gaps in reported usability across demographic groups. Moreover, among quality metrics, we find that QA-based evaluation is a substantially stronger predictor of real-world usability than standard approaches. Together, these findings stress the importance of situated, user-centered evaluation frameworks that go beyond holistic quality scores and attend to who the technology serves -- and how well.

CY · Apr 30
Empire Amplifier: Uncovering and Contesting the Prioritization of Colonial Content on Platforms Through Community-Informed Algorithmic Auditing

Nel Escher, Bakyt Yrysov, Ashley McDermott et al.

Though online platforms claim to amplify Indigenous voices, Indigenous communities are worried that these systems are instead eroding their language and culture. We conduct a community-informed algorithmic audit to explore whether online platforms sustain or endanger Indigenous cultural practice. First, we review ethnographic research pertaining to the cultural anxieties of a specific Indigenous community, as Indigenous peoples are not a monolith. We consider concerns from Kyrgyz communities who believe that platforms are expanding Russia's linguistic influence and threatening their language. Next, we construct and conduct an algorithmic audit in conversation with the community. Our audit investigates deep-seated fears among Kyrgyz caregivers that YouTube encourages their children to speak Russian instead of Kyrgyz, their heritage language. We measure how the YouTube recommendation algorithm prioritizes content across Indigenous and non-Indigenous languages for child users. Our results validate caregiver concerns, as we find that YouTube primarily recommends non-Kyrgyz content to Kyrgyz children, even when children signal clear preferences for Kyrgyz content. Thus, platform recommendations reinforce Kyrgyz children's offline uptake of colonial language ideologies. Finally, we evaluate strategies to align platform behavior with Indigenous values. We identify effective end-user practices for reducing the proportion of Russian-language YouTube recommendations, like cross-generational device sharing. Overall, our work uncovers how platforms can amplify colonial influence, rather than revitalizing Indigenous cultural heritage. We encourage researchers to consider how algorithmic systems can reimpose oppressive power structures that decolonial efforts have sought to dismantle.

CL · Feb 18, 2025
Rejected Dialects: Biases Against African American Language in Reward Models

Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky et al. · allen-ai, cmu

Preference alignment via reward models helps build safe, helpful, and reliable large language models (LLMs). However, subjectivity in preference judgments and the lack of representative sampling in preference data collection can introduce new biases, hindering reward models' fairness and equity. In this work, we introduce a framework for evaluating dialect biases in reward models and conduct a case study on biases against African American Language (AAL) through several experiments comparing reward model preferences and behavior on paired White Mainstream English (WME) and both machine-translated and human-written AAL corpora. We show that reward models are less aligned with human preferences when processing AAL texts vs. WME ones (-4% accuracy on average), frequently disprefer AAL-aligned texts vs. WME-aligned ones, and steer conversations toward WME, even when prompted with AAL texts. Our findings provide a targeted analysis of anti-AAL biases at a relatively understudied stage in LLM development, highlighting representational harms and ethical questions about the desired behavior of LLMs concerning AAL.

CY · Apr 1
Translating With Feeling: Centering Translator Perspectives within Translation Technologies

Daniel Chechelnitsky, Sireesh Gururaja, Seyi Olojo et al.

Rapid development of Large Language Models (LLMs) and similar automated approaches for translation tasks is increasingly affecting the landscape of translation technologies. As concerns about the outsourcing of translator work to these automated translation tools grow, it becomes increasingly crucial to gather insights from the translation community directly. To this end, we conduct an interview study with 19 professional translators working across 11 languages and 11 domains to understand their perspectives, experiences, and concerns with using translation technologies in their work. We find that translators are cautious when incorporating new tools into their workflow, with several expressing concerns that machine translation (MT) and LLMs are infringing on the necessary human aspects and verification steps of translation, and that these tools have potential for harmful downstream effects as a result. These findings demonstrate the need to develop translation technologies that directly serve translators' needs rather than replacing human translation. This can be done by focusing more on the assistive, rather than the automating, aspects of these tools.