29.3 SI Mar 26
Auditing Algorithmic Personalization in TikTok Comment Sections
Yueru Yan, Siqi Wu
Personalization algorithms are ubiquitous in modern social computing systems, yet their effects on comment sections remain underexplored. In this work, we conducted an algorithmic auditing experiment to examine comment personalization on TikTok. We trained sock-puppet accounts to exhibit left-leaning or right-leaning preferences and successfully validated 17 of them by analyzing the videos recommended on their For You Pages. We then scraped the comment sections shown to these trained partisan accounts, along with five cold-start accounts, across 65 politically neutral videos related to the 2024 U.S. presidential election that contain abundant discussions from both left-leaning and right-leaning perspectives. We find that while the composition of top comments remains largely consistent for all videos, ranking divergence between accounts from different political groups is significantly greater than that observed within the same group for some videos. This effect is strongly correlated with video-level metrics such as comment volume, engagement inequality, and partisan skew in the comment sections. Furthermore, through an exploratory case study, we find preliminary evidence that personalization can result in comment exposure aligned with an account's political leaning. However, this pattern is not universal, suggesting that the extent of politically oriented comment personalization is context-dependent.
CL Dec 19, 2025
ShareChat: A Dataset of Chatbot Conversations in the Wild
Yueru Yan, Tuc Nguyen, Bo Su et al.
While academic research typically treats Large Language Models (LLMs) as generic text generators, they are distinct commercial products with unique interfaces and capabilities that fundamentally shape user behavior. Current datasets obscure this reality by collecting text-only data through uniform interfaces that fail to capture authentic chatbot usage. To address this limitation, we present ShareChat, a large-scale corpus of 142,808 conversations (660,293 turns) sourced directly from publicly shared URLs on ChatGPT, Perplexity, Grok, Gemini, and Claude. ShareChat distinguishes itself by preserving native platform affordances, such as citations and thinking traces, across a diverse collection covering 101 languages and the period from April 2023 to October 2025. Furthermore, ShareChat offers substantially longer context windows and greater interaction depth than prior datasets. To illustrate the dataset's breadth, we present three case studies: a completeness analysis of intent satisfaction, a citation study of model grounding, and a temporal analysis of engagement rhythms. This work provides the community with a vital and timely resource for understanding authentic user-LLM chatbot interactions in the wild. The dataset is publicly available via Hugging Face.
CL Jul 6, 2025
Fairness Evaluation of Large Language Models in Academic Library Reference Services
Haining Wang, Jason Clark, Yueru Yan et al.
As libraries explore large language models (LLMs) for use in virtual reference services, a key question arises: Can LLMs serve all users equitably, regardless of demographics or social status? While they offer great potential for scalable support, LLMs may also reproduce societal biases embedded in their training data, risking the integrity of libraries' commitment to equitable service. To address this concern, we evaluate whether LLMs differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role. We found no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model. LLMs demonstrated nuanced accommodation of institutional roles through the use of linguistic choices related to formality, politeness, and domain-specific vocabularies, reflecting professional norms rather than discriminatory treatment. These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.