Prerna Juneja

HC
h-index3
6papers
76citations
Novelty31%
AI Score37

6 Papers

CYSep 16, 2024
Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation Between the United States and South Africa

Hayoung Jung, Prerna Juneja, Tanushree Mitra · uw

Despite being an integral tool for finding health-related information online, YouTube has faced criticism for disseminating COVID-19 misinformation globally to its users. Yet, prior audit studies have predominantly investigated YouTube within the Global North contexts, often overlooking the Global South. To address this gap, we conducted a comprehensive 10-day geolocation-based audit on YouTube to compare the prevalence of COVID-19 misinformation in search results between the United States (US) and South Africa (SA), the countries heavily affected by the pandemic in the Global North and the Global South, respectively. For each country, we selected 3 geolocations and placed sock-puppets, or bots emulating "real" users, that collected search results for 48 search queries sorted by 4 search filters for 10 days, yielding a dataset of 915K results. We found that 31.55% of the top-10 search results contained COVID-19 misinformation. Among the top-10 search results, bots in SA faced significantly more misinformative search results than their US counterparts. Overall, our study highlights the contrasting algorithmic behaviors of YouTube search between two countries, underscoring the need for the platform to regulate algorithmic behavior consistently across different regions of the Globe.

CLApr 30
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

Prerna Juneja, Lika Lomidze

There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insights into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity, and collect 1,674 dialogue pairs across 25 high-risk scenarios. We combine emotion modeling and LLM-assisted utterance-and harm-level classification to analyze these exchanges. Results show that Replika exhibits a narrow emotional range dominated by curiosity and care, while frequently mirroring or normalizing unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.

SENov 22, 2015Code
Anvaya: An Algorithm and Case-Study on Improving the Goodness of Software Process Models generated by Mining Event-Log Data in Issue Tracking System

Prerna Juneja, Divya Kundra, Ashish Sureka

Issue Tracking Systems (ITS) such as Bugzilla can be viewed as Process Aware Information Systems (PAIS) generating event-logs during the life-cycle of a bug report. Process Mining consists of mining event logs generated from PAIS for process model discovery, conformance and enhancement. We apply process map discovery techniques to mine event trace data generated from ITS of open source Firefox browser project to generate and study process models. Bug life-cycle consists of diversity and variance. Therefore, the process models generated from the event-logs are spaghetti-like with large number of edges, inter-connections and nodes. Such models are complex to analyse and difficult to comprehend by a process analyst. We improve the Goodness (fitness and structural complexity) of the process models by splitting the event-log into homogeneous subsets by clustering structurally similar traces. We adapt the K-Medoid clustering algorithm with two different distance metrics: Longest Common Subsequence (LCS) and Dynamic Time Warping (DTW). We evaluate the goodness of the process models generated from the clusters using complexity and fitness metrics. We study back-forth \& self-loops, bug reopening, and bottleneck in the clusters obtained and show that clustering enables better analysis. We also propose an algorithm to automate the clustering process -the algorithm takes as input the event log and returns the best cluster set.

HCMay 27, 2025
Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions

Saharsh Barve, Andy Mao, Jiayue Melissa Shi et al.

Recent advances in generative AI have enabled visual content creation through text-to-image (T2I) generation. However, despite their creative potential, T2I models often replicate and amplify societal stereotypes -- particularly those related to gender, race, and culture -- raising important ethical concerns. This paper proposes a theory-driven bias detection rubric and a Social Stereotype Index (SSI) to systematically evaluate social biases in T2I outputs. We audited three major T2I model outputs -- DALL-E-3, Midjourney-6.1, and Stability AI Core -- using 100 queries across three categories -- geocultural, occupational, and adjectival. Our analysis reveals that initial outputs are prone to include stereotypical visual cues, including gendered professions, cultural markers, and western beauty norms. To address this, we adopted our rubric to conduct targeted prompt refinement using LLMs, which significantly reduced bias -- SSI dropped by 61% for geocultural, 69% for occupational, and 51% for adjectival queries. We complemented our quantitative analysis through a user study examining perceptions, awareness, and preferences around AI-generated biased imagery. Our findings reveal a key tension -- although prompt refinement can mitigate stereotypes, it can limit contextual alignment. Interestingly, users often perceived stereotypical images to be more aligned with their expectations. We discuss the need to balance ethical debiasing with contextual relevance and call for T2I systems that support global diversity and inclusivity while not compromising the reflection of real-world social complexity.

HCFeb 5, 2022
Algorithmic nudge to make better choices: Evaluating effectiveness of XAI frameworks to reveal biases in algorithmic decision making to users

Prerna Juneja, Tanushree Mitra

In this position paper, we propose the use of existing XAI frameworks to design interventions in scenarios where algorithms expose users to problematic content (e.g. anti vaccine videos). Our intervention design includes facts (to indicate algorithmic justification of what happened) accompanied with either fore warnings or counterfactual explanations. While fore warnings indicate potential risks of an action to users, the counterfactual explanations will indicate what actions user should perform to change the algorithmic outcome. We envision the use of such interventions as `decision aids' to users which will help them make informed choices.

HCJan 21, 2021
Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation

Prerna Juneja, Tanushree Mitra

There is a growing concern that e-commerce platforms are amplifying vaccine-misinformation. To investigate, we conduct two-sets of algorithmic audits for vaccine misinformation on the search and recommendation algorithms of Amazon -- world's leading e-retailer. First, we systematically audit search-results belonging to vaccine-related search-queries without logging into the platform -- unpersonalized audits. We find 10.47% of search-results promote misinformative health products. We also observe ranking-bias, with Amazon ranking misinformative search-results higher than debunking search-results. Next, we analyze the effects of personalization due to account-history, where history is built progressively by performing various real-world user-actions, such as clicking a product. We find evidence of filter-bubble effect in Amazon's recommendations; accounts performing actions on misinformative products are presented with more misinformation compared to accounts performing actions on neutral and debunking products. Interestingly, once user clicks on a misinformative product, homepage recommendations become more contaminated compared to when user shows an intention to buy that product.