Sanorita Dey

HC
h-index: 46
4 papers
9 citations
Novelty: 39%
AI Score: 39

4 Papers

60.6 HC Mar 24
ReflectEd: Evaluating Reflection-Driven Learning in an AI-Assisted System

Md Nazmus Sakib, Ishika Tarin, Naga Manogna Rayasam et al.

In collaborative settings, sustaining momentum and engagement between checkpoints (e.g., meetings) can be challenging, often leading to task drift and reduced preparedness. To address this gap, we developed ReflectEd, an AI-assisted system that supports between-checkpoint reflection through theory-driven prompts with progressively structured levels and mechanism-based scaffolding. We evaluated ReflectEd in a mixed-method study comparing two reflection configurations: a regular reflection workflow and a deeper reflection workflow that included an additional transformative reflection activity. Across conditions, participants reported steady engagement early in the week. In the deeper configuration, later reflections tended to exhibit higher actionability and richer forward-looking planning, while also being harder to sustain and more effortful during periods of active work. Partner-visible reflections were frequently described as supporting coordination by surfacing differences in focus and facilitating accountability. Overall, the findings characterize trade-offs between reflection depth, feasibility, and perceived preparedness for subsequent checkpoints. We discuss implications for the design of AI-assisted systems that support collaboration readiness and reflection-oriented regulation in time-constrained collaborative workflows.

20.8 HC Mar 22
Expecting Too Much, Getting Too Little: Exploring the Challenges and Design Opportunities of Asynchronous AI Interviewers

Md Nazmus Sakib, Naga Manogna Rayasam, Sanorita Dey

Organizations use asynchronous AI interview systems to efficiently manage large applicant pools, enabling quick and uniform evaluations. However, concerns remain about their impact on user agency and the lack of personalization applicants experience with these systems. Although efforts have been made to humanize the interview process, users' expectations are often unmet, especially when compared to the promises made by these systems. To examine how applicants perceive and experience these tools, particularly in the context of their growing familiarity with large language models (LLMs), we conducted a two-phase study. The first phase involved an analysis of 11 subreddit discussions on interview experiences with asynchronous AI interviewers, followed by a semi-structured interview study with 17 participants. Qualitative analysis revealed key issues such as mismatched expectations, amplified by organizational rhetoric and applicant expectations shaped by experiences with LLMs. These factors shaped participants' sense of agency and trust, often leading to workarounds and deceptive practices. In the follow-up study, we designed an interface with two features, response variants and feedback variants, and evaluated it across six groups (N = 180, 30 participants each) to assess whether these features support users' sense of agency, competence, and relatedness. Our analysis suggests that even subtle design changes can enhance user autonomy and that carefully designed feedback can provide meaningful support in high-stakes interview contexts.

CL Feb 22, 2024
COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models

Priyanshul Govil, Hemang Jain, Vamshi Krishna Bonagiri et al.

Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices. Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets. These benchmarks measure bias by observing an LLM's behavior on biased statements. However, these statements lack contextual considerations of the situations they try to present. To address this, we introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear. We develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to measure a biased statement's reliability in detecting bias, based on the variance in model behavior across different contexts. To evaluate the metric, we augmented 2,291 stereotyped statements from two existing benchmark datasets by adding contextual information. We show that COBIAS aligns with human judgment on the contextual reliability of biased statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable benchmarks, which would assist bias mitigation work.
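The abstract above describes two ingredients: a score based on the variance in model behavior across contexts, and a Spearman rank correlation against human judgment. The sketch below is only a toy illustration of those two ideas and is not the COBIAS formula, which the abstract does not give. The statement IDs, per-context model scores, and human ratings are all hypothetical, and the Spearman helper assumes there are no tied values.

```python
from statistics import pvariance


def context_variance(scores):
    """Population variance of one statement's model scores across its contexts."""
    return pvariance(scores)


def ranks(values):
    """Assign ranks 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: each statement has one model score per added context.
model_scores = {
    "s1": [0.50, 0.52, 0.49],
    "s2": [0.10, 0.90, 0.40],
    "s3": [0.30, 0.45, 0.20],
}
# Hypothetical human ratings of how context-dependent each statement is.
human_ratings = {"s1": 1.0, "s2": 5.0, "s3": 3.0}

ids = sorted(model_scores)
variances = [context_variance(model_scores[s]) for s in ids]
rho = spearman_rho(variances, [human_ratings[s] for s in ids])
print(dict(zip(ids, variances)), rho)
```

In this made-up example the variance ordering matches the human ordering exactly, so the rank correlation is 1; real data would contain ties and noise, for which a library routine such as a tie-aware Spearman implementation would be the better choice.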

CV Jul 28, 2025
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering

Monika Shah, Sudarshan Balaji, Somdeb Sarkhel et al.

We can think of Visual Question Answering as a (multimodal) conversation between a human and an AI system. Here, we explore the sensitivity of Vision Language Models (VLMs) through the lens of the cooperative principles of conversation proposed by Grice. Even when Grice's maxims of conversation are flouted, humans typically have little difficulty understanding the conversation, although doing so requires more cognitive effort. We study whether VLMs can handle violations of Grice's maxims in a manner similar to humans. Specifically, we add modifiers to human-crafted questions and analyze how VLMs respond to these modifiers. We use three state-of-the-art VLMs in our study, namely GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Flash, on questions from the VQA v2.0 dataset. Our initial results suggest that the performance of VLMs consistently diminishes as modifiers are added, which indicates that our approach is a promising direction for understanding the limitations of VLMs.