AI · Jun 3
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
Srimonti Dutta, Akshata Kishore Moharir
LLM-as-judge evaluation is widely used in benchmarking pipelines, where model outputs are compared and ranked using automated evaluators. These pipelines typically assume that judgments are stable properties of fixed inputs. We show that this assumption does not hold under interaction. We study post-decision manipulability: the extent to which an evaluation outcome can be altered through subsequent conversation with the judge after an initial decision has been made. Across controlled experiments on MT-Bench and AlpacaEval, we find that LLM judges are highly stable under repeated and neutral reevaluation, yet become substantially reversible under targeted post-decision challenge. An anti-baseline challenge protocol shows that stable judgments can be overturned through motivated interaction, while a counterbalanced target-validation protocol separates this reversibility from net target-directed steering. These reversals have practical consequences: they can degrade agreement with human preferences, shift benchmark rankings, and produce harmful evaluation changes despite high self-reported confidence. Authority framing is especially destabilizing, and revised judgments are often accompanied by low-overlap justifications, suggesting post hoc rationalization rather than reliable error correction. We introduce the Evaluation Robustness Score (ERS) to quantify interactional robustness by combining reversal susceptibility with counterbalanced directional effects. Our findings identify post-decision interaction as a distinct failure mode for LLM-as-judge evaluation and motivate evaluation protocols that measure not only static agreement, but robustness under challenge.
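The contrast between neutral reevaluation and targeted challenge can be illustrated with a simple reversal rate: the fraction of items whose verdict flips between the initial decision and the post-interaction decision. The sketch below is an assumption for illustration only. It is not the paper's ERS formula, which the abstract does not specify, and the function name `reversal_rate` and the toy verdicts are hypothetical, not results from the paper.

```python
from typing import List, Tuple


def reversal_rate(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of items whose verdict changed after the follow-up turn.

    Each pair is (initial_verdict, final_verdict), e.g. ("A", "B").
    Hypothetical helper; not the ERS defined in the paper.
    """
    if not pairs:
        raise ValueError("need at least one (initial, final) verdict pair")
    flipped = sum(1 for initial, final in pairs if initial != final)
    return flipped / len(pairs)


# Made-up verdicts for four comparisons under two conditions.
neutral = [("A", "A"), ("B", "B"), ("A", "A"), ("B", "B")]
challenged = [("A", "B"), ("B", "B"), ("A", "B"), ("B", "A")]

print(reversal_rate(neutral))     # 0.0
print(reversal_rate(challenged))  # 0.75
```

A robustness score in the spirit of the abstract would also need the counterbalanced directional effect, which this sketch deliberately leaves out.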
CL · Mar 23
Designing Explainable Conversational Agentic Systems for Guaraní Speakers
Samantha Adorno, Akshata Kishore Moharir, Ratna Kandala · microsoft-research
Although artificial intelligence (AI) and Human-Computer Interaction (HCI) systems are often presented as universal solutions, their design remains predominantly text-first, underserving primarily oral languages and indigenous communities. This position paper uses Guaraní, an official and widely spoken language of Paraguay, as a case study to argue that language support in AI remains insufficient unless it aligns with lived oral practices. As an alternative to the standard "text-to-speech" pipeline, we propose an oral-first multi-agent architecture. By decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, we demonstrate a technical framework that respects indigenous data sovereignty and diglossia. Our work moves beyond mere recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction. We conclude that for AI to be truly culturally grounded, it must shift from adapting oral languages to text-centric systems to treating spoken conversation as a first-class design requirement, ensuring digital ecosystems empower rather than overlook diverse linguistic practices.
CV · Mar 20
The Nonverbal Gap: Toward Affective Computer Vision for Safer and More Equitable Online Dating
Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir · microsoft-research
Online dating has become the dominant way romantic relationships begin, yet current platforms strip the nonverbal cues (gaze, facial expression, body posture, response timing) that humans rely on to signal comfort, disinterest, and consent, creating a communication gap with disproportionate safety consequences for women. We argue that this gap represents both a technical opportunity and a moral responsibility for the computer vision community, which has developed the affective tools needed to begin addressing it (facial action unit detection, gaze estimation, engagement modeling, and multimodal affect recognition), yet has largely ignored the dating domain as a research context. We propose a fairness-first research agenda organized around four capability areas: real-time discomfort detection, engagement asymmetry modeling between partners, consent-aware interaction design, and longitudinal interaction summarization, each grounded in established CV methodology and motivated by the social psychology of romantic communication. We argue that responsible pursuit of this agenda requires purpose-built datasets collected under dyadic consent protocols, fairness evaluation disaggregated across race, gender identity, neurotype, and cultural background, and architectural commitments to on-device processing that prevent affective data from becoming platform surveillance infrastructure. This vision paper calls on the WICV community, whose members are uniquely positioned to understand both the technical opportunity and the human stakes, to establish online dating safety as a first-class research domain before commercial deployment outpaces ethical deliberation.
HC · Mar 5
"What if she doesn't feel the same?" What Happens When We Ask AI for Relationship Advice
Niva Manchanda, Akshata Kishore Moharir, Ratna Kandala · microsoft-research
Large Language Models (LLMs) are increasingly being used to provide support and advice in personal domains such as romantic relationships, yet little is known about user perceptions of this type of advice. This study investigated how people evaluate LLM-generated advice on romantic relationships. Participants rated advice satisfaction, model reliability, and helpfulness, and completed pre- and post-measures of their general attitudes toward LLMs. Overall, participants reported high satisfaction with LLM-generated advice. Greater satisfaction was, in turn, strongly and positively associated with their perceptions of the models' reliability and helpfulness. Importantly, participants' attitudes toward LLMs improved significantly after exposure to the advice, suggesting that supportive and contextually relevant advice can enhance users' trust and openness toward these AI systems.
CL · Nov 12, 2025
Equilibrium Dynamics and Mitigation of Gender Bias in Synthetically Generated Data
Ashish Kattamuri, Arpita Vats, Harshwardhan Fartale et al.
Recursive prompting with large language models enables scalable synthetic dataset generation but introduces the risk of bias amplification. We investigate gender bias dynamics across three generations of recursive text generation using three complementary evaluation frameworks: rule-based pattern matching, embedding-based semantic similarity, and downstream task performance. Experiments with three initial bias levels (0.1, 0.3, 0.6) and four mitigation strategies reveal equilibrium dynamics rather than monotonic amplification. The low initial bias amplifies toward the model's inherent bias level (+36%), whereas the high initial bias decays toward it (-26%). Among mitigation methods, contrastive augmentation, which introduces gender-swapped variants, achieves significant downstream bias reduction (98.8% for low initial bias and 91% on average) despite producing higher embedding-based bias scores. This paradox demonstrates that semantic similarity metrics may diverge from behavioral fairness outcomes, highlighting the need for multidimensional evaluation in responsible synthetic data generation.
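Contrastive augmentation with gender-swapped variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word list, the names `gender_swap` and `augment`, and the rule of appending a variant only when it differs from the original are all hypothetical. Ambiguous pronouns such as "her" and "his" are left out on purpose, since swapping them correctly needs part-of-speech information.

```python
import re

# Hypothetical, deliberately tiny swap lexicon (assumption, not from the paper).
_PAIRS = [
    ("he", "she"),
    ("man", "woman"),
    ("men", "women"),
    ("father", "mother"),
    ("son", "daughter"),
    ("boy", "girl"),
]
SWAP = {}
for a, b in _PAIRS:
    SWAP[a] = b
    SWAP[b] = a

# Match whole words only, so "son" inside "person" is never touched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SWAP) + r")\b", re.IGNORECASE
)


def gender_swap(text: str) -> str:
    """Replace each gendered term with its counterpart, keeping capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, text)


def augment(corpus):
    """Return the corpus with a swapped variant added after each changed sentence."""
    out = []
    for sentence in corpus:
        out.append(sentence)
        variant = gender_swap(sentence)
        if variant != sentence:
            out.append(variant)
    return out


print(augment(["He is a father.", "It rained."]))
# ['He is a father.', 'She is a mother.', 'It rained.']
```

Pairing each sentence with its swapped counterpart balances the gendered terms seen downstream, even though the added variants sit close to the originals in embedding space.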
AI · Mar 5
EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue
Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir et al.
Manipulative communication, such as gaslighting, guilt-tripping, and emotional coercion, is often difficult for individuals to recognize. Existing agentic AI systems lack the structured, longitudinal memory to track these subtle, context-dependent tactics, often failing due to limited context windows and catastrophic forgetting. We introduce EchoGuard, an agentic AI framework that addresses this gap by using a Knowledge Graph (KG) as the agent's core episodic and semantic memory. EchoGuard employs a structured Log-Analyze-Reflect loop: (1) users log interactions, which the agent structures as nodes and edges in a personal, episodic KG (capturing events, emotions, and speakers); (2) the system executes complex graph queries to detect six psychologically-grounded manipulation patterns (stored as a semantic KG); and (3) an LLM generates targeted Socratic prompts grounded by the subgraph of detected patterns, guiding users toward self-discovery. This framework demonstrates how the interplay between agentic architectures and Knowledge Graphs can empower individuals in recognizing manipulative communication while maintaining personal autonomy and safety. We present the theoretical foundation, framework design, a comprehensive evaluation strategy, and a vision to validate this approach.
CL · Oct 10, 2025
From Explainability to Action: A Generative Operational Framework for Integrating XAI in Clinical Mental Health Screening
Ratna Kandala, Akshata Kishore Moharir, Divya Arvinda Nayak
Explainable Artificial Intelligence (XAI) has been presented as the critical component for unlocking the potential of machine learning in mental health screening (MHS). However, a persistent lab-to-clinic gap remains. Current XAI techniques, such as SHAP and LIME, excel at producing technically faithful outputs such as feature importance scores, but fail to deliver clinically relevant, actionable insights that can be used by clinicians or understood by patients. This disconnect between technical transparency and human utility is the primary barrier to real-world adoption. This paper argues that this gap is a translation problem and proposes the Generative Operational Framework, a novel system architecture that leverages Large Language Models (LLMs) as a central translation engine. This framework is designed to ingest the raw, technical outputs from diverse XAI tools and synthesize them with clinical guidelines (via RAG) to automatically generate human-readable, evidence-backed clinical narratives. To justify our solution, we provide a systematic analysis of the components it integrates, tracing the evolution from intrinsic models to generative XAI. We demonstrate how this framework directly addresses key operational barriers, including workflow integration, bias mitigation, and stakeholder-specific communication. This paper also provides a strategic roadmap for moving the field beyond the generation of isolated data points toward the delivery of integrated, actionable, and trustworthy AI in clinical practice.
CL · Oct 6, 2025
Cross-Lingual Mental Health Ontologies for Indian Languages: Bridging Patient Expression and Clinical Understanding through Explainable AI and Human-in-the-Loop Validation
Ananth Kandala, Ratna Kandala, Akshata Kishore Moharir et al.
Mental health communication in India is linguistically fragmented, culturally diverse, and often underrepresented in clinical NLP. Current health ontologies and mental health resources are dominated by diagnostic frameworks centered on English or Western culture, leaving a gap in representing patient distress expressions in Indian languages. We propose Cross-Linguistic Graphs of Patient Distress Expressions (CL-PDE), a framework for building cross-lingual mental health ontologies through graph-based methods that capture culturally embedded expressions of distress, align them across languages, and link them with clinical terminology. Our approach addresses critical gaps in healthcare communication by grounding AI systems in culturally valid representations, allowing more inclusive and patient-centric NLP tools for mental health care in multilingual contexts.
LG · Oct 3, 2025
Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis
Harshwardhan Fartale, Ashish Kattamuri, Rahul Raja et al.
Transformer-based language models excel at both recall (retrieving memorized facts) and reasoning (performing multi-step inference), but whether these abilities rely on distinct internal mechanisms remains unclear. Distinguishing recall from reasoning is crucial for predicting model generalization, designing targeted evaluations, and building safer interventions that affect one ability without disrupting the other. We approach this question through mechanistic interpretability, using controlled datasets of synthetic linguistic puzzles to probe transformer models at the layer, head, and neuron level. Our pipeline combines activation patching and structured ablations to causally measure component contributions to each task type. Across two model families (Qwen and LLaMA), we find that interventions on distinct layers and attention heads lead to selective impairments: disabling identified "recall circuits" reduces fact-retrieval accuracy by up to 15% while leaving reasoning intact, whereas disabling "reasoning circuits" reduces multi-step inference by a comparable margin. At the neuron level, we observe task-specific firing patterns, though these effects are less robust, consistent with neuronal polysemanticity. Our results provide the first causal evidence that recall and reasoning rely on separable but interacting circuits in transformer models. These findings advance mechanistic interpretability by linking circuit-level structure to functional specialization and demonstrate how controlled datasets and causal interventions can yield mechanistic insights into model cognition, informing safer deployment of large language models.
LG · Oct 14, 2024
Time Series Viewmakers for Robust Disruption Prediction
Dhruva Chayapathy, Tavis Siebert, Lucas Spangher et al.
Machine-learning-guided data augmentation may support the development of technologies in the physical sciences, such as nuclear fusion tokamaks. Here we study the problem of detecting disruptions, i.e., plasma instabilities that can cause significant damage and impair the reliability and efficiency that tokamaks need for real-world viability. Machine learning (ML) prediction models have shown promise in detecting disruptions for specific tokamaks, but they often struggle to generalize to the diverse characteristics and dynamics of different machines. This limits the effectiveness of ML models across different tokamak designs and operating conditions, which is a critical barrier to scaling fusion technology. Given the success of data augmentation in improving model robustness and generalizability in other fields, this study explores the use of a novel time series viewmaker network to generate diverse augmentations or "views" of training data. Our results show that incorporating views during training improves AUC and F2 scores on DisruptionBench tasks compared to standard or no augmentations. This approach represents a promising step towards developing more broadly applicable ML models for disruption avoidance, which is essential for advancing fusion technology and, ultimately, addressing climate change through reliable and sustainable energy production.
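For readers unfamiliar with the F2 score reported above, it is the F-beta score with beta = 2, which weights recall more heavily than precision; that suits disruption prediction, where a missed disruption costs more than a false alarm. The sketch below computes it from confusion counts. The function name `f_beta` and the example counts are illustrative assumptions, not values from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    beta > 1 emphasizes recall; beta = 1 gives the usual F1 score.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Made-up counts: 8 disruptions caught, 4 false alarms, 2 disruptions missed.
print(round(f_beta(8, 4, 2), 3))            # 0.769 (F2)
print(round(f_beta(8, 4, 2, beta=1.0), 3))  # 0.727 (F1)
```

With recall (0.8) above precision (about 0.667) in this toy case, F2 comes out higher than F1, showing the extra weight placed on recall.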