Emil Joswin

3 Papers

22.3 CL Jul 1

A Mechanistic View of Authority Hierarchy in LLM Sycophancy

Emil Joswin, Srujananjali Medicherla, Priyanka Mary Mammen

Authority bias poses a critical safety concern in language models: models systematically prioritize social cues from authority figures over factual consistency, swaying their answers based on source credibility rather than evidence. We mechanistically investigate this phenomenon using a controlled medical QA setting, where hints suggesting incorrect answers are attributed to personas of varying expertise. Across Llama-3.1-8B, Qwen3-8B, and Gemma-2-9B, we find that models respond in a graded manner proportional to perceived authority, a hierarchy that is never explicitly prompted but emerges from training. Logit lens analysis and linear/non-linear probing localize this effect to a critical late layer where correct answer representations are actively erased, an erasure that scales with authority level, resists mean vector intervention, and is only partially reversible through chain-of-thought reasoning. Our findings suggest that authority-induced sycophancy is not a surface-level output bias but mechanistic knowledge erasure, a precise, layer-localized overwriting of correct internal representations by high-status authority signals.

1.1 CL Jan 19

Trust Me, I'm an Expert: Decoding and Steering Authority Bias in Large Language Models

Priyanka Mary Mammen, Emil Joswin, Shankar Venkitachalam

Prior research demonstrates that the performance of language models on reasoning tasks can be influenced by suggestions, hints, and endorsements. However, the influence of the endorsement source's credibility remains underexplored. We investigate whether language models exhibit systematic bias based on the perceived expertise of the person providing the endorsement. Across 4 datasets spanning mathematical, legal, and medical reasoning, we evaluate 11 models using personas representing four expertise levels per domain. Our results reveal that models become increasingly susceptible to incorrect or misleading endorsements as source expertise increases, with higher-authority sources inducing not only accuracy degradation but also increased confidence in wrong answers. We also show that this authority bias is mechanistically encoded within the model and that a model can be steered away from the bias, thereby improving its performance even when an expert gives a misleading endorsement.
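
As a rough illustration of what steering away from a bias can look like, the sketch below uses difference-of-means activation steering on synthetic vectors. This is an assumption made for illustration only: the abstract does not say which steering method the authors use, and the names `steering_vector` and `steer`, the 4-dimensional activations, and the `shift` values are all hypothetical. In a real model the activations would be hidden states read from a chosen layer.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(biased_acts, neutral_acts):
    """Difference of means: average biased activation minus average neutral one."""
    mean_biased = mean_vector(biased_acts)
    mean_neutral = mean_vector(neutral_acts)
    return [b - n for b, n in zip(mean_biased, mean_neutral)]


def steer(activation, direction, alpha=1.0):
    """Move an activation away from the bias direction by a factor alpha."""
    return [a - alpha * d for a, d in zip(activation, direction)]


# Synthetic stand-ins for hidden states (hypothetical data, not model outputs).
random.seed(0)
dim = 4
shift = [2.0, 0.0, -1.0, 0.0]  # pretend effect of an expert-attributed hint
neutral = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
biased = [[x + s for x, s in zip(row, shift)] for row in neutral]

v = steering_vector(biased, neutral)
steered = [steer(row, v) for row in biased]
```

Because the synthetic bias here is a constant offset, subtracting the difference of means recovers the neutral activations almost exactly. Real authority effects need not be a single constant direction, so this only shows the mechanics of the intervention.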

3.1 HC Oct 2, 2019

Brown Ring Experiment in Virtual Reality

Prithaj Jana, Emil Joswin

The Brown Ring Experiment is a popular test for detecting the presence of nitrate in salts, commonly performed in chemical laboratories stocked with the required chemicals. Our work removes the need for a chemical laboratory and chemicals in order to understand the experiment practically, using Virtual Reality to meet this requirement. The approach can be extended to create virtual environments for conducting other chemical processes, eliminating the need for a chemical laboratory. This can help students in remote areas with minimal resources fill the gap in practical experiments that space constraints leave in their learning process.