CLOct 20, 2024
KTCR: Improving Implicit Hate Detection with Knowledge Transfer driven Concept RefinementSamarth Garg, Vivek Hruday Kavuri, Gargi Shroff et al.
The constant shifts in social and political contexts, driven by emerging social movements and political events, lead to new forms of hate content and previously unrecognized hate patterns that machine learning models may not have captured. Some recent literature proposes data augmentation-based techniques to enrich existing hate datasets by incorporating samples that reveal new implicit hate patterns. This approach aims to improve the model's performance on out-of-domain implicit hate instances. It is observed, that further addition of more samples for augmentation results in the decrease of the performance of the model. In this work, we propose a Knowledge Transfer-driven Concept Refinement method that distills and refines the concepts related to implicit hate samples through novel prototype alignment and concept losses, alongside data augmentation based on concept activation vectors. Experiments with several publicly available datasets show that incorporating additional implicit samples reflecting new hate patterns through concept refinement enhances the model's performance, surpassing baseline results while maintaining cross-dataset generalization capabilities.
CVAug 10, 2025
Freeze and Reveal: Exposing Modality Bias in Vision-Language ModelsVivek Hruday Kavuri, Vysishtya Karanam, Venkata Jahnavi Venkamsetty et al.
Vision Language Models achieve impressive multi-modal performance but often inherit gender biases from their training data. This bias might be coming from both the vision and text modalities. In this work, we dissect the contributions of vision and text backbones to these biases by applying targeted debiasing using Counterfactual Data Augmentation and Task Vector methods. Inspired by data-efficient approaches in hate-speech classification, we introduce a novel metric, Degree of Stereotypicality and a corresponding debiasing method, Data Augmentation Using Degree of Stereotypicality - DAUDoS, to reduce bias with minimal computational cost. We curate a gender annotated dataset and evaluate all methods on VisoGender benchmark to quantify improvements and identify dominant source of bias. Our results show that CDA reduces the gender gap by 6% and DAUDoS by 3% but using only one-third of the data. Both methods also improve the model's ability to correctly identify gender in images by 3%, with DAUDoS achieving this improvement using only almost one-third of training data. From our experiment's, we observed that CLIP's vision encoder is more biased whereas PaliGemma2's text encoder is more biased. By identifying whether bias stems more from vision or text encoders, our work enables more targeted and effective bias mitigation strategies in future multi-modal systems.
CLFeb 21
Think$^{2}$: Grounded Metacognitive Reasoning in Large Language ModelsAbraham Paul Elenjical, Vivek Hruday Kavuri, Vasudeva Varma
Large Language Models (LLMs) demonstrate strong reasoning performance, yet their ability to reliably monitor, diagnose, and correct their own errors remains limited. We introduce a psychologically grounded metacognitive framework that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, and study its integration within a lightweight dual-process MetaController for adaptive effort allocation. Across diverse reasoning and diagnostic benchmarks (GSM8K, CRUXEval, MBPP, AIME, CorrectBench, and TruthfulQA) using Llama-3 and Qwen-3 (8B), explicit regulatory structuring substantially improves error diagnosis and yields a threefold increase in successful self-correction. Blinded human evaluations over 580 query pairs show an 84% aggregate preference for trustworthiness and metacognitive self-awareness over standard and Chain-of-Thought baselines. Grounding LLM reasoning in established cognitive theory offers a principled path toward more transparent and diagnostically robust AI systems.
CLAug 21, 2025
NEAT: Concept driven Neuron Attribution in LLMsVivek Hruday Kavuri, Gargi Shroff, Rahul Mishra
Locating neurons that are responsible for final predictions is important for opening the black-box large language models and understanding the inside mechanisms. Previous studies have tried to find mechanisms that operate at the neuron level but these methods fail to represent a concept and there is also scope for further optimization of compute required. In this paper, with the help of concept vectors, we propose a method for locating significant neurons that are responsible for representing certain concepts and term those neurons as concept neurons. If the number of neurons is n and the number of examples is m, we reduce the number of forward passes required from O(n*m) to just O(n) compared to the previous works and hence optimizing the time and computation required over previous works. We also compare our method with several baselines and previous methods and our results demonstrate better performance than most of the methods and are more optimal when compared to the state-of-the-art method. We, as part of our ablation studies, also try to optimize the search for the concept neurons by involving clustering methods. Finally, we apply our methods to find, turn off the neurons that we find, and analyze its implications in parts of hate speech and bias in LLMs, and we also evaluate our bias part in terms of Indian context. Our methodology, analysis and explanations facilitate understating of neuron-level responsibility for more broader and human-like concepts and also lay a path for future research in this direction of finding concept neurons and intervening them.