Animesh Mukherjee

CL
h-index22
105papers
11,214citations
Novelty39%
AI Score58

105 Papers

CLMar 4Code
Benchmarking Motivational Interviewing Competence of Large Language Models

Aishwariya Jha, Prakrithi Shivaprakash, Lekhansh Shukla et al.

Motivational interviewing (MI) promotes behavioural change in substance use disorders. Its fidelity is measured using the Motivational Interviewing Treatment Integrity (MITI) framework. While large language models (LLMs) can potentially generate MI-consistent therapist responses, their competence using MITI is not well-researched, especially in real world clinical transcripts. We aim to benchmark MI competence of proprietary and open-source models compared to human therapists in real-world transcripts and assess distinguishability from human therapists. Methods: We shortlisted 3 proprietary and 7 open-source LLMs from LMArena, evaluated performance using MITI 4.2 framework on two datasets (96 handcrafted model transcripts, 34 real-world clinical transcripts). We generated parallel LLM-therapist utterances iteratively for each transcript while keeping client responses static, and ranked performance using a composite ranking system with MITI components and verbosity. We conducted a distinguishability experiment with two independent psychiatrists to identify human-vs-LLM responses. Results: All 10 tested LLMs had fair (MITI global scores >3.5) to good (MITI global scores >4) competence across MITI measures, and three best-performing models (gemma-3-27b-it, gemini-2.5-pro, grok-3) were tested on real-world transcripts. All showed good competence, with LLMs outperforming human-expert in Complex Reflection percentage (39% vs 96%) and Reflection-Question ratio (1.2 vs >2.8). In the distinguishability experiment, psychiatrists identified LLM responses with only 56% accuracy, with d-prime: 0.17 and 0.25 for gemini-2.5-pro and gemma-3-27b-it respectively. Conclusion: LLMs can achieve good MI proficiency in real-world clinical transcripts using MITI framework. These findings suggest that even open-source LLMs are viable candidates for expanding MI counselling sessions in low-resource settings.

SIMar 18, 2023
On the rise of fear speech in online social media

Punyajoy Saha, Kiran Garimella, Narla Komal Kalyan et al.

Recently, social media platforms are heavily moderated to prevent the spread of online hate speech, which is usually fertile in toxic words and is directed toward an individual or a community. Owing to such heavy moderation, newer and more subtle techniques are being deployed. One of the most striking among these is fear speech. Fear speech, as the name suggests, attempts to incite fear about a target community. Although subtle, it might be highly effective, often pushing communities toward a physical conflict. Therefore, understanding their prevalence in social media is of paramount importance. This article presents a large-scale study to understand the prevalence of 400K fear speech and over 700K hate speech posts collected from Gab.com. Remarkably, users posting a large number of fear speech accrue more followers and occupy more central positions in social networks than users posting a large number of hate speech. They can also reach out to benign users more effectively than hate speech users through replies, reposts, and mentions. This connects to the fact that, unlike hate speech, fear speech has almost zero toxic content, making it look plausible. Moreover, while fear speech topics mostly portray a community as a perpetrator using a (fake) chain of argumentation, hate speech topics hurl direct multitarget insults, thus pointing to why general users could be more gullible to fear speech. Our findings transcend even to other platforms (Twitter and Facebook) and thus necessitate using sophisticated moderation policies and mass awareness to combat fear speech.

CVJul 22, 2024Code
Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition Systems

Siddharth D Jaiswal, Animesh Ganai, Abhisek Dash et al.

Facial Recognition Systems (FRSs) are being developed and deployed globally at unprecedented rates. Most platforms are designed in a limited set of countries but deployed in worldwide, without adequate checkpoints. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailability of datasets, lack of understanding of FRS functionality and low-resource bias mitigation measures accentuate the problem. In this work, we propose a new face dataset composed of 6,579 unique male and female sportspersons from eight countries around the world. More than 50% of the dataset comprises individuals from the Global South countries and is demographically diverse. To aid adversarial audits and robust model training, each image has four adversarial variants, totaling over 40,000 images. We also benchmark five popular FRSs, both commercial and open-source, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on industrial FRSs reveal accuracies ranging from 98.2%--38.1%, with a large disparity between males and females in the Global South (max difference of 38.5%). Biases are also observed in all FRSs between females of the Global North and South (max difference of ~50%). Grad-CAM analysis identifies the nose, forehead and mouth as the regions of interest on one of the open-source FRSs. Utilizing this insight, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques significantly improving the accuracy with disparity between males and females reducing from 50% to 1.5% in one of the settings. In the red-teaming experiment with the open-source Deepface model, contrastive learning proves more effective than simple fine-tuning.

91.2CLMay 29
FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

Paramananda Bhaskar, Naquee Rizwan, Daksh Jogchand et al.

Hateful meme detection remains a formidable challenge for vision-language models, as existing benchmarks are structurally observational - confounding rhetorical hate mechanisms with target community features and preventing causal evaluation of model vulnerabilities. To address this, we introduce FBHM, a systematically curated benchmark of Functionality Based Hateful Memes constructed along two orthogonal axes: 25 distinct rhetorical functionalities and 10 target communities (5,000 memes total). Benchmarking state-of-the-art VLMs reveals a severe generalization gap: models highly accurate on standard datasets catastrophically drop to near-random performance on FBHM, proving they exploit dataset-specific heuristics rather than robust multimodal reasoning. To efficiently close this gap, we propose LSV (learnable steering vectors), an ultra-low data regime strategy that applies a causal intervention objective on as few as 500 steering samples (50 unique base memes), boosting FBHM performance by ~30 Macro-F1 points while outperforming in-context learning and PEFT without degrading source-domain performance.

CLOct 19, 2023
Probing LLMs for hate speech detection: strengths and vulnerabilities

Sarthak Roy, Ashish Harshavardhan, Animesh Mukherjee et al.

Recently efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aim to use explanation, additional context and victim community information in the detection process. We utilise different prompt variation, input information and evaluate large language models in zero shot setting (without adding any in-context examples). We select three large language models (GPT-3.5, text-davinci and Flan-T5) and three datasets - HateXplain, implicit hate and ToxicSpans. We find that on average including the target information in the pipeline improves the model performance substantially (~20-30%) over the baseline across the datasets. There is also a considerable effect of adding the rationales/explanations into the pipeline (~10-20%) over the baseline across the datasets. In addition, we further provide a typology of the error cases where these large language models fail to (i) classify and (ii) explain the reason for the decisions they take. Such vulnerable points automatically constitute 'jailbreak' prompts for these models and industry scale safeguard techniques need to be developed to make the models robust against such prompts.

CLApr 26, 2022
Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages

Mithun Das, Somnath Banerjee, Animesh Mukherjee

Abusive language is a growing concern in many social media platforms. Repeated exposure to abusive speech has created physiological effects on the target users. Thus, the problem of abusive language should be addressed in all forms for online peace and safety. While extensive research exists in abusive speech detection, most studies focus on English. Recently, many smearing incidents have occurred in India, which provoked diverse forms of abusive speech in online space in various languages based on the geographic location. Therefore it is essential to deal with such malicious content. In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages. We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection for eight different Indic languages. We also experiment to show how robust these models are on adversarial attacks. Finally, we conduct an in-depth error analysis by looking into the models' misclassified posts across various settings. We have made our code and models public for other researchers.

CLOct 7, 2022
Hate Speech and Offensive Language Detection in Bengali

Mithun Das, Somnath Banerjee, Punyajoy Saha et al.

Social media often serves as a breeding ground for various hateful and offensive content. Identifying such content on social media is crucial due to its impact on the race, gender, or religion in an unprejudiced society. However, while there is extensive research in hate speech detection in English, there is a gap in hateful content detection in low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the existing research's limitations, in this study, we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for the classification of such hateful posts. We further explore the interlingual transfer mechanism to boost classification performance. Finally, we perform an in-depth error analysis by looking into the misclassified posts by the models. While training actual and Romanized datasets separately, we observe that XLM-Roberta performs the best. Further, we witness that on joint training and few-shot training, MuRIL outperforms other models by interpreting the semantic expressions better. We make our code and dataset public for others.

CLMay 9, 2022
CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech

Punyajoy Saha, Kanishk Singh, Adarsh Kumar et al.

Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single attribute results when combining complementing attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls

CLSep 11, 2023
Personality Detection and Analysis using Twitter Data

Abhilash Datta, Souvic Chakraborty, Animesh Mukherjee

Personality types are important in various fields as they hold relevant information about the characteristics of a human being in an explainable format. They are often good predictors of a person's behaviors in a particular environment and have applications ranging from candidate selection to marketing and mental health. Recently automatic detection of personality traits from texts has gained significant attention in computational linguistics. Most personality detection and analysis methods have focused on small datasets making their experimental observations often limited. To bridge this gap, we focus on collecting and releasing the largest automatically curated dataset for the research community which has 152 million tweets and 56 thousand data points for the Myers-Briggs personality type (MBTI) prediction task. We perform a series of extensive qualitative and quantitative studies on our dataset to analyze the data patterns in a better way and infer conclusions. We show how our intriguing analysis results often follow natural intuition. We also perform a series of ablation studies to show how the baselines perform for our dataset.

LGFeb 27, 2023
Diversity matters: Robustness of bias measurements in Wikidata

Paramita Das, Sai Keerthana Karnam, Anirban Panda et al.

With the widespread use of knowledge graphs (KG) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases. Previous works have depicted biases that persist in KGs, as well as employed several metrics for measuring the biases. However, such studies lack the systematic exploration of the sensitivity of the bias measurements, through varying sources of data, or the embedding algorithms used. To address this research gap, in this work, we present a holistic analysis of bias measurement on the knowledge graph. First, we attempt to reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we attempt to unfold the variance in the detection of biases by two different knowledge graph embedding algorithms - TransE and ComplEx. We conduct our extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in KG can be altered by specific algorithm bias as incorporated by KG embedding learning algorithms. Further, we show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal which reflects the socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness among the community while deriving insights related to design choices of data and algorithms both and refrain from the popular dogma of ``one-size-fits-all''.

53.0CYMar 20
Setting the Course, but Forgetting to Steer: Analyzing Compliance with GDPR's Right of Access to Data by Instagram, TikTok, and YouTube

Sai Keerthana Karnam, Abhisek Dash, Antariksh Das et al.

The GDPR's Right of Access aims to empower users with control over their personal data via Data Download Packages (DDPs). However, their effectiveness is often compromised by inconsistent platform implementations, questionable data reliability, and poor user comprehensibility. This paper conducts a comprehensive audit of DDPs from three social media platforms (TikTok, Instagram, and YouTube) to systematically assess these critical drawbacks. Despite offering similar services, we find that these platforms demonstrate significant inconsistencies in implementing the Right of Access, evident in varying levels of shared data. Critically, the failure to disclose processing purposes, retention periods, and other third-party data recipients serves as a further indicator of non-compliance. Our reliability evaluations, using bots and user-donated data, reveal that while TikTok's DDPs offer more consistent and complete data, others exhibit notable shortcomings. Similarly, our assessment of comprehensibility, based on surveys with 400 participants, indicates that current DDPs substantially fall short of GDPR's standards. To improve the comprehensibility, we propose and demonstrate a two-layered approach by: (1)~enhancing the data representation itself using stakeholder interpretations; and (2)~incorporating a user-friendly extension (\textit{Know Your Data}) for intuitive data visualization where users can control the level of transparency they prefer. Our findings underscore the need for clearer and non-conflicting regulatory guidance, stricter enforcement, and platform commitment to realize the goal of GDPR's Right of Access.

CYSep 21, 2022
Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction

Souvic Chakraborty, Pawan Goyal, Animesh Mukherjee

With the rising participation of the common mass in social media, it is increasingly common now for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations. The caveat here is that only influential people can make such an online polling and reach out at a mass scale. Further, in such cases, the distribution of voters is not controllable and may be, in fact, biased. On the other hand,if we can interpret the publicly available data over social media to probe the political inclination of users, we will be able to have controllable insights about the survey population, keep the cost of survey low and also collect publicly available data without involving the concerned persons. Hence we introduce a self-attentive semi-supervised framework for political inclination detection to further that objective. The advantage of our model is that it neither needs huge training data nor does it need to store social network parameters. Nevertheless, it achieves an accuracy of 93.7\% with no annotated data; further, with only a few annotated examples per class it achieves competitive performance. We found that the model is highly efficient even in resource-constrained settings, and insights drawn from its predictions match the manual survey outcomes when applied to diverse real-life scenarios.

CLApr 30, 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models

Mithun Das, Punyajoy Saha, Binny Mathew et al.

Due to the sheer volume of online hate, the AI and NLP communities have started building models to detect such hateful content. Recently, multilingual hate is a major emerging challenge for automated detection where code-mixing or more than one language have been used for conversation in social media. Typically, hate speech detection models are evaluated by measuring their performance on the held-out test data using metrics such as accuracy and F1-score. While these metrics are useful, it becomes difficult to identify using them where the model is failing, and how to resolve it. To enable more targeted diagnostic insights of such multilingual hate speech models, we introduce a set of functionalities for the purpose of evaluation. We have been inspired to design this kind of functionalities based on real-world conversation on social media. Considering Hindi as a base language, we craft test cases for each functionality. We name our evaluation dataset HateCheckHIn. To illustrate the utility of these functionalities , we test state-of-the-art transformer based m-BERT model and the Perspective API.

CVOct 18, 2023
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification

Mithun Das, Animesh Mukherjee

The dramatic increase in the use of social media platforms for information sharing has also fueled a steep growth in online abuse. A simple yet effective way of abusing individuals or communities is by creating memes, which often integrate an image with a short piece of text layered on top of it. Such harmful elements are in rampant use and are a threat to online safety. Hence it is necessary to develop efficient models to detect and flag abusive memes. The problem becomes more challenging in a low-resource setting (e.g., Bengali memes, i.e., images with Bengali text embedded on it) because of the absence of benchmark datasets on which AI models could be trained. In this paper we bridge this gap by building a Bengali meme dataset. To setup an effective benchmark we implement several baseline models for classifying abusive memes using this dataset. We observe that multimodal models that use both textual and visual information outperform unimodal models. Our best-performing model achieves a macro F1 score of 70.51. Finally, we perform a qualitative error analysis of the misclassified memes of the best-performing text-based, image-based and multimodal models.

CLSep 5, 2022
"Dummy Grandpa, do you know anything?": Identifying and Characterizing Ad hominem Fallacy Usage in the Wild

Utkarsh Patel, Animesh Mukherjee, Mainack Mondal

Today, participating in discussions on online forums is extremely commonplace and these discussions have started rendering a strong influence on the overall opinion of online users. Naturally, twisting the flow of the argument can have a strong impact on the minds of naive users, which in the long run might have socio-political ramifications, for example, winning an election or spreading targeted misinformation. Thus, these platforms are potentially highly vulnerable to malicious players who might act individually or as a cohort to breed fallacious arguments with a motive to sway public opinion. Ad hominem arguments are one of the most effective forms of such fallacies. Although a simple fallacy, it is effective enough to sway public debates in offline world and can be used as a precursor to shutting down the voice of opposition by slander. In this work, we take a first step in shedding light on the usage of ad hominem fallacies in the wild. First, we build a powerful ad hominem detector with high accuracy (F1 more than 83%, showing a significant improvement over prior work), even for datasets for which annotated instances constitute a very small fraction. We then used our detector on 265k arguments collected from the online debate forum - CreateDebate. Our crowdsourced surveys validate our in-the-wild predictions on CreateDebate data (94% match with manual annotation). Our analysis revealed that a surprising 31.23% of CreateDebate content contains ad hominem fallacy, and a cohort of highly active users post significantly more ad hominem to suppress opposing views. Then, our temporal analysis revealed that ad hominem argument usage increased significantly since the 2016 US Presidential election, not only for topics like Politics, but also for Science and Law. We conclude by discussing important implications of our work to detect and defend against ad hominem fallacies.

SISep 12, 2023
Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Rima Hazra, Agnik Saha, Somnath Banerjee et al.

Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries. The swiftness of these responses is contingent on a mixture of query-specific and user-related elements. This paper scrutinizes these contributing factors within the context of six highly popular CQA platforms, identified through their standout answering speed. Our investigation reveals a correlation between the time taken to yield the first response to a question and several variables: the metadata, the formulation of the questions, and the level of interaction among users. Additionally, by employing conventional machine learning models to analyze these metadata and patterns of user interaction, we endeavor to predict which queries will receive their initial responses promptly.

84.9CLMar 26Code
SafeMath: Inference-time Safety improves Math Accuracy

Sagnik Basu, Subhrajit Mitra, Aman Juneja et al.

Recent research points toward LLMs being manipulated through adversarial and seemingly benign inputs, resulting in harmful, biased, or policy-violating outputs. In this paper, we study an underexplored issue concerning harmful and toxic mathematical word problems. We show that math questions, particularly those framed as natural language narratives, can serve as a subtle medium for propagating biased, unethical, or psychologically harmful content, with heightened risks in educational settings involving children. To support a systematic study of this phenomenon, we introduce ToxicGSM, a dataset of 1.9k arithmetic problems in which harmful or sensitive context is embedded while preserving mathematically well-defined reasoning tasks. Using this dataset, we audit the behaviour of existing LLMs and analyse the trade-offs between safety enforcement and mathematical correctness. We further propose SafeMath -- a safety alignment technique that reduces harmful outputs while maintaining, and in some cases improving, mathematical reasoning performance. Our results highlight the importance of disentangling linguistic harm from math reasoning and demonstrate that effective safety alignment need not come at the cost of accuracy. We release the source code and dataset at https://github.com/Swagnick99/SafeMath/tree/main.

CLFeb 11, 2023
HateProof: Are Hateful Meme Detection Systems really Robust?

Piush Aggarwal, Pranit Chawla, Mithun Das et al.

Exploiting social media to spread hate has tremendously increased over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes them fairly challenging to be detected by existing hateful meme detection systems. In this paper, we present a use case study to analyze such systems' vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings performed by humans with little knowledge about the model can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method - VILLA. Using an ensemble of the above two approaches, in two of our high resolution datasets, we are able to (re)gain back the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and would inspire more such investigations in the future.

CLApr 13, 2022
CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection

Souvic Chakraborty, Parag Dutta, Sumegh Roychowdhury et al.

The last decade has witnessed a surge in the interaction of people through social networking platforms. While there are several positive aspects of these social platforms, the proliferation has led them to become the breeding ground for cyber-bullying and hate speech. Recent advances in NLP have often been used to mitigate the spread of such hateful content. Since the task of hate speech detection is usually applicable in the context of social networks, we introduce CRUSH, a framework for hate speech detection using user-anchored self-supervision and contextual regularization. Our proposed approach secures ~ 1-12% improvement in test set metrics over best performing previous approaches on two types of tasks and multiple popular english social media datasets.

MMMar 25, 2022
hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification

Mithun Das, Somnath Banerjee, Animesh Mukherjee

Social media platforms often act as breeding grounds for various forms of trolling or malicious content targeting users or communities. One way of trolling users is by creating memes, which in most cases unites an image with a short piece of text embedded on top of it. The situation is more complex for multilingual(e.g., Tamil) memes due to the lack of benchmark datasets and models. We explore several models to detect Troll memes in Tamil based on the shared task, "Troll Meme Classification in DravidianLangTech2022" at ACL-2022. We observe while the text-based model MURIL performs better for Non-troll meme classification, the image-based model VGG16 performs better for Troll-meme classification. Further fusing these two modalities help us achieve stable outcomes in both classes. Our fusion model achieved a 0.561 weighted average F1 score and ranked second in this task.

CLNov 30, 2022
Rationale-Guided Few-Shot Classification to Detect Abusive Language

Punyajoy Saha, Divyanshu Sheth, Kushal Kedia et al.

Abusive language is a concerning problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographies, etc. However, models trained using these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models to get better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit the artefacts of those samples. A compelling solution could be to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RGFS (Rationale-Guided Few-Shot Classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training solely rationale classifiers. We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RGFS-based models outperform baseline models by about 7% in macro F1 scores and perform competitively to models finetuned on other source domains. Furthermore, RGFS-based models outperform LIME/SHAP-based approaches in terms of plausibility and are close in performance in terms of faithfulness.

DLSep 19, 2023
Modeling interdisciplinary interactions among Physics, Mathematics & Computer Science

Rima Hazra, Mayank Singh, Pawan Goyal et al.

Interdisciplinarity has over the recent years have gained tremendous importance and has become one of the key ways of doing cutting edge research. In this paper we attempt to model the citation flow across three different fields -- Physics (PHY), Mathematics (MA) and Computer Science (CS). For instance, is there a specific pattern in which these fields cite one another? We carry out experiments on a dataset comprising more than 1.2 million articles taken from these three fields. We quantify the citation interactions among these three fields through temporal bucket signatures. We present numerical models based on variants of the recently proposed relay-linking framework to explain the citation dynamics across the three disciplines. These models make a modest attempt to unfold the underlying principles of how citation links could have been formed across the three fields over time.

CLJul 20, 2023
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

Anand Kumar Rai, Siddharth D Jaiswal, Animesh Mukherjee

Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of $\sim9.8$K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need of more inclusive and robust ASR systems and more representational datasets for disparity evaluation in them.

CYSep 7, 2022
Decoding Demographic un-fairness from Indian Names

Medidoddi Vahini, Jalend Bantupalli, Souvic Chakraborty et al.

Demographic classification is essential in fairness assessment in recommender systems or in measuring unintended bias in online networks and voting systems. Important fields like education and politics, which often lay a foundation for the future of equality in society, need scrutiny to design policies that can better foster equality in resource distribution constrained by the unbalanced demographic distribution of people in the country. We collect three publicly available datasets to train state-of-the-art classifiers in the domain of gender and caste classification. We train the models in the Indian context, where the same name can have different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abishikta Kumar in the other). Finally, we also perform cross-testing (training and testing on different datasets) to understand the efficacy of the above models. We also perform an error analysis of the prediction models. Finally, we attempt to assess the bias in the existing Indian system as case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste.

87.1CLMay 19
Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

Somnath Banerjee, Pranav Jha, Rima Hazra et al.

LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations ''where the model identifies input token spans as evidence alongside a generated rationale'' and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency. Across 3 tasks, 5~languages, and 2~multilingual LLM families, we find that English explanations frequently produce fluent but loosely anchored rationales, with comprehensiveness degrading by up to 5.7x relative to native-language conditions - even as task accuracy remains stable across settings. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. We recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics beyond lexical overlap, and treating English rationales as communication summaries rather than faithful decision traces.

CLJun 28, 2022
Placing (Historical) Facts on a Timeline: A Classification cum Coref Resolution Approach

Sayantan Adak, Altaf Ahmad, Aditya Basu et al.

A timeline provides one of the most effective ways to visualize the important historical facts that occurred over a period of time, presenting the insights that may not be so apparent from reading the equivalent information in textual form. By leveraging generative adversarial learning for important sentence classification and by assimilating knowledge based tags for improving the performance of event coreference resolution we introduce a two staged system for event timeline generation from multiple (historical) text documents. We demonstrate our results on two manually annotated historical text documents. Our results can be extremely helpful for historians, in advancing research in history and in understanding the socio-political landscape of a country as reflected in the writings of famous personas.

CLFeb 20, 2024Code
TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text

Sayantan Adak, Daivik Agrawal, Animesh Mukherjee et al.

We investigate the knowledge of object affordances in pre-trained language models (LMs) and pre-trained Vision-Language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances -- Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances. Codes and data are available at https://github.com/sayantan11995/Text2Afford

CLFeb 17, 2025Code
REVERSUM: A Multi-staged Retrieval-Augmented Generation Method to Enhance Wikipedia Tail Biographies through Personal Narratives

Sayantan Adak, Pauras Mangesh Meher, Paramita Das et al.

Wikipedia is an invaluable resource for factual information about a wide range of entities. However, the quality of articles on less-known entities often lags behind that of the well-known ones. This study proposes a novel approach to enhancing Wikipedia's B and C category biography articles by leveraging personal narratives such as autobiographies and biographies. By utilizing a multi-staged retrieval-augmented generation technique -- REVerSum -- we aim to enrich the informational content of these lesser-known articles. Our study reveals that personal narratives can significantly improve the quality of Wikipedia articles, providing a rich source of reliable information that has been underutilized in previous studies. Based on crowd-based evaluation, REVerSum generated content outperforms the best performing baseline by 17% in terms of integrability to the original Wikipedia article and 28.5\% in terms of informativeness. Code and Data are available at: https://github.com/sayantan11995/wikipedia_enrichment

CLFeb 16, 2025Code
Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment

Somnath Banerjee, Sayan Layek, Pratyush Chatterjee et al.

Ensuring consistent safety across multiple languages remains a significant challenge for large language models (LLMs). We introduce Soteria, a lightweight yet powerful strategy that locates and minimally adjusts the "functional heads" most responsible for harmful content generation in each language. By altering only a fraction of parameters, Soteria drastically reduces policy violations without sacrificing overall model performance, even in low-resource settings. To rigorously evaluate our approach, we also present XThreatBench, a specialized multilingual dataset capturing fine-grained harmful behaviors drawn from real policy guidelines. Experiments with leading open-source LLMs (e.g., Llama, Qwen, Mistral) show that Soteria consistently improves safety metrics across high-, mid-, and low-resource languages. These findings highlight a promising path toward scalable, linguistically attuned, and ethically aligned LLMs worldwide.

CLFeb 25, 2024Code
DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem

Somnath Banerjee, Avik Dutta, Aaditya Agrawal et al.

With the AI revolution in place, the trend for building automated systems to support professionals in different domains such as the open source software systems, healthcare systems, banking systems, transportation systems and many others have become increasingly prominent. A crucial requirement in the automation of support tools for such systems is the early identification of named entities, which serves as a foundation for developing specialized functionalities. However, due to the specific nature of each domain, different technical terminologies and specialized languages, expert annotation of available data becomes expensive and challenging. In light of these challenges, this paper proposes a novel named entity recognition (NER) technique specifically tailored for the open-source software systems. Our approach aims to address the scarcity of annotated software data by employing a comprehensive two-step distantly supervised annotation process. This process strategically leverages language heuristics, unique lookup tables, external knowledge sources, and an active learning approach. By harnessing these powerful techniques, we not only enhance model performance but also effectively mitigate the limitations associated with cost and the scarcity of expert annotators. It is noteworthy that our model significantly outperforms the state-of-the-art LLMs by a substantial margin. We also show the effectiveness of NER in the downstream task of relation extraction.

CYOct 9, 2023
Auditing Gender Analyzers on Text Data

Siddharth D Jaiswal, Ankit Kumar Verma, Animesh Mukherjee

AI models have become extremely popular and accessible to the general public. However, they are continuously under the scanner due to their demonstrable biases toward various sections of the society like people of color and non-binary people. In this study, we audit three existing gender analyzers -- uClassify, Readable and HackerFactor, for biases against non-binary individuals. These tools are designed to predict only the cisgender binary labels, which leads to discrimination against non-binary members of the society. We curate two datasets -- Reddit comments (660k) and, Tumblr posts (2.05M) and our experimental evaluation shows that the tools are highly inaccurate with the overall accuracy being ~50% on all platforms. Predictions for non-binary comments on all platforms are mostly female, thus propagating the societal bias that non-binary individuals are effeminate. To address this, we fine-tune a BERT multi-label classifier on the two datasets in multiple combinations, observe an overall performance of ~77% on the most realistically deployable setting and a surprisingly higher performance of 90% for the non-binary class. We also audit ChatGPT using zero-shot prompts on a small dataset (due to high pricing) and observe an average accuracy of 58% for Reddit and Tumblr combined (with overall better results for Reddit). Thus, we show that existing systems, including highly advanced ones like ChatGPT are biased, and need better audits and moderation and, that such societal biases can be addressed and alleviated through simple off-the-shelf models like BERT trained on more gender inclusive datasets.

CLDec 13, 2024Code
Efficient Continual Pre-training of LLMs for Low-resource Languages

Arijit Nag, Soumen Chakrabarti, Animesh Mukherjee et al.

Open-source Large Language models (OsLLMs) propel the democratization of natural language research by giving the flexibility to augment or update model parameters for performance improvement. Nevertheless, like proprietary LLMs, Os-LLMs offer poorer performance on low-resource languages (LRLs) than high-resource languages (HRLs), owing to smaller amounts of training data and underrepresented vocabulary. On the other hand, continual pre-training (CPT) with large amounts of language-specific data is a costly proposition in terms of data acquisition and computational resources. Our goal is to drastically reduce CPT cost. To that end, we first develop a new algorithm to select a subset of texts from a larger corpus. We show the effectiveness of our technique using very little CPT data. In search of further improvement, we design a new algorithm to select tokens to include in the LLM vocabulary. We experiment with the recent Llama-3 model and nine Indian languages with diverse scripts and extent of resource availability. For evaluation, we use IndicGenBench, a generation task benchmark dataset for Indic languages. We experiment with various CPT corpora and augmented vocabulary size and offer insights across language families.

SDOct 22, 2024Code
DENOASR: Debiasing ASRs through Selective Denoising

Anand Kumar Rai, Siddharth D Jaiswal, Shubham Prakash et al.

Automatic Speech Recognition (ASR) systems have been examined and shown to exhibit biases toward particular groups of individuals, influenced by factors such as demographic traits, accents, and speech styles. Noise can disproportionately impact speakers with certain accents, dialects, or speaking styles, leading to biased error rates. In this work, we introduce a novel framework DENOASR, which is a selective denoising technique to reduce the disparity in the word error rates between the two gender groups, male and female. We find that a combination of two popular speech denoising techniques, viz. DEMUCS and LE, can be effectively used to mitigate ASR disparity without compromising their overall performance. Experiments using two state-of-the-art open-source ASRs - OpenAI WHISPER and NVIDIA NEMO - on multiple benchmark datasets, including TIE, VOX-POPULI, TEDLIUM, and FLEURS, show that there is a promising reduction in the average word error rate gap across the two gender groups. For a given dataset, the denoising is selectively applied on speech samples having speech intelligibility below a certain threshold, estimated using a small validation sample, thus ameliorating the need for large-scale human-written ground-truth transcripts. Our findings suggest that selective denoising can be an elegant approach to mitigate biases in present-day ASR systems.

IRMay 5, 2025Code
Social Biases in Knowledge Representations of Wikidata separates Global North from Global South

Paramita Das, Sai Keerthana Karnam, Aditya Soni et al.

Knowledge Graphs have become increasingly popular due to their wide usage in various downstream applications, including information retrieval, chatbot development, language model construction, and many others. Link prediction (LP) is a crucial downstream task for knowledge graphs, as it helps to address the problem of the incompleteness of the knowledge graphs. However, previous research has shown that knowledge graphs, often created in a (semi) automatic manner, are not free from social biases. These biases can have harmful effects on downstream applications, especially by leading to unfair behavior toward minority groups. To understand this issue in detail, we develop a framework -- AuditLP -- deploying fairness metrics to identify biased outcomes in LP, specifically how occupations are classified as either male or female-dominated based on gender as a sensitive attribute. We have experimented with the sensitive attribute of age and observed that occupations are categorized as young-biased, old-biased, and age-neutral. We conduct our experiments on a large number of knowledge triples that belong to 21 different geographies extracted from the open-sourced knowledge graph, Wikidata. Our study shows that the variance in the biased outcomes across geographies neatly mirrors the socio-economic and cultural division of the world, resulting in a transparent partition of the Global North from the Global South.

IRFeb 16, 2025Code
MemeSense: An Adaptive In-Context Framework for Social Commonsense Driven Meme Moderation

Sayantan Adak, Somnath Banerjee, Rajarshi Mandal et al.

Online memes are a powerful yet challenging medium for content moderation, often masking harmful intent behind humor, irony, or cultural symbolism. Conventional moderation systems "especially those relying on explicit text" frequently fail to recognize such subtle or implicit harm. We introduce MemeSense, an adaptive framework designed to generate socially grounded interventions for harmful memes by combining visual and textual understanding with curated, semantically aligned examples enriched with commonsense cues. This enables the model to detect nuanced complexed threats like misogyny, stereotyping, or vulgarity "even in memes lacking overt language". Across multiple benchmark datasets, MemeSense outperforms state-of-the-art methods, achieving up to 35% higher semantic similarity and 9% improvement in BERTScore for non-textual memes, and notable gains for text-rich memes as well. These results highlight MemeSense as a promising step toward safer, more context-aware AI systems for real-world content moderation. Code and data available at: https://github.com/sayantan11995/MemeSense

CVFeb 21, 2024Code
Mask-up: Investigating Biases in Face Re-identification for Masked Faces

Siddharth D Jaiswal, Ankit Kr. Verma, Animesh Mukherjee

AI based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, moreso since the COVID-19 pandemic for tasks ranging from validating individuals' faces while buying SIM cards to surveillance of citizens. Extensive biases have been reported against marginalized groups in these systems and have led to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask based face occlusion. In this study, we audit four commercial and nine open-source FRSs for the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (total 14,722 images). These simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey for the same task with 85 human participants also results in a low accuracy of 40%. Thus a human-in-the-loop moderation in the pipeline does not alleviate the concerns, as has been frequently hypothesized in literature. Our large-scale study shows that developers, lawmakers and users of such services need to rethink the design principles behind FRSs, especially for the task of face re-identification, taking cognizance of observed biases.

CLDec 18, 2020Code
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam et al.

Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this paper, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities. We have made our code and dataset public at https://github.com/punyajoy/HateXplain

SIApr 14, 2020Code
Deep Learning Models for Multilingual Hate Speech Detection

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha et al.

Hate speech detection is a challenging problem with most of the datasets available in only one language: English. In this paper, we conduct a large scale analysis of multilingual hate speech in 9 languages from 16 different sources. We observe that in low resource setting, simple models such as LASER embedding with logistic regression performs the best, while in high resource setting BERT based models perform better. In case of zero-shot classification, languages such as Italian and Portuguese achieve good results. Our proposed framework could be used as an efficient solution for low-resource languages. These models could also act as good baselines for future multilingual hate speech detection tasks. We have made our code and experimental settings public for other researchers at https://github.com/punyajoy/DE-LIMIT.

SISep 27, 2019Code
HateMonitors: Language Agnostic Abuse Detection in Social Media

Punyajoy Saha, Binny Mathew, Pawan Goyal et al.

Reducing hateful and offensive content in online social media pose a dual problem for the moderators. On the one hand, rigid censorship on social media cannot be imposed. On the other, the free flow of such content cannot be allowed. Hence, we require efficient abusive language detection system to detect such harmful content in social media. In this paper, we present our machine learning model, HateMonitor, developed for Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), a shared task at FIRE 2019. We have used a Gradient Boosting model, along with BERT and LASER embeddings, to make the system language agnostic. Our model came at First position for the German sub-task A. We have also made our model public at https://github.com/punyajoy/HateMonitors-HASOC .

SIDec 17, 2018Code
Hateminers : Detecting Hate speech against Women

Punyajoy Saha, Binny Mathew, Pawan Goyal et al.

With the online proliferation of hate speech, there is an urgent need for systems that can detect such harmful content. In this paper, We present the machine learning models developed for the Automatic Misogyny Identification (AMI) shared task at EVALITA 2018. We generate three types of features: Sentence Embeddings, TF-IDF Vectors, and BOW Vectors to represent each tweet. These features are then concatenated and fed into the machine learning models. Our model came First for the English Subtask A and Fifth for the English Subtask B. We release our winning model for public use and it's available at https://github.com/punyajoy/Hateminers-EVALITA.

DLSep 21, 2016Code
OCR++: A Robust Framework For Information Extraction from Scholarly Articles

Mayank Singh, Barnopriyo Barua, Priyank Palod et al.

This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written in English language to understand generic writing patterns and formulate rules to develop this hybrid framework. Extensive evaluations show that the proposed framework outperforms the existing state-of-the-art tools with huge margin in structural information extraction along with improved performance in metadata and bibliography extraction tasks, both in terms of accuracy (around 50% improvement) and processing time (around 52% improvement). A user experience study conducted with the help of 30 researchers reveals that the researchers found this system to be very helpful. As an additional objective, we discuss two novel use cases including automatically extracting links to public datasets from the proceedings, which would further accelerate the advancement in digital libraries. The result of the framework can be exported as a whole into structured TEI-encoded documents. Our framework is accessible online at http://cnergres.iitkgp.ac.in/OCR++/home/.

CLJan 8
See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation

Naquee Rizwan, Subhankar Swain, Paramananda Bhaskar et al.

In this work, we examine hateful memes from three complementary angles - how to detect them, how to explain their content and how to intervene them prior to being posted - by applying a range of strategies built on top of generative AI models. To the best of our knowledge, explanation and intervention have typically been studied separately from detection, which does not reflect real-world conditions. Further, since curating large annotated datasets for meme moderation is prohibitively expensive, we propose a novel framework that leverages task-specific generative multimodal agents and the few-shot adaptability of large multimodal models to cater to different types of memes. We believe this is the first work focused on generalizable hateful meme moderation under limited data conditions, and has strong potential for deployment in real-world production scenarios. Warning: Contains potentially toxic contents.

CLFeb 23, 2024
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

Somnath Banerjee, Sayan Layek, Rima Hazra et al.

In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric such as a pseudocode, a program or a software snippet as opposed to vanilla text. To investigate this question, we introduce TechHazardQA, a dataset containing complex queries which should be answered in both text and instruction-centric formats (e.g., pseudocodes), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. For evaluation we report the harmfulness score metric as well as judgements from GPT-4 and humans. Overall, we observe that asking LLMs to produce instruction-centric responses enhances the unethical response generation by ~2-38% across the models. As an additional objective, we investigate the impact of model editing using the ROME technique, which further increases the propensity for generating undesirable content. In particular, asking edited LLMs to generate instruction-centric responses further increases the unethical response generation by ~3-16% across the different models.

CLMar 22, 2024
On Zero-Shot Counterspeech Generation by LLMs

Punyajoy Saha, Aalok Agrawal, Abhik Jana et al.

With the emergence of numerous Large Language Models (LLM), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task where efforts are made to develop generative models by fine-tuning LLMs with hatespeech - counterspeech pairs, but none of these attempts explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performances of four LLMs namely GPT-2, DialoGPT, ChatGPT and FlanT5 in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate the deviation in performance with respect to the sizes (small, medium, large) of the models. On the other hand, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of such strategies on the performance of the models. Our analysis shows that there is an improvement in generation quality for two datasets (17%), however the toxicity increase (25%) with increase in model size. Considering type of model, GPT-2 and FlanT5 models are significantly better in terms of counterspeech quality but also have high toxicity as compared to DialoGPT. ChatGPT are much better at generating counter speech than other models across all metrics. In terms of prompting, we find that our proposed strategies help in improving counter speech generation across all the models.

CLDec 16, 2024
Multilingual and Explainable Text Detoxification with Parallel Corpora

Daryna Dementieva, Nikolay Babakov, Amit Ronen et al.

Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022, digital abusive speech remains a significant issue. One potential approach to address this challenge is automatic text detoxification, a text style transfer (TST) approach that transforms toxic language into a more neutral or non-toxic form. To date, the availability of parallel corpora for the text detoxification task (Logachevavet al., 2022; Atwell et al., 2022; Dementievavet al., 2024a) has proven to be crucial for state-of-the-art approaches. With this work, we extend parallel text detoxification corpus to new languages -- German, Chinese, Arabic, Hindi, and Amharic -- testing in the extensive multilingual setup TST baselines. Next, we conduct the first of its kind an automated, explainable analysis of the descriptive features of both toxic and non-toxic sentences, diving deeply into the nuances, similarities, and differences of toxicity and detoxification across 9 languages. Finally, based on the obtained insights, we experiment with a novel text detoxification method inspired by the Chain-of-Thoughts reasoning approach, enhancing the prompting process through clustering on relevant descriptive attributes.

CLMar 8, 2024
Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

Arijit Nag, Animesh Mukherjee, Niloy Ganguly et al.

Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs). A few of them have been trained on low-resource languages (LRLs) and give decent performance. Owing to the prohibitive costs of training LLMs, they are usually used as a network service, with the client charged by the count of input and output tokens. The number of tokens strongly depends on the script and language, as well as the LLM's subword vocabulary. We show that LRLs are at a pricing disadvantage, because the well-known LLMs produce more tokens for LRLs than HRLs. This is because most currently popular LLMs are optimized for HRL vocabularies. Our objective is to level the playing field: reduce the cost of processing LRLs in contemporary LLMs while ensuring that predictive and generative qualities are not compromised. As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs. We perform an extensive study using the IndicXTREME classification and six generative tasks dataset, covering 15 Indic and 3 other languages, while using GPT-4 (one of the costliest LLM services released so far) as a commercial LLM. We observe and analyze interesting patterns involving token count, cost, and quality across a multitude of languages and tasks. We show that choosing the best policy to interact with the LLM can reduce cost by 90% while giving better or comparable performance compared to communicating with the LLM in the original LRL.

CLFeb 19, 2024
Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations

Naquee Rizwan, Paramananda Bhaskar, Mithun Das et al.

There is a rapid increase in the use of multimedia content in current social media platforms. One of the highly popular forms of such multimedia content are memes. While memes have been primarily invented to promote funny and buoyant discussions, malevolent users exploit memes to target individuals or vulnerable communities, making it imperative to identify and address such instances of hateful memes. Thus social media platforms are in dire need for active moderation of such harmful content. While manual moderation is extremely difficult due to the scale of such content, automatic moderation is challenged by the need of good quality annotated data to train hate meme detection algorithms. This makes a perfect pretext for exploring the power of modern day vision language models (VLMs) that have exhibited outstanding performance across various tasks. In this paper we study the effectiveness of VLMs in handling intricate tasks such as hate meme detection in a completely zero-shot setting so that there is no dependency on annotated data for the task. We perform thorough prompt engineering and query state-of-the-art VLMs using various prompt types to detect hateful/harmful memes. We further interpret the misclassification cases using a novel superpixel based occlusion method. Finally we show that these misclassifications can be neatly arranged into a typology of error classes the knowledge of which should enable the design of better safety guardrails in future.

CLFeb 11, 2024
Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi

Mithun Das, Saurabh Kumar Pandey, Shivansh Sethi et al.

With the rise of online abuse, the NLP community has begun investigating the use of neural architectures to generate counterspeech that can "counter" the vicious tone of such abusive speech and dilute/ameliorate their rippling effect over the social network. However, most of the efforts so far have been primarily focused on English. To bridge the gap for low-resource languages such as Bengali and Hindi, we create a benchmark dataset of 5,062 abusive speech/counterspeech pairs, of which 2,460 pairs are in Bengali and 2,602 pairs are in Hindi. We implement several baseline models considering various interlingual transfer mechanisms with different configurations to generate suitable counterspeech to set up an effective benchmark. We observe that the monolingual setup yields the best performance. Further, using synthetic transfer, language models can generate counterspeech to some extent; specifically, we notice that transferability is better when languages belong to the same language family.

CLOct 15, 2024
Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models

Somnath Banerjee, Sayan Layek, Hari Shrawgi et al.

As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.

CLJul 6, 2025
HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation

Naquee Rizwan, Seid Muhie Yimam, Daryna Dementieva et al.

Despite regulations imposed by nations and social media platforms, e.g. (Government of India, 2021; European Parliament and Council of the European Union, 2022), inter alia, hateful content persists as a significant challenge. Existing approaches primarily rely on reactive measures such as blocking or suspending offensive messages, with emerging strategies focusing on proactive measurements like detoxification and counterspeech. In our work, which we call HatePRISM, we conduct a comprehensive examination of hate speech regulations and strategies from three perspectives: country regulations, social platform policies, and NLP research datasets. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions and platforms, alongside a lack of alignment with research efforts. Based on these insights, we suggest ideas and research direction for further exploration of a unified framework for automated hate speech moderation incorporating diverse strategies.