66.3LGMar 16
SolarGPT-QA: A Domain-Adaptive Large Language Model for Educational Question Answering in Space Weather and HeliophysicsSantosh Chapagain, MohammadReza EskandariNasab, Onur Vural et al.
Solar activity, including solar flares, coronal mass ejections (CMEs), and geomagnetic storms can significantly impact satellites, aviation, power grids, data centers, and space missions. Extreme solar events can cause substantial economic damage with limited advance warning, underscoring the importance of early warning systems, accurate forecasting, and effective education in space science. Although large language models (LLMs) perform well on general tasks, they often lack domain specific knowledge and pedagogical capability to clearly explain complex space science concepts. We introduce SolarGPT-QA, a question answering system based on a domain adapted large language model built on the LLaMA-3 base model. The model is trained using scientific literature and large scale question and answer data generated with GPT-4 and refined using Grok-3 in a student friendly storytelling style. To evaluate response quality, we employ an LLM-as-judge evaluation framework, where a strong reference model assesses generated answers using structured criteria including scientific accuracy, clarity, completeness, and pedagogical effectiveness. Results show that SolarGPT-QA performs strongly relative to general purpose models in zero shot settings and achieves competitive performance compared to instruction tuned models for educational explanations in space weather and heliophysics. Ablation studies indicate that combining domain adaptive pretraining with fine tuning is important for balancing scientific accuracy and educational effectiveness.
LGAug 27, 2025
Pruning Strategies for Backdoor Defense in LLMsSantosh Chapagain, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi
Backdoor attacks are a significant threat to the performance and integrity of pre-trained language models. Although such models are routinely fine-tuned for downstream NLP tasks, recent work shows they remain vulnerable to backdoor attacks that survive vanilla fine-tuning. These attacks are difficult to defend because end users typically lack knowledge of the attack triggers. Such attacks consist of stealthy malicious triggers introduced through subtle syntactic or stylistic manipulations, which can bypass traditional detection and remain in the model, making post-hoc purification essential. In this study, we explore whether attention-head pruning can mitigate these threats without any knowledge of the trigger or access to a clean reference model. To this end, we design and implement six pruning-based strategies: (i) gradient-based pruning, (ii) layer-wise variance pruning, (iii) gradient-based pruning with structured L1/L2 sparsification, (iv) randomized ensemble pruning, (v) reinforcement-learning-guided pruning, and (vi) Bayesian uncertainty pruning. Each method iteratively removes the least informative heads while monitoring validation accuracy to avoid over-pruning. Experimental evaluation shows that gradient-based pruning performs best while defending the syntactic triggers, whereas reinforcement learning and Bayesian pruning better withstand stylistic attacks.
CLSep 3, 2025
Advancing Minority Stress Detection with Transformers: Insights from the Social Media DatasetsSantosh Chapagain, Cory J Cascalheira, Shah Muhammad Hamdi et al.
Individuals from sexual and gender minority groups experience disproportionately high rates of poor health outcomes and mental disorders compared to their heterosexual and cisgender counterparts, largely as a consequence of minority stress as described by Meyer's (2003) model. This study presents the first comprehensive evaluation of transformer-based architectures for detecting minority stress in online discourse. We benchmark multiple transformer models including ELECTRA, BERT, RoBERTa, and BART against traditional machine learning baselines and graph-augmented variants. We further assess zero-shot and few-shot learning paradigms to assess their applicability on underrepresented datasets. Experiments are conducted on the two largest publicly available Reddit corpora for minority stress detection, comprising 12,645 and 5,789 posts, and are repeated over five random seeds to ensure robustness. Our results demonstrate that integrating graph structure consistently improves detection performance across transformer-only models and that supervised fine-tuning with relational context outperforms zero and few-shot approaches. Theoretical analysis reveals that modeling social connectivity and conversational context via graph augmentation sharpens the models' ability to identify key linguistic markers such as identity concealment, internalized stigma, and calls for support, suggesting that graph-enhanced transformers offer the most reliable foundation for digital health interventions and public health policy.
LGAug 6, 2025
Advancing Hate Speech Detection with Transformers: Insights from the MetaHateSantosh Chapagain, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi
Hate speech is a widespread and harmful form of online discourse, encompassing slurs and defamatory posts that can have serious social, psychological, and sometimes physical impacts on targeted individuals and communities. As social media platforms such as X (formerly Twitter), Facebook, Instagram, Reddit, and others continue to facilitate widespread communication, they also become breeding grounds for hate speech, which has increasingly been linked to real-world hate crimes. Addressing this issue requires the development of robust automated methods to detect hate speech in diverse social media environments. Deep learning approaches, such as vanilla recurrent neural networks (RNNs), long short-term memory (LSTM), and convolutional neural networks (CNNs), have achieved good results, but are often limited by issues such as long-term dependencies and inefficient parallelization. This study represents the comprehensive exploration of transformer-based models for hate speech detection using the MetaHate dataset--a meta-collection of 36 datasets with 1.2 million social media samples. We evaluate multiple state-of-the-art transformer models, including BERT, RoBERTa, GPT-2, and ELECTRA, with fine-tuned ELECTRA achieving the highest performance (F1 score: 0.8980). We also analyze classification errors, revealing challenges with sarcasm, coded language, and label noise.