SE · Apr 15

ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering

MD Awsaf Alam Anindya, Showvik Biswas, Anindya Iqbal, Jaydeb Sarker, Amiangshu Bosu

arXiv:2604.14408 · 58.3 · h-index: 21 · Has Code

Predicted impact: top 39% in SE · last 90 days · Originality: Incremental advance

AI Analysis

For software engineering teams, ToxiShield is a real-time tool for reducing toxic communication in code reviews. It addresses a gap left by prior research, which covered toxicity detection and empirical study but offered no real-time detoxification support.

ToxiShield is a browser extension for GitHub pull requests that detects toxic comments, provides explanations, and suggests constructive alternatives. It achieves 98% accuracy in toxicity detection, 95.27% style transfer accuracy, and positive user acceptance in a human evaluation.

Toxic interactions during code reviews can undermine teamwork and hinder productivity in software engineering (SE) teams. While prior studies explore toxicity detection and empirical investigation, they do not provide real-time detoxification tools to support the SE community. To address this gap, we present ToxiShield, a browser extension for GitHub pull requests built from three modules: i) the Toxicity Filter, which identifies whether a text is toxic; ii) the Communication Coach, which provides just-in-time, fine-grained toxicity categorization with explanations; and iii) the Reframer, which generates a revised, constructive alternative to a toxic text. For each module, we trained and evaluated multiple deep learning models and Large Language Models (LLMs) to identify the best choice. A BERT-based binary detection model, trained on 38,761 code review samples, achieves 98% accuracy and an F1-score of 97% and was selected for the Toxicity Filter module. For the Communication Coach, prompt-tuned Claude 3.5 Sonnet achieved the best performance, with 39% MCC and 42% F1 in multiclass toxicity classification with detailed reasoning. For the Reframer, we evaluated five LLMs using a fine-tuning strategy on a dataset of 10,120 code review comments. The fine-tuned Llama 3.2 model achieves 95.27% style transfer accuracy, 97.03% fluency, 67.07% content preservation, and an 84% J-score. We further validated ToxiShield through a human evaluation with 10 participants using the Technology Acceptance Model, which confirmed its perceived usefulness and ease of adoption. ToxiShield sets a benchmark for advancing constructive communication in software engineering, driving inclusivity and healthier collaboration in open-source communities.
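The three-module design described in the abstract can be pictured as a short pipeline. The Python sketch below is illustrative only and is not the authors' implementation: the `Review` record, the `review_comment` function, the stub models, and the "insult" label are all hypothetical names introduced here. It also assumes that the Communication Coach and Reframer run only on comments the Toxicity Filter flags, which the abstract suggests but does not state outright.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    """Result of passing one comment through the three stages (hypothetical type)."""
    original: str
    is_toxic: bool
    category: Optional[str] = None      # set by the coach stage
    explanation: Optional[str] = None   # set by the coach stage
    suggestion: Optional[str] = None    # set by the reframer stage


def review_comment(
    text: str,
    toxicity_filter: Callable[[str], bool],
    coach: Callable[[str], Tuple[str, str]],
    reframer: Callable[[str], str],
) -> Review:
    """Run the binary filter first; call the later stages only for toxic text.

    Assumption: gating the coach and reframer behind the filter is our reading
    of the abstract, not a confirmed detail of ToxiShield.
    """
    if not toxicity_filter(text):
        return Review(original=text, is_toxic=False)
    category, explanation = coach(text)
    return Review(text, True, category, explanation, reframer(text))


# Hypothetical stand-ins for the trained models, for illustration only.
BLOCKLIST = {"stupid", "idiot"}


def stub_filter(text: str) -> bool:
    # Keyword match in place of the BERT-based classifier.
    return any(word.strip(".,!?") in BLOCKLIST for word in text.lower().split())


def stub_coach(text: str) -> Tuple[str, str]:
    # Fixed label and explanation in place of the prompted LLM.
    return "insult", "The comment attacks the author rather than the code."


def stub_reframer(text: str) -> str:
    # Fixed rewrite in place of the fine-tuned LLM.
    return "This approach has some problems; could we look at an alternative?"
```

Passing the models in as plain callables keeps the control flow separate from any particular model, so each stub could be swapped for a real classifier or LLM call without changing `review_comment`.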
