CL | AI | Dec 19, 2024

Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization

arXiv:2412.15453v1 | 19 citations | h-index: 6 | COLING Workshops
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of countering hate speech with automated, contextually appropriate responses across diverse linguistic settings. It represents an incremental methodological improvement.

The paper addresses the generation of high-quality, scalable counter-speech across multiple languages by aligning Large Language Models with Direct Preference Optimization (DPO). The DPO-aligned models significantly outperform Supervised Fine-Tuning (SFT) baselines on counter-speech benchmarks and scale effectively to languages such as Basque, Italian, and Spanish.

The automatic generation of counter-speech (CS) is a critical strategy for addressing hate speech by providing constructive and informed responses. However, existing methods often fail to generate high-quality, impactful, and scalable CS, particularly across diverse linguistic contexts. In this paper, we propose a novel methodology to enhance CS generation by aligning Large Language Models (LLMs) using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Our approach leverages DPO to align LLM outputs with human preferences, ensuring contextually appropriate and linguistically adaptable responses. Additionally, we incorporate knowledge grounding to enhance the factual accuracy and relevance of generated CS. Experimental results demonstrate that DPO-aligned models significantly outperform SFT baselines on CS benchmarks while scaling effectively to multiple languages. These findings highlight the potential of preference-based alignment techniques to advance CS generation across varied linguistic settings. Model supervision and alignment are done in English, and the same model is used to report metrics for other languages such as Basque, Italian, and Spanish.
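The abstract names DPO as the alignment step but does not give implementation details. As a rough illustration only, the sketch below computes the standard per-example DPO loss from sequence log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") counter-speech response. The function name `dpo_loss`, its argument names, and the value `beta=0.1` are assumptions made for this example; they are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference (e.g. SFT) model.
    beta=0.1 is an illustrative default, not a value from the paper.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for one (hate speech, chosen CS, rejected CS) triple.
loss = dpo_loss(policy_chosen_logp=-8.0, policy_rejected_logp=-12.0,
                ref_chosen_logp=-10.0, ref_rejected_logp=-10.0)
print(f"DPO loss: {loss:.4f}")
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2. It falls below log 2 as the policy shifts probability toward the chosen response relative to the reference, and rises above it in the opposite case.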

