Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech
The work addresses the problem of explaining implicit hate speech, which matters to both researchers and practitioners, but the contribution is incremental because it builds on existing prompting methods.
The paper tackles the generation of natural language explanations for implicit hate speech by proposing the Chain of Explanation prompting method, which improves the BLEU score from 44.0 to 62.3 when accurate target information is provided.
Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain of Explanation (CoE) Prompting method, which uses heuristic words and the target group to generate high-quality NLE for implicit hate speech. By providing accurate target information, we improve the BLEU score for NLE generation from 44.0 to 62.3. We then evaluate the quality of the generated NLE using various automatic metrics and human annotations of informativeness and clarity scores.
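As a rough illustration of how heuristic words and a target group might be combined into a single prompt, the sketch below assembles one in Python. The function name `build_coe_prompt`, the field labels ("Given text", "Target group"), and the default heuristic phrase are all hypothetical; the abstract does not specify the exact template, so this should be read as one plausible layout rather than the authors' prompt.

```python
from typing import Optional


def build_coe_prompt(
    text: str,
    target_group: Optional[str] = None,
    heuristic: str = "This text is hateful because",
) -> str:
    """Assemble a chain-style prompt for explanation generation.

    The labels and the default heuristic phrase are illustrative
    assumptions, not the template used in the paper.
    """
    # Start with the input text the model is asked to explain.
    parts = [f"Given text: {text.strip()}"]
    # Add the target group only when it is known.
    if target_group:
        parts.append(f"Target group: {target_group.strip()}")
    # End with heuristic words that cue the model to continue with an explanation.
    parts.append(heuristic.strip())
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_coe_prompt("example post", target_group="immigrants"))
```

The returned string would then be passed to a generative language model, whose continuation serves as the explanation. Leaving `target_group` unset gives a prompt without target information, which makes it easy to compare the two conditions.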