CL | Sep 29, 2024

CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays

arXiv:2409.19691v1 | 24 citations | h-index: 12
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of datasets for Chinese rhetoric analysis and generation, which benefits researchers and writers. The contribution is incremental, however: it extends existing work by modeling the interrelations between rhetorical devices and by adding more categories.

The authors address the lack of a comprehensive dataset for Chinese rhetorical understanding and generation by building CERD, a manually annotated dataset with 4 coarse-grained and 23 fine-grained categories spanning five interrelated sub-tasks. Experiments show that Large Language Models achieve the best performance on most tasks and that joint fine-tuning on multiple tasks improves it further.

Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks. In this paper, we propose the Chinese Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained categories, namely metaphor, personification, hyperbole and parallelism, and 23 fine-grained categories across both form and content levels. CERD is a manually annotated and comprehensive Chinese rhetoric dataset with five interrelated sub-tasks. Unlike previous work, our dataset aids in understanding various rhetorical devices, recognizing corresponding rhetorical components, and generating rhetorical sentences under given conditions, thereby improving the author's writing proficiency and language usage skills. Extensive experiments are conducted to demonstrate the interrelations between multiple tasks in CERD, as well as to establish a benchmark for future research on rhetoric. The experimental results indicate that Large Language Models achieve the best performance across most tasks, and joint fine-tuning on multiple tasks further enhances performance.
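To make the idea of interrelated sub-tasks and joint fine-tuning concrete, here is a minimal sketch, assuming a hypothetical record layout: one annotated sentence is turned into training pairs for three of the task types the abstract mentions (recognizing the device, recognizing its components, and generating a sentence under given conditions), and the pairs from all tasks are pooled into one shuffled training stream. The class `RhetoricRecord`, its fields, the prompt wording, the placeholder fine-grained label, and the tenor/vehicle component names are all illustrative assumptions, not CERD's actual schema or the authors' training recipe; only the four coarse-grained category names come from the abstract.

```python
import random
from dataclasses import dataclass, field

# The four coarse-grained categories named in the CERD abstract.
COARSE_CATEGORIES = ("metaphor", "personification", "hyperbole", "parallelism")


@dataclass
class RhetoricRecord:
    # Hypothetical layout for illustration only; not CERD's real schema.
    sentence: str
    coarse: str                      # must be one of COARSE_CATEGORIES
    fine: str                        # fine-grained label (placeholder here)
    components: dict = field(default_factory=dict)  # e.g. {"tenor": ..., "vehicle": ...}

    def __post_init__(self):
        # Reject labels outside the coarse-grained label set.
        if self.coarse not in COARSE_CATEGORIES:
            raise ValueError(f"unknown coarse category: {self.coarse}")


def to_task_examples(rec: RhetoricRecord) -> list:
    """Turn one annotated sentence into (input, target) pairs for several task types."""
    examples = [{
        "task": "classify",
        "input": f"Which rhetorical device does this sentence use? {rec.sentence}",
        "target": rec.coarse,
    }]
    # Component recognition only makes sense when components were annotated.
    if rec.components:
        comp = "; ".join(f"{k}={v}" for k, v in rec.components.items())
        examples.append({
            "task": "extract",
            "input": f"Identify the {rec.coarse} components in: {rec.sentence}",
            "target": comp,
        })
    # Conditional generation: the labels become the condition, the sentence the target.
    examples.append({
        "task": "generate",
        "input": f"Write a sentence using {rec.coarse} ({rec.fine}).",
        "target": rec.sentence,
    })
    return examples


def build_joint_mixture(records, seed=0):
    """Pool examples from every task and shuffle them into one training stream."""
    pool = [ex for rec in records for ex in to_task_examples(rec)]
    random.Random(seed).shuffle(pool)
    return pool


# Usage: "时间是一条河。" ("Time is a river.") annotated as a metaphor.
rec = RhetoricRecord("时间是一条河。", "metaphor", "placeholder_fine",
                     {"tenor": "时间", "vehicle": "河"})
mixture = build_joint_mixture([rec])
```

Mixing all tasks into one shuffled stream is the simplest form of multi-task fine-tuning; per-task sampling weights or staged training are common alternatives, and which one (if any) matches the paper's setup is not stated here.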

Code implementations: 1 repo