CL, Feb 15

CCiV: A Benchmark for Structure, Rhythm and Quality in LLM-Generated Chinese Ci Poetry

arXiv:2602.14081v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the systematic evaluation and improvement of LLMs at generating culturally rich, formally constrained poetry. The contribution is incremental: it provides a benchmark rather than a new generation method.

The authors introduce the CCiV benchmark to evaluate LLM-generated classical Chinese Ci poetry along three dimensions: structure, rhythm, and quality. Their results show that models often produce valid historical variants of a form and struggle more with tonal patterns than with structural rules.

The generation of classical Chinese Ci poetry, a form demanding a sophisticated blend of structural rigidity, rhythmic harmony, and artistic quality, poses a significant challenge for large language models (LLMs). To systematically evaluate and advance this capability, we introduce Chinese Cipai Variants (CCiV), a benchmark designed to assess LLM-generated Ci poetry across three dimensions: structure, rhythm, and quality. Our evaluation of 17 LLMs on 30 Cipai reveals two critical phenomena: models frequently generate valid but unexpected historical variants of a poetic form, and adherence to tonal patterns is substantially harder than adherence to structural rules. We further show that form-aware prompting can improve structural and tonal control for stronger models, while potentially degrading weaker ones. Finally, we observe weak and inconsistent alignment between formal correctness and literary quality in our sample. CCiV highlights the need for variant-aware evaluation and more holistic constrained creative generation methods.
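
To make the idea of variant-aware evaluation concrete, the sketch below shows one way a checker might accept any known variant of a form instead of a single canonical template, and score tonal adherence separately from structure. It is an illustration under stated assumptions, not the CCiV implementation: the `VARIANTS` table, the form name `toy_form`, the line-length patterns, and the `P`/`Z`/`*` tone notation (level, oblique, either) are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical variant table (illustrative numbers only, not from the paper):
# each form maps variant names to the expected characters per line.
VARIANTS = {
    "toy_form": {
        "standard": [6, 6, 5, 6],
        "variant_a": [6, 6, 5, 7],
    }
}


def line_lengths(poem: str) -> list[int]:
    """Split on common Chinese/ASCII punctuation and whitespace,
    then count the characters in each non-empty line."""
    parts = re.split(r"[，。、；！？,.;!?\s]+", poem)
    return [len(p) for p in parts if p]


def match_variants(form: str, poem: str) -> list[str]:
    """Return the names of every known variant whose line-length
    pattern the poem matches (empty list means no structural match)."""
    lengths = line_lengths(poem)
    return [name for name, pattern in VARIANTS[form].items() if pattern == lengths]


def tonal_accuracy(tones: str, pattern: str) -> float:
    """Fraction of positions whose tone agrees with the pattern.
    `tones` uses P (level) or Z (oblique) per character; `pattern`
    may also use * for positions where either tone is allowed."""
    if len(tones) != len(pattern):
        raise ValueError("tones and pattern must have the same length")
    hits = sum(1 for t, p in zip(tones, pattern) if p == "*" or p == t)
    return hits / len(pattern)


# Placeholder text: the characters carry no meaning, only line lengths matter.
poem = "甲乙丙丁戊己，甲乙丙丁戊己。甲乙丙丁戊，甲乙丙丁戊己庚。"
matched = match_variants("toy_form", poem)
score = tonal_accuracy("PPZP", "PPZZ")
```

A checker that compared only against `standard` would mark this poem as structurally wrong, whereas the variant-aware check reports a match with `variant_a`. Keeping the tonal score as a separate function mirrors the abstract's distinction between structural rules and tonal patterns.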
