CL · Jun 18, 2024

GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models

arXiv:2407.12790v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for high-quality poetry generation tools in Czech, representing an incremental advancement in domain-specific natural language processing.

The authors address automated poetry generation for Czech, a language with few existing systems, by fine-tuning a pre-trained Large Language Model. They introduce explicit strophe parameter guidance and Forced generation, and report high accuracy on the rhyme and meter of the generated poems.

High-quality automated poetry generation systems are currently only available for a small subset of languages. We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained Large Language Model. We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model. We also find that appropriate tokenization is crucial, showing that tokenization methods based on syllables or individual characters instead of subwords prove superior in generating poetic strophes. We further enhance the results by introducing Forced generation, adding explicit specifications of meter and verse parameters at inference time based on the already generated text. We evaluate a range of setups, showing that our proposed approach achieves high accuracies in rhyming and metric aspects of formal quality of the generated poems.
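
As a rough illustration of the ideas in the abstract, the Python sketch below shows one way strophe parameters could be written into the poem text and re-inserted before each verse at inference time. Everything in it is an assumption made for illustration: the `# ... #` text layout, the meter label `"T"`, and the names `StropheParams`, `strophe_header`, `line_prefix`, `char_tokenize`, and `forced_generation` are hypothetical and are not taken from the paper. As a simplification, the forced parameters here come from the requested strophe specification, whereas the abstract describes deriving them from the already generated text. The `generate_line` callable stands in for a fine-tuned language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StropheParams:
    rhyme_scheme: str  # one letter per verse line, e.g. "ABAB"
    meter: str         # hypothetical meter label, e.g. "T"
    syllables: int     # target syllable count per verse line


def strophe_header(p: StropheParams) -> str:
    """Serialize strophe-level parameters as a plain-text first line."""
    return f"# {p.rhyme_scheme} # {p.meter}"


def line_prefix(p: StropheParams, line_index: int) -> str:
    """Serialize verse-level parameters placed in front of one verse line."""
    return f"{p.meter} # {p.syllables} # {p.rhyme_scheme[line_index]} # "


def char_tokenize(text: str) -> List[str]:
    """Character-level tokenization: every character is its own token."""
    return list(text)


def forced_generation(p: StropheParams,
                      generate_line: Callable[[str], str]) -> str:
    """Build a strophe line by line.

    The parameter prefix of each line is written by this code (forced)
    rather than sampled; only the verse text comes from `generate_line`,
    which receives everything produced so far as its prompt.
    """
    text = strophe_header(p) + "\n"
    for i in range(len(p.rhyme_scheme)):
        text += line_prefix(p, i)
        text += generate_line(text) + "\n"
    return text
```

A call such as `forced_generation(StropheParams("ABAB", "T", 8), model_fn)` would then yield a header line followed by four parameter-prefixed verses, where `model_fn` is whatever function wraps the model's decoding step.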