CL · Jan 14

LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation

arXiv:2601.09631v1 · h-index: 19

Originality: Incremental advance

AI Analysis

This addresses the specific problem of phonological limitations in LLMs for poetry analysis/generation in lower-resource languages like Greek, representing a domain-specific incremental improvement.

The paper addresses the difficulty LLMs have with phonological tasks such as rhyme detection and generation in Modern Greek poetry by building a hybrid system that combines LLMs with deterministic phonological algorithms. Pure LLM generation fails catastrophically (under 4% valid poems), while the hybrid system with verification reaches 73.1% valid poems. The paper also identifies a 'Reasoning Gap': a reasoning-heavy model (Claude 4.5) reaches 54% accuracy in rhyme identification only when prompted with Chain-of-Thought, compared with 40% for the more intuitive Claude 3.7.

Large Language Models (LLMs), despite their remarkable capabilities across NLP tasks, struggle with phonologically-grounded phenomena like rhyme detection and generation. This is even more evident in lower-resource languages such as Modern Greek. In this paper, we present a hybrid system that combines LLMs with deterministic phonological algorithms to achieve accurate rhyme identification/analysis and generation. Our approach implements a comprehensive taxonomy of Greek rhyme types, including Pure, Rich, Imperfect, Mosaic, and Identical Pre-rhyme Vowel (IDV) patterns, and employs an agentic generation pipeline with phonological verification. We evaluate multiple prompting strategies (zero-shot, few-shot, Chain-of-Thought, and RAG-augmented) across several LLMs, including Claude 3.7 and 4.5, GPT-4o, Gemini 2.0, and open-weight models like Llama 3.1 8B and 70B and Mistral Large. Results reveal a significant "Reasoning Gap": while native-like models (Claude 3.7) perform intuitively (40% accuracy in identification), reasoning-heavy models (Claude 4.5) achieve state-of-the-art performance (54%) only when prompted with Chain-of-Thought. Most critically, pure LLM generation fails catastrophically (under 4% valid poems), while our hybrid verification loop restores performance to 73.1%. We release our system and a crucial, rigorously cleaned corpus of 40,000+ rhymes, derived from the Anemoskala and Interwar Poetry corpora, to support future research.
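
The generate-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the names `rhyme_part`, `lines_rhyme`, and `generate_with_verification` are hypothetical, the `generate` callable stands in for any LLM call, and the check is a crude orthographic approximation (it compares spelling from the last accented vowel onward) rather than a true phonological analysis or the paper's rhyme taxonomy.

```python
import unicodedata
from typing import Callable, List, Optional

# Precomposed Greek vowels carrying a stress mark (assumption: monotonic orthography).
ACCENTED = set("\u03ac\u03ad\u03ae\u03af\u03cc\u03cd\u03ce\u0390\u03b0")
PUNCTUATION = ".,;:!?\u00b7\u00ab\u00bb\"'"


def rhyme_part(word: str) -> str:
    """Return the spelling from the last accented vowel to the end of the word."""
    w = unicodedata.normalize("NFC", word.lower()).strip(PUNCTUATION)
    for i in range(len(w) - 1, -1, -1):
        if w[i] in ACCENTED:
            return w[i:]
    return w  # no stress mark found (e.g. a monosyllable): use the whole word


def lines_rhyme(a: str, b: str) -> bool:
    """Deterministic check: do the final words of two lines share a rhyme part?"""
    words_a, words_b = a.split(), b.split()
    if not words_a or not words_b:
        return False
    return rhyme_part(words_a[-1]) == rhyme_part(words_b[-1])


def generate_with_verification(
    generate: Callable[[Optional[str]], List[str]],
    max_attempts: int = 3,
) -> Optional[List[str]]:
    """Call the generator until the verifier accepts a couplet or attempts run out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        couplet = generate(feedback)  # hypothetical LLM call returning two lines
        if len(couplet) == 2 and lines_rhyme(couplet[0], couplet[1]):
            return couplet
        feedback = "The two lines do not rhyme; please revise the line endings."
    return None
```

In this sketch the verifier, not the model, decides whether a candidate is accepted, and a rejection is passed back as feedback for the next attempt. A real system would replace the spelling comparison with grapheme-to-phoneme conversion and classification into the rhyme types the abstract lists.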

