CL · Apr 7

THIVLVC: Retrieval Augmented Dependency Parsing for Latin

Luc Pommeret, Thibault Wagret, Jules Deret

arXiv:2604.05564 · 30.9

Predicted impact: top 74% in CL · last 90 days
Originality: Synthesis-oriented

AI Analysis

This paper addresses dependency parsing for Latin. It is an incremental advance whose impact is specific to computational linguistics for historical languages.

The authors develop THIVLVC, a two-stage system that retrieves structurally similar examples from a treebank and uses a large language model to refine a baseline parse. It improves CLAS by +17 points on poetry and +1.5 points on prose over the UDPipe baseline.

We describe THIVLVC, a two-stage system for the EvaLatin 2026 Dependency Parsing task. Given a Latin sentence, we retrieve structurally similar entries from the CIRCSE treebank using sentence length and POS n-gram similarity, then prompt a large language model to refine the baseline parse from UDPipe using the retrieved examples and UD annotation guidelines. We submit two configurations: one without retrieval and one with retrieval (RAG). On poetry (Seneca), THIVLVC improves CLAS by +17 points over the UDPipe baseline; on prose (Thomas Aquinas), the gain is +1.5 CLAS. A double-blind error analysis of 300 divergences between our system and the gold standard reveals that, among unanimous annotator decisions, 53.3% favour THIVLVC, showing annotation inconsistencies both within and across treebanks.
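
The retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names sentence length and POS n-gram similarity as the signals but does not give a formula, so the bigram order, the Jaccard overlap, the length ratio, the mixing weight `alpha`, and all function names here are assumptions.

```python
from collections import Counter


def pos_ngrams(tags, n=2):
    """Count the POS n-grams of one sentence (n=2 is an assumed default)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def ngram_similarity(a, b, n=2):
    """Jaccard overlap between two multisets of POS n-grams (assumed measure)."""
    ca, cb = pos_ngrams(a, n), pos_ngrams(b, n)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 0.0


def length_similarity(a, b):
    """Ratio of the shorter to the longer sentence length, in [0, 1]."""
    return min(len(a), len(b)) / max(len(a), len(b)) if a and b else 0.0


def retrieve(query_tags, treebank, k=3, alpha=0.5):
    """Return the ids of the k treebank sentences most similar to the query.

    `treebank` maps a sentence id to its list of POS tags. `alpha` is a
    hypothetical weight balancing length against n-gram similarity.
    """
    scored = [
        (alpha * length_similarity(query_tags, tags)
         + (1 - alpha) * ngram_similarity(query_tags, tags), sent_id)
        for sent_id, tags in treebank.items()
    ]
    # Highest score first; ties are broken by sentence id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sent_id for _, sent_id in scored[:k]]
```

In a pipeline of this shape, the retrieved sentences and their gold trees would be placed in the LLM prompt alongside the UDPipe baseline parse; how THIVLVC actually formats that prompt is not stated in the abstract.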
