CL | Feb 27, 2025

HuAMR: A Hungarian AMR Parser and Dataset

Botond Barta, Endre Hamerlik, Milán Konor Nyist, Judit Ács

arXiv:2502.20552v1 | h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work addresses the scarcity of semantic parsing resources for non-English languages such as Hungarian, though its contribution is incremental: it adapts existing methods to a new language.

The authors address the scarcity of semantic resources for Hungarian by creating HuAMR, the first Abstract Meaning Representation dataset for the language, and by developing LLM-based parsers. Adding silver-standard training data does not consistently raise overall scores, but it does improve parsing accuracy on Hungarian news data.

We present HuAMR, the first Abstract Meaning Representation (AMR) dataset and a suite of large language model-based AMR parsers for Hungarian, targeting the scarcity of semantic resources for non-English languages. To create HuAMR, we employed Llama-3.1-70B to automatically generate silver-standard AMR annotations, which we then refined manually to ensure quality. Building on this dataset, we investigate how different model architectures (mT5 Large and Llama-3.2-1B) and fine-tuning strategies affect AMR parsing performance. While incorporating silver-standard AMRs from Llama-3.1-70B into the training data of smaller models does not consistently boost overall scores, our results show that these techniques effectively enhance parsing accuracy on Hungarian news data (the domain of HuAMR). We evaluate our parsers using Smatch scores and confirm the potential of HuAMR and our parsers for advancing semantic parsing research.
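Smatch scores a predicted AMR against a gold AMR by decomposing both graphs into triples (concept instances and relations), finding the variable mapping that maximizes the number of shared triples, and reporting precision, recall, and F1 over that overlap. The sketch below is a simplified illustration, not the authors' evaluation code: it searches all variable mappings exhaustively (the reference Smatch tool uses a hill-climbing search with restarts and also scores a top-node triple), so it only suits tiny graphs. The triple format and the toy English AMR example are assumptions chosen for illustration.

```python
from itertools import permutations


def variables(triples):
    """Variables are the sources of 'instance' triples, e.g. ('instance', 'w', 'want-01')."""
    return sorted({src for rel, src, _ in triples if rel == "instance"})


def count_matches(pred, gold, mapping):
    """Count predicted triples that appear in gold after renaming pred variables."""
    matched = 0
    for rel, src, tgt in pred:
        new_src = mapping.get(src)
        if new_src is None:  # source variable left unmapped
            continue
        if tgt in mapping:  # target is a variable: rename it too
            new_tgt = mapping[tgt]
            if new_tgt is None:
                continue
        else:  # target is a constant (concept name or literal)
            new_tgt = tgt
        if (rel, new_src, new_tgt) in gold:
            matched += 1
    return matched


def smatch(pred, gold):
    """Simplified Smatch: exhaustive search over one-to-one variable mappings."""
    pred, gold = set(pred), set(gold)
    pv, gv = variables(pred), variables(gold)
    # Each pred variable maps to a distinct gold variable or to None (unmapped).
    candidates = gv + [None] * len(pv)
    best = 0
    for choice in permutations(candidates, len(pv)):
        mapping = dict(zip(pv, choice))
        best = max(best, count_matches(pred, gold, mapping))
    p = best / len(pred) if pred else 0.0
    r = best / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy gold AMR for "The boy wants to go":
# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
gold = [
    ("instance", "w", "want-01"), ("instance", "b", "boy"),
    ("instance", "g", "go-02"), ("ARG0", "w", "b"),
    ("ARG1", "w", "g"), ("ARG0", "g", "b"),
]
# Hypothetical prediction: different variable names, missing the ARG0 of go-02.
pred = [
    ("instance", "a", "want-01"), ("instance", "c", "boy"),
    ("instance", "d", "go-02"), ("ARG0", "a", "c"), ("ARG1", "a", "d"),
]

print(smatch(pred, gold))  # precision 1.0, recall 5/6, F1 10/11
```

Variable names themselves carry no meaning in AMR, which is why the score depends on the best mapping rather than on literal string equality: here mapping a→w, c→b, d→g recovers all five predicted triples.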
