CL · Jul 11, 2024

LLMs' morphological analyses of complex FST-generated Finnish words

arXiv:2407.08269v1 · 27 citations · h-index: 35
AI Analysis

This work asks whether neural NLP systems learn human-like grammar rules, a question relevant to both linguists and AI researchers. Its contribution is incremental: it applies existing methods to a new dataset.

The study evaluated state-of-the-art LLMs on morphological analysis of complex Finnish noun forms generated by an FST tool, finding that GPT-4-turbo had some difficulties, GPT-3.5-turbo struggled, and smaller models like Llama2-70B and Poro-34B failed nearly completely.

Rule-based language processing systems have been overshadowed by neural systems in terms of utility, but it remains unclear whether neural NLP systems, in practice, learn the grammar rules that humans use. This work aims to shed light on the issue by evaluating state-of-the-art LLMs in a task of morphological analysis of complex Finnish noun forms. We generate the forms using an FST tool, and they are unlikely to have occurred in the training sets of the LLMs, therefore requiring morphological generalisation capacity. We find that GPT-4-turbo has some difficulties in the task while GPT-3.5-turbo struggles and smaller models Llama2-70B and Poro-34B fail nearly completely.
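The abstract does not say how model outputs were scored, so the following is only a minimal sketch of one plausible way to score such a task: compare each predicted analysis with the FST-derived gold analysis, field by field. The `Analysis` schema, the tag labels (`INE`, `PX1SG`, `KIN`, and so on), and both function names are assumptions made for illustration, not the paper's actual format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Analysis:
    """One morphological analysis of a noun form (hypothetical schema)."""
    lemma: str
    number: str      # "SG" or "PL"
    case: str        # e.g. "INE" (inessive), "ADE" (adessive)
    possessive: str  # e.g. "PX1SG", or "" if absent
    clitic: str      # e.g. "KIN", or "" if absent


def exact_match_accuracy(gold, predicted):
    """Share of items whose predicted analysis matches gold in every field."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_field_accuracy(gold, predicted):
    """Accuracy computed separately for each field of the analysis."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return {f.name: 0.0 for f in fields(Analysis)}
    return {
        f.name: sum(
            getattr(g, f.name) == getattr(p, f.name)
            for g, p in zip(gold, predicted)
        ) / len(gold)
        for f in fields(Analysis)
    }


# Gold analyses for "taloissanikin" and "talolla" (illustrative items only).
gold = [
    Analysis("talo", "PL", "INE", "PX1SG", "KIN"),
    Analysis("talo", "SG", "ADE", "", ""),
]
# Hypothetical model output: the second item has the wrong case.
predicted = [
    Analysis("talo", "PL", "INE", "PX1SG", "KIN"),
    Analysis("talo", "SG", "ALL", "", ""),
]

print(exact_match_accuracy(gold, predicted))
print(per_field_accuracy(gold, predicted))
```

Reporting per-field accuracy alongside exact match would show whether a model fails on the whole analysis or only on particular categories, such as case or possessive suffix.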
