When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology
For researchers in computational morphology, this work highlights the need for finer-grained subclass analysis beyond standard conjugation categories to identify and address systematic errors in neural models.
The paper shows that in Japanese past-tense verb inflection, a tiny irregular subtype (<1% of data) accounts for a disproportionate share of model errors, and that removing it improves generalization more than removing all irregular verbs. This suggests that error concentration is driven by the interaction between extremely low-frequency patterns and specific morphophonological processes, particularly gemination.
Neural morphological generation systems often achieve high aggregate accuracy on benchmark datasets, yet such performance can conceal systematic errors concentrated in rare morphological subclasses. We examine Japanese past-tense verb inflection and show that a very small, structurally specific irregular subtype (<1% of data) accounts for a disproportionate share of model errors. Controlled ablation experiments demonstrate that removing this subtype yields larger improvements in generalization than removing all irregular verbs, indicating that not all irregularity contributes equally to model instability. These findings suggest that error concentration is driven by the interaction between extreme low-frequency morphological patterns and specific morphophonological processes, particularly gemination. We argue that morphological evaluation should incorporate finer-grained subclass analysis beyond standard conjugation categories.
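The subclass analysis argued for above can be illustrated with a minimal sketch that compares each subclass's share of the evaluation data with its share of the errors. This is not the authors' code: the function name `error_concentration`, the subclass labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def error_concentration(records):
    """Summarize how errors distribute over morphological subclasses.

    records: iterable of (subclass_label, is_correct) pairs, one per
    evaluated form. Returns {subclass: (data_share, error_share, ratio)},
    where ratio = error_share / data_share. A ratio above 1 means the
    subclass is over-represented among the errors.
    """
    records = list(records)
    totals = Counter(sub for sub, _ in records)
    errors = Counter(sub for sub, ok in records if not ok)
    n_total = len(records)
    n_errors = sum(errors.values())

    report = {}
    for sub, count in totals.items():
        data_share = count / n_total
        # With no errors at all, every subclass gets an error share of 0.
        error_share = errors[sub] / n_errors if n_errors else 0.0
        report[sub] = (data_share, error_share, error_share / data_share)
    return report


# Toy, made-up predictions (NOT the paper's data); labels are hypothetical.
toy = (
    [("regular", True)] * 95
    + [("regular", False)] * 1
    + [("irregular_other", True)] * 3
    + [("irregular_geminating", False)] * 1
)

report = error_concentration(toy)
for sub, (data_share, error_share, ratio) in sorted(report.items()):
    print(f"{sub}: data={data_share:.2%} errors={error_share:.2%} ratio={ratio:.1f}")
```

In this toy setup, aggregate accuracy is 98%, yet the rarest subclass makes up 1% of the data and half of the errors, which is the kind of concentration that aggregate accuracy alone would hide.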