Language Models as an Alternative Evaluator of Word Order Hypotheses: A Case Study in Japanese
This work addresses the difficulty linguists face in analyzing word order, particularly in languages with flexible word order such as Japanese. Its contribution is incremental in that it applies existing LMs to a new linguistic task.
The study proposes a neural language model-based method for analyzing word order as an alternative to existing approaches. It finds that LMs show sufficient word order knowledge to serve as a valid analysis tool, with results consistent with both human preferences and previous linguistic studies.
We examine a methodology that uses neural language models (LMs) to analyze word order in language. This LM-based method has the potential to overcome difficulties that existing methods face, such as the propagation of preprocessor errors in count-based methods. In this study, we explore whether the LM-based method is valid for analyzing word order. As a case study, we focus on Japanese because of its complex and flexible word order. To validate the LM-based method, we test (i) parallels between LM and human word order preferences, and (ii) consistency of the results obtained using the LM-based method with previous linguistic studies. Through our experiments, we tentatively conclude that LMs display sufficient word order knowledge to be used as an analysis tool. Finally, using the LM-based method, we demonstrate the relationship between canonical word order and topicalization, which had yet to be analyzed in large-scale experiments.
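The core idea of scoring competing word orders with an LM can be pictured with a minimal sketch. This is not the implementation used in the study: a small add-one-smoothed bigram model stands in for the neural LM, the romanized toy corpus is invented for illustration, and the names train_bigram, log_prob, and preferred_order are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    vocab = {tok for sent in corpus for tok in sent} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)

def log_prob(sent, model):
    """Add-one-smoothed log-probability of a sentence under the bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sent + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

def preferred_order(variants, model):
    """Return the word-order variant to which the model assigns the highest score."""
    return max(variants, key=lambda s: log_prob(s, model))

# Invented toy data (an assumption, not from the study): the dative-accusative
# order appears three times, the accusative-dative order once.
dat_acc = ["sensei-ga", "gakusei-ni", "hon-o", "ageta"]
acc_dat = ["sensei-ga", "hon-o", "gakusei-ni", "ageta"]
corpus = [dat_acc, dat_acc, dat_acc, acc_dat]

model = train_bigram(corpus)
best = preferred_order([dat_acc, acc_dat], model)
print(" ".join(best))
```

Because the score is computed directly on the token sequence, this kind of comparison does not depend on a parser or other preprocessor. In practice, the bigram model would be replaced by a neural LM trained on a large corpus, and the comparison would be aggregated over many sentences rather than a single pair.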