CL · Nov 14, 2020

Lessons from Computational Modelling of Reference Production in Mandarin and English

arXiv:2011.07398v231.0990 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses limitations in computational models of language production that are relevant to linguists and NLP researchers, but it is incremental because it builds on existing algorithms and datasets.

The study evaluated classic referring expression generation algorithms on a Mandarin corpus and compared results with English, finding a much higher proportion of under-specified expressions than previously suggested in both languages.

Referring expression generation (REG) algorithms offer computational models of the production of referring expressions. In earlier work, a corpus of referring expressions (REs) in Mandarin was introduced. In the present paper, we annotate this corpus, evaluate classic REG algorithms on it, and compare the results with earlier results on the evaluation of REG for English referring expressions. Next, we offer an in-depth analysis of the corpus, focusing on issues that arise from the grammar of Mandarin. We discuss shortcomings of previous REG evaluations that came to light during our investigation and we highlight some surprising results. Perhaps most strikingly, we found a much higher proportion of under-specified expressions than previous studies had suggested, not just in Mandarin but in English as well.
