An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues
This work addresses the need for efficient and interpretable evaluation tools for second-language learners and educators. Its contribution is incremental, since it extends an existing framework to a new language.
The authors address the evaluation of second-language dialogues by analysing the cross-lingual transferability of a framework that links linguistic features to interactivity labels. They find it robust across English and Chinese, with both language-specific and language-universal relationships. They introduce a new labelled Chinese dataset of 10K dialogues and propose an automated, interpretable method that scores dialogue quality without requiring labelled data.
We analyse the cross-lingual transferability of a dialogue evaluation framework, originally designed for English-as-a-second-language dialogues, that assesses the relationships between micro-level linguistic features (e.g. backchannels) and macro-level interactivity labels (e.g. topic management). To this end, we develop CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a labelled Chinese-as-a-second-language dataset with 10K dialogues. We find the evaluation framework to be robust across two distinct languages, English and Chinese, and the analysis reveals both language-specific and language-universal relationships between micro-level and macro-level features. Next, we propose an automated, interpretable approach with low data requirements that scores the overall quality of a second-language dialogue based on the framework. Our approach is interpretable in that it reveals the key linguistic and interactivity features that contribute to the overall quality score. Because our approach does not require labelled data, it can also be adapted to other languages for second-language dialogue evaluation.
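To make the notion of interpretability concrete, the following is a minimal sketch of a two-level additive scorer in which micro-level features feed macro-level interactivity scores, which in turn feed an overall score. It is an illustration only and not the method proposed in this work. The names `turn_taking` and `questions` and all numeric weights are hypothetical, and the weights are set by hand rather than derived as the abstract describes. The point is that in an additive model the overall score decomposes exactly into per-feature contributions.

```python
# Illustrative sketch only: hand-set, hypothetical weights, not values from the paper.
# "backchannels" and "topic_management" are examples named in the abstract;
# "questions" and "turn_taking" are invented here purely for illustration.

# How much each micro-level feature contributes to each macro-level label.
MICRO_TO_MACRO = {
    "topic_management": {"backchannels": 0.3, "questions": 0.7},
    "turn_taking": {"backchannels": 0.8, "questions": 0.2},
}

# How much each macro-level label contributes to the overall score.
MACRO_WEIGHTS = {"topic_management": 0.6, "turn_taking": 0.4}


def score_dialogue(micro_features):
    """Return (overall score, macro-level scores, per-feature contributions)."""
    macro_scores = {}
    contributions = {}
    for macro, weights in MICRO_TO_MACRO.items():
        total = 0.0
        for feature, weight in weights.items():
            value = micro_features.get(feature, 0.0)
            total += weight * value
            # Share of the overall score attributable to this micro feature.
            share = MACRO_WEIGHTS[macro] * weight * value
            contributions[feature] = contributions.get(feature, 0.0) + share
        macro_scores[macro] = total
    overall = sum(MACRO_WEIGHTS[m] * s for m, s in macro_scores.items())
    return overall, macro_scores, contributions


if __name__ == "__main__":
    # Hypothetical normalised feature values for one dialogue.
    overall, macro, contrib = score_dialogue({"backchannels": 0.5, "questions": 1.0})
    print(overall, macro, contrib)
```

Because every step is a weighted sum, the per-feature contributions add up to the overall score, so a learner or educator can see which features drove the result.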