CL · Jan 4, 2024

DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models

Songbo Hu, Xiaobin Wang, Zhangdie Yuan, Anna Korhonen, Ivan Vulić

Cambridge

arXiv:2401.02208v115.430 citations · h-index: 10 · Has Code · NAACL

Originality: Synthesis-oriented

AI Analysis

This toolkit addresses the high entry barriers facing researchers who develop and evaluate multilingual task-oriented dialogue systems. The contribution is incremental, since it builds on existing methods for evaluation and comparison.

The researchers introduce DIALIGHT, a toolkit for developing and evaluating multilingual task-oriented dialogue systems. It enables systematic comparisons between fine-tuned pretrained language models and large language models used with zero-shot and in-context learning. They find that fine-tuned models achieve higher accuracy and coherence, while LLMs produce more diverse and likeable responses but struggle with task adherence and multilingual output.

We present DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations and comparisons between ToD systems using fine-tuning of Pretrained Language Models (PLMs) and those utilising the zero-shot and in-context learning capabilities of Large Language Models (LLMs). In addition to automatic evaluation, this toolkit features (i) a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level, and (ii) a microservice-based backend, improving efficiency and scalability. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses. However, we also identify significant challenges for LLMs in adhering to task-specific instructions and generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems and will lower the currently still high entry barriers in the field.
