CL | Jan 4, 2024

DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models

Cambridge
arXiv:2401.02208v1 | 30 citations | h-index: 10 | Has Code | NAACL
Originality: Synthesis-oriented
AI Analysis

This toolkit addresses the high entry barriers facing researchers who develop and evaluate multilingual task-oriented dialogue systems. The contribution is incremental, since it builds on existing methods for evaluation and comparison.

The researchers introduce DIALIGHT, a toolkit for developing and evaluating multilingual task-oriented dialogue systems that enables systematic comparisons between fine-tuned pretrained language models and large language models used with zero-shot and in-context learning. They find that fine-tuned models achieve higher accuracy and coherence, while LLMs produce more diverse and likeable responses but struggle with task adherence and multilingual output.

We present DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations and comparisons between ToD systems using fine-tuning of Pretrained Language Models (PLMs) and those utilising the zero-shot and in-context learning capabilities of Large Language Models (LLMs). In addition to automatic evaluation, this toolkit features (i) a secure, user-friendly web interface for fine-grained human evaluation at both the local utterance level and the global dialogue level, and (ii) a microservice-based backend, improving efficiency and scalability. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses. However, we also identify significant challenges for LLMs in adhering to task-specific instructions and generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems and will lower the currently still high entry barriers in the field.

Code Implementations: 2 repos
