CL | Nov 29, 2023

CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs

arXiv:2311.17376v1132 citations | h-index: 61
Originality: Incremental advance
AI Analysis

This work aims to improve the instruction-following capabilities of LLMs in dialog applications. It is an incremental advance, achieved through automated instruction induction.

The authors target the performance gap of publicly available LLMs on complex, multi-constraint instructions in multi-turn dialogs. They propose CESAR, a framework that automatically induces compositional instructions, and use it to build InstructDial++, which covers 63 datasets and 154 tasks (86 basic and 68 composite). Models trained on InstructDial++ show an improved ability to follow compositional prompts.

Instruction-based multitasking has played a critical role in the success of large language models (LLMs) in multi-turn dialog applications. While publicly available LLMs have shown promising performance, when exposed to complex instructions with multiple constraints, they lag against state-of-the-art models like ChatGPT. In this work, we hypothesize that the availability of large-scale complex demonstrations is crucial in bridging this gap. Focusing on dialog applications, we propose a novel framework, CESAR, that unifies a large number of dialog tasks in the same format and allows programmatic induction of complex instructions without any manual effort. We apply CESAR on InstructDial, a benchmark for instruction-based dialog tasks. We further enhance InstructDial with new datasets and tasks and utilize CESAR to induce complex tasks with compositional instructions. This results in a new benchmark called InstructDial++, which includes 63 datasets with 86 basic tasks and 68 composite tasks. Through rigorous experiments, we demonstrate the scalability of CESAR in providing rich instructions. Models trained on InstructDial++ can follow compositional prompts, such as prompts that ask for multiple stylistic constraints.
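To make "programmatic induction of complex instructions" more concrete, here is a minimal sketch of the general idea: basic tasks share one format, so composite instructions can be generated by combining them in code rather than written by hand. This is not CESAR's actual implementation or task schema. The names `BasicTask`, `compose`, and `induce_composites`, and the three example constraints, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class BasicTask:
    """Hypothetical atomic task: a single constraint on the generated response."""
    name: str
    constraint: str  # natural-language clause describing the constraint


def compose(tasks):
    """Merge several basic tasks into one composite (name, instruction) pair."""
    name = "+".join(t.name for t in tasks)
    clauses = " and ".join(t.constraint for t in tasks)
    return name, f"Generate the next dialog response that {clauses}."


def induce_composites(tasks, order=2):
    """Enumerate every composite built from `order` distinct basic tasks."""
    return [compose(combo) for combo in combinations(tasks, order)]


# Illustrative basic tasks (assumed, not taken from InstructDial++).
BASIC = [
    BasicTask("begins_with", "begins with the given phrase"),
    BasicTask("keyword", "contains the given keyword"),
    BasicTask("length", "is at most the given number of words"),
]

composites = induce_composites(BASIC)
for name, instruction in composites:
    print(f"{name}: {instruction}")
```

Because the composites are enumerated from the basic tasks, adding one new basic task yields new composite tasks with no extra manual instruction writing, which is the scalability property the abstract emphasizes.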
