CL, AI | Mar 4, 2025

SAGE: Steering Dialog Generation with Future-Aware State-Action Augmentation

arXiv:2503.03040v2 | 2 citations | h-index: 50 | Has Code | Proceedings of the 4th Workshop on Perspectivist Approaches to NLP

AI Analysis

This work addresses the problem of building more natural and strategic conversational agents. It appears incremental, since it builds on existing language model fine-tuning, though it adds novel augmentations to that recipe.

The paper tackles the challenge of building emotionally intelligent chatbots by introducing SAGE, a method that uses latent variables to control long-horizon behavior in dialogue generation, resulting in improved performance in emotional intelligence metrics while maintaining strong capabilities on LLM benchmarks.

Recent advances in large language models have demonstrated impressive capabilities in task-oriented applications, yet building emotionally intelligent chatbots that can engage in natural, strategic conversations remains a challenge. We present a novel approach called SAGE that uses latent variables to control long-horizon behavior in dialogue generation. At the core of our method is the State-Action Chain (SAC), which augments standard language model fine-tuning by introducing latent variables that encapsulate emotional states and conversational strategies between dialogue turns. During inference, these variables are generated before each response, enabling coarse-grained control over dialogue progression while maintaining natural interaction patterns. We also introduce a self-improvement pipeline that leverages dialogue tree search, LLM-based reward modeling, and targeted fine-tuning to optimize conversational trajectories. Our experimental results show that models trained with this approach demonstrate improved performance in emotional intelligence metrics while maintaining strong capabilities on LLM benchmarks. The discrete nature of our latent variables facilitates search-based strategies and provides a foundation for future applications of reinforcement learning to dialogue systems, where learning can occur at the state level rather than the token level. https://github.com/apple/ml-sage-dialog-gen
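The State-Action Chain idea described in the abstract can be illustrated with a minimal sketch: each assistant turn is serialized with an emotional state and a conversational strategy placed before the response, so a fine-tuned model would emit those variables first at inference time. The tag strings (`[STATE]`, `[ACTION]`, `[RESPONSE]`), the `Turn` class, and the helper names below are assumptions made for illustration only; the actual format used in the paper and its repository may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical tag names; the paper's actual serialization may differ.
STATE_TAG = "[STATE]"
ACTION_TAG = "[ACTION]"
RESPONSE_TAG = "[RESPONSE]"


@dataclass
class Turn:
    speaker: str                  # "user" or "assistant"
    text: str                     # the visible utterance
    state: Optional[str] = None   # latent emotional state (assistant turns)
    action: Optional[str] = None  # latent conversational strategy (assistant turns)


def serialize_turn(turn: Turn) -> str:
    """Render one turn; annotated assistant turns put state and action first."""
    annotated = turn.state is not None and turn.action is not None
    if turn.speaker == "assistant" and annotated:
        return (
            f"assistant: {STATE_TAG} {turn.state} "
            f"{ACTION_TAG} {turn.action} "
            f"{RESPONSE_TAG} {turn.text}"
        )
    return f"{turn.speaker}: {turn.text}"


def build_training_text(dialog: List[Turn]) -> str:
    """Join serialized turns into one fine-tuning example (assumed layout)."""
    return "\n".join(serialize_turn(t) for t in dialog)


def parse_generation(generated: str) -> Tuple[str, str, str]:
    """Split a generated assistant line into (state, action, response)."""
    _, _, rest = generated.partition(STATE_TAG)
    state, _, rest = rest.partition(ACTION_TAG)
    action, _, response = rest.partition(RESPONSE_TAG)
    return state.strip(), action.strip(), response.strip()


dialog = [
    Turn("user", "I failed my exam."),
    Turn(
        "assistant",
        "That sounds really hard. Want to talk about it?",
        state="user is discouraged",
        action="show empathy, invite sharing",
    ),
]
training_text = build_training_text(dialog)
```

Because the state and action are short discrete strings in this sketch, a search procedure could branch over candidate (state, action) pairs rather than over individual tokens, which is the property the abstract highlights for tree search and future reinforcement learning.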
