CL LG | Jul 31, 2024

Synth-Empathy: Towards High-Quality Synthetic Empathy Data

Hao Liang, Linzhuang Sun, Jingxuan Wei, Xijie Huang, Linkun Sun, Bihui Yu, Conghui He, Wentao Zhang

arXiv:2407.21669v26.612 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating high-quality empathetic data for LLMs. The advance is incremental because it builds on existing methods for data synthesis and selection.

The paper tackles the problem of insufficient and costly human-labeled empathetic data by introducing Synth-Empathy, an LLM-based pipeline that automatically generates and selects high-quality synthetic empathetic data, achieving state-of-the-art results on multiple benchmarks and improving empathetic response performance.

In recent years, with the rapid advancement of large language models (LLMs), strong empathetic response capability has become a crucial requirement. Consequently, managing and understanding empathetic datasets has gained increasing significance. However, empathetic data are typically human-labeled, which leaves datasets insufficient and consumes considerable human labor. In this work, we present Synth-Empathy, an LLM-based pipeline for data generation and for quality and diversity selection that automatically generates high-quality empathetic data while discarding low-quality data. With the data generated from a low empathetic model, we are able to further improve empathetic response performance and achieve state-of-the-art (SoTA) results across multiple benchmarks. Moreover, our model achieves SoTA performance on various human evaluation benchmarks, demonstrating its effectiveness and robustness in real-world applications. Furthermore, we show the trade-off between data quantity and quality, providing insights into empathetic data generation and selection.
