CL | AI | Dec 8, 2023

FREDSum: A Dialogue Summarization Corpus for French Political Debates

arXiv:2312.04843v1133 citations | h-index: 58 | EMNLP
Originality: Synthesis-oriented
AI Analysis

This paper addresses the shortage of multilingual datasets for dialogue summarization research, though the contribution is incremental: it extends existing methods to a new language and domain.

The authors tackled the lack of resources for multi-party dialogue summarization in non-English languages by creating FREDSum, a dataset of French political debates with manual transcriptions and annotations, and provided baseline experiments using state-of-the-art methods.

Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved the performance of abstractive summarization systems. The majority of research has focused on written documents, however, neglecting the problem of multi-party dialogue summarization. In this paper, we present a dataset of French political debates for the purpose of enhancing resources for multilingual dialogue summarization. Our dataset consists of manually transcribed and annotated political debates, covering a range of topics and perspectives. We highlight the importance of high-quality transcription and annotations for training accurate and effective dialogue summarization models, and emphasize the need for multilingual resources to support dialogue summarization in non-English languages. We also provide baseline experiments using state-of-the-art methods, and encourage further research in this area to advance the field of dialogue summarization. Our dataset will be made publicly available for use by the research community.

Code Implementations: 1 repo
