CL | Feb 18

CitiLink-Summ: Summarization of Discussion Subjects in European Portuguese Municipal Meeting Minutes

Miguel Marques, Ana Luísa Fernandes, Ana Filipa Pacheco, Rute Rebouças, Inês Cantante, José Isidro, Luís Filipe Cunha, Alípio Jorge, Nuno Guimarães, Sérgio Nunes, António Leal, Purificação Silvano

arXiv:2602.16607v1 | 0.6 | h-index: 5

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of making local government documents more accessible to citizens in a low-resource language. Its contribution is incremental: it primarily introduces a dataset and benchmarks rather than a methodological advance.

The authors address the summarization of discussion subjects in European Portuguese municipal meeting minutes, which are lengthy and complex. They create CitiLink-Summ, a new corpus of 100 documents with 2,322 manually written summaries, and establish baseline results using state-of-the-art models such as BART and LLMs, evaluated with metrics such as ROUGE and BERTScore.

Municipal meeting minutes are formal records documenting the discussions and decisions of local government, yet their content is often lengthy, dense, and difficult for citizens to navigate. Automatic summarization can help address this challenge by producing concise summaries for each discussion subject. Despite its potential, the summarization of discussion subjects in municipal meeting minutes remains largely unexplored, especially in low-resource languages, where the inherent complexity of these documents adds further challenges. A major bottleneck is the scarcity of datasets containing high-quality, manually crafted summaries, which limits the development and evaluation of effective summarization models for this domain. In this paper, we present CitiLink-Summ, a new corpus of European Portuguese municipal meeting minutes, comprising 100 documents and 2,322 manually written summaries, each corresponding to a distinct discussion subject. Leveraging this dataset, we establish baseline results for automatic summarization in this domain, employing state-of-the-art generative models (e.g., BART, PRIMERA) as well as large language models (LLMs), evaluated with both lexical and semantic metrics such as ROUGE, BLEU, METEOR, and BERTScore. CitiLink-Summ provides the first benchmark for municipal-domain summarization in European Portuguese, offering a valuable resource for advancing NLP research on complex administrative texts.
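To make the lexical side of the evaluation concrete, the following is a minimal sketch of a ROUGE-1 F1 score, one of the metric families named in the abstract. It is an illustration only: the abstract does not say which tooling the authors used, the tokenization here is plain lowercase whitespace splitting with no stemming, and the function name `rouge1_f1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between a reference and a candidate.

    Assumption: lowercase whitespace tokenization, no stemming or
    punctuation handling, so scores will differ from standard toolkits.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference summary and model output (not from the corpus).
reference = "the council approved the budget for 2025"
candidate = "the council approved the 2025 budget"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

A lexical score like this rewards shared wording only, which is why the abstract pairs such metrics with a semantic one (BERTScore) that can credit paraphrases.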
