CL | AI | LG | Mar 10, 2025

Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data

arXiv:2503.10676v1 | 1 citation | h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses the practical challenge of deploying summarization models in resource-constrained, sensitive environments like government archives, though it is incremental in nature.

The study investigated whether fine-tuning large language models for report summarization is feasible with limited on-premise compute resources, finding that it often improves summary quality or reduces invalid outputs.

We study the efficacy of fine-tuning Large Language Models (LLMs) for the specific task of report (government archives, news, intelligence reports) summarization. While this topic is being very actively researched, our specific application set-up faces two challenges: (i) ground-truth summaries may be unavailable (e.g., for government archives), and (ii) compute power is limited, because the sensitive nature of the application requires that computation be performed on-premise; for most of our experiments we use one or two A100 GPU cards. Under this set-up we conduct experiments to answer the following questions. First, given that fine-tuning LLMs can be resource intensive, is it feasible to fine-tune them for improved report summarization capabilities on-premise? Second, what metrics could we leverage to assess the quality of these summaries? We conduct experiments on two different fine-tuning approaches in parallel, and our findings reveal interesting trends regarding the utility of fine-tuning LLMs. Specifically, we find that in many cases fine-tuning helps improve summary quality, and in other cases it helps by reducing the number of invalid or garbage summaries.

