CL · AI · May 23, 2025

Taming LLMs with Negative Samples: A Reference-Free Framework to Evaluate Presentation Content with Actionable Feedback

arXiv:2505.18240v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the need for automated evaluation of presentation slides, which matters for generative AI applications. The contribution is incremental, since it builds on existing LLM-based evaluation techniques.

The paper tackles the problem of automatically evaluating multimodal content in presentation slides for effective summarization and concept conveyance, introducing a reference-free framework that outperforms existing methods in generating scores and actionable feedback.

The generation of presentation slides automatically is an important problem in the era of generative AI. This paper focuses on evaluating multimodal content in presentation slides that can effectively summarize a document and convey concepts to a broad audience. We introduce a benchmark dataset, RefSlides, consisting of human-made high-quality presentations that span various topics. Next, we propose a set of metrics to characterize different intrinsic properties of the content of a presentation and present REFLEX, an evaluation approach that generates scores and actionable feedback for these metrics. We achieve this by generating negative presentation samples with different degrees of metric-specific perturbations and use them to fine-tune LLMs. This reference-free evaluation technique does not require ground truth presentations during inference. Our extensive automated and human experiments demonstrate that our evaluation approach outperforms classical heuristic-based and state-of-the-art large language model-based evaluations in generating scores and explanations.
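The negative-sample idea in the abstract can be illustrated with a small sketch. The code below is not the paper's implementation: the abstract does not name the perturbations REFLEX uses, so the `drop_bullets` perturbation, the degree values, and the `target_score = 1 - degree` labeling rule are all hypothetical choices made for illustration. It takes one high-quality deck and produces progressively degraded copies that could serve as graded training examples for a single content metric.

```python
import random


def drop_bullets(slides, degree, seed=0):
    """Return a copy of `slides` with a fraction `degree` of bullets removed.

    `slides` is a list of slides, each a list of bullet strings.
    `degree` is in [0, 1]; 0 leaves the deck unchanged.
    This perturbation is a hypothetical stand-in, not the paper's.
    """
    rng = random.Random(seed)
    positions = [(i, j) for i, s in enumerate(slides) for j in range(len(s))]
    n_drop = round(degree * len(positions))
    dropped = set(rng.sample(positions, n_drop))
    return [
        [b for j, b in enumerate(s) if (i, j) not in dropped]
        for i, s in enumerate(slides)
    ]


def make_negative_samples(slides, degrees=(0.25, 0.5, 0.75)):
    """Pair the original deck with perturbed copies at several degrees.

    The target score of 1 - degree is an assumed labeling rule.
    """
    samples = [{"slides": slides, "degree": 0.0, "target_score": 1.0}]
    for d in degrees:
        samples.append(
            {"slides": drop_bullets(slides, d), "degree": d, "target_score": 1.0 - d}
        )
    return samples


deck = [
    ["Problem: slide evaluation", "Why reference-free"],
    ["Method: negative samples", "Result: scores and feedback"],
]
samples = make_negative_samples(deck)
```

Each entry in `samples` pairs a deck with a degree and a score, which is the kind of supervision a fine-tuning step would need. Because the original deck is the only input, nothing here depends on a ground-truth presentation at inference time.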

