CL | AI | Nov 7, 2024

Explaining Mixtures of Sources in News Articles

arXiv:2411.05192v1 | 23 citations | h-index: 4 | EMNLP
Originality: Incremental advance
AI Analysis

The paper provides a framework for evaluating planning in long-form generation. The advance is incremental: it adapts existing schemata and introduces new ones for a specific domain (news journalism).

The study examines source-selection planning in news articles by asking why specific stories require particular sources. It finds that the stance and social affiliation schemata best explain source plans in most documents, while textual entailment is more effective for factually rich topics like 'Science'.

Human writers plan, then write. For large language models (LLMs) to play a role in longer-form article generation, we must understand the planning steps humans make before writing. We explore one kind of planning, source selection in news, as a case study for evaluating plans in long-form generation. We ask: why do specific stories call for specific kinds of sources? We imagine a generative process for story writing in which a journalist first selects a source-selection schema and then chooses sources based on the categories in that schema. Learning the article's plan means predicting the schema initially chosen by the journalist. Working with professional journalists, we adapt five existing schemata and introduce three new ones to describe journalistic plans for the inclusion of sources in documents. Then, inspired by Bayesian latent-variable modeling, we develop metrics to select the most likely plan, or schema, underlying a story, which we use to compare schemata. We find that two schemata, stance and social affiliation, best explain source plans in most documents. However, other schemata, like textual entailment, explain source plans in factually rich topics like "Science". Finally, we find we can predict the most suitable schema given just the article's headline with reasonable accuracy. We see this as an important case study for human planning, and it provides a framework and approach for evaluating other kinds of plans. We release a corpus, NewsSources, with annotations for 4M articles.
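
The generative story in the abstract (a schema is chosen first, then sources are drawn from that schema's categories) can be illustrated with a toy posterior calculation. The sketch below is only an illustration under stated assumptions, not the paper's metrics: the category names, probabilities, uniform prior, and function names are all hypothetical, and each source is assumed to be drawn independently from its schema's category distribution.

```python
import math

# Hypothetical p(category | schema) tables; the numbers are made up for illustration.
SCHEMA_CATEGORY_PROBS = {
    "stance": {"supports": 0.5, "opposes": 0.4, "neutral": 0.1},
    "affiliation": {"government": 0.6, "academic": 0.2, "industry": 0.2},
}

# Hypothetical uniform prior p(schema).
SCHEMA_PRIOR = {"stance": 0.5, "affiliation": 0.5}


def schema_posterior(labels_by_schema):
    """Return p(schema | sources) for one article.

    labels_by_schema maps each schema name to the list of category labels
    that the article's sources receive under that schema.
    """
    log_scores = {}
    for schema, labels in labels_by_schema.items():
        probs = SCHEMA_CATEGORY_PROBS[schema]
        # log p(schema) + sum over sources of log p(label | schema)
        log_scores[schema] = math.log(SCHEMA_PRIOR[schema]) + sum(
            math.log(probs[label]) for label in labels
        )
    # Normalize in log space for numerical stability.
    peak = max(log_scores.values())
    total = sum(math.exp(score - peak) for score in log_scores.values())
    return {
        schema: math.exp(score - peak) / total
        for schema, score in log_scores.items()
    }


def most_likely_schema(labels_by_schema):
    """Pick the schema with the highest posterior probability."""
    posterior = schema_posterior(labels_by_schema)
    return max(posterior, key=posterior.get)


# One article with three sources, labeled under both hypothetical schemata.
article = {
    "stance": ["supports", "opposes", "supports"],
    "affiliation": ["academic", "industry", "academic"],
}
print(most_likely_schema(article))
```

A raw likelihood comparison like this tends to favor schemata with fewer or more skewed categories, which is one reason a real comparison across schemata would need more careful metrics than this sketch.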

