Automated Journalistic Questions: A New Method for Extracting 5W1H in French
This work addresses the need for systematic event description in journalism and in related NLP tasks such as summarization, but it is incremental: it applies existing concepts to a new language and dataset.
The paper tackles the problem of automatically extracting 5W1H information from French news articles. It designs the first automated pipeline for this task, which performs comparably to GPT-4o on a newly created corpus of 250 Quebec news articles.
The 5W1H questions (who, what, when, where, why, and how) are commonly used in journalism to ensure that an article describes events clearly and systematically. Answering them is a crucial prerequisite for tasks such as summarization, clustering, and news aggregation. In this paper, we design the first automated pipeline for extracting 5W1H information from French news articles. To evaluate it, we also create a corpus of 250 Quebec news articles whose 5W1H answers were labeled by four human annotators. Our results show that our pipeline performs as well on this task as the large language model GPT-4o.
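The abstract does not say how the pipeline's answers are compared with those of the four annotators. As a purely illustrative sketch, one common option is a SQuAD-style token-overlap F1 that keeps the best match across annotators for each question. This metric is an assumption, not necessarily the one used in the paper, and the names `QUESTIONS`, `token_f1`, and `score_article` are hypothetical.

```python
from collections import Counter

# The six journalistic questions, used as keys for both predictions and annotations.
QUESTIONS = ("who", "what", "when", "where", "why", "how")


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase + whitespace split. A real French pipeline
    # would need proper tokenization (elisions such as "l'", punctuation).
    return text.lower().split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and one reference answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred and not ref:
        return 1.0  # both empty: treat "no answer" on both sides as agreement
    if not pred or not ref:
        return 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_article(prediction: dict[str, str],
                  annotations: list[dict[str, str]]) -> dict[str, float]:
    """Per-question score: best F1 over all annotators (needs at least one annotation)."""
    return {
        q: max(token_f1(prediction.get(q, ""), ann.get(q, "")) for ann in annotations)
        for q in QUESTIONS
    }
```

For example, `score_article({"who": "le maire"}, [{"who": "le maire de Québec"}, {"who": "le maire"}])` gives `"who"` a score of 1.0 because the second annotator's answer matches exactly. Taking the maximum over annotators rewards agreement with any plausible human answer rather than with an averaged one.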