CL · Aug 9, 2024

DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts

arXiv:2408.05346v3 · 36 citations · h-index: 62
Originality: Incremental advance
AI Analysis

This work addresses the time-consuming and mentally taxing process of creating data stories, a burden for analysts and communicators. The advance is incremental because it builds on existing LLM capabilities.

The paper tackles automated data-driven storytelling by introducing a new task and a benchmark of 1,449 stories, and proposes a multiagent LLM framework that generally outperforms non-agentic methods in both model-based and human evaluations.

Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
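The generate-then-verify loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable type, the stage names, the prompt strings, the "OK" acceptance convention, and the `max_revisions` limit are all assumptions made for illustration. It only shows the general shape of one agent drafting each intermediate step (reflection, outline, narration) while a second agent checks it before the next step begins.

```python
from typing import Callable, Dict

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]

# Stage names follow the abstract; the prompt wording below is an assumption.
STAGES = ("reflection", "outline", "narration")


def run_stage(stage: str, context: str, generator: LLM, verifier: LLM,
              max_revisions: int = 2) -> str:
    """Draft one stage, then let the verifier accept it or request revisions."""
    draft = generator(f"[{stage}] {context}")
    for _ in range(max_revisions):
        verdict = verifier(f"[verify {stage}] {context}\n---\n{draft}")
        # Assumed convention: a verdict starting with "OK" accepts the draft.
        if verdict.strip().upper().startswith("OK"):
            break
        draft = generator(
            f"[revise {stage}] {context}\n---\n{draft}\nFeedback: {verdict}"
        )
    return draft


def generate_story(data: str, generator: LLM, verifier: LLM,
                   max_revisions: int = 2) -> Dict[str, str]:
    """Run the stages in order, feeding each verified output into the next."""
    outputs: Dict[str, str] = {}
    context = data
    for stage in STAGES:
        outputs[stage] = run_stage(stage, context, generator, verifier,
                                   max_revisions)
        context = f"{context}\n{stage}: {outputs[stage]}"
    return outputs
```

In practice both callables would wrap calls to a language model; plain functions can stand in for them when testing the control flow.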

Code Implementations: 1 repo