HC CL · Feb 4, 2025

ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs

Yuan Tian, Chuhan Zhang, Xiaotong Wang, Sitong Pan, Weiwei Cui, Haidong Zhang, Dazhen Deng, Yingcai Wu

arXiv:2502.02329v37.21 citations · h-index: 14 · UIST

Originality: Incremental advance

AI Analysis

ReSpark addresses the problem of manual report creation for data analysts, though the advance is incremental because it builds on existing LLM capabilities.

The paper tackles the labor-intensive task of creating data reports by introducing ReSpark, a system that uses LLMs to reverse-engineer analysis logic from existing reports and adapt it to new datasets, with evaluations showing it effectively lowers the barrier to report generation.

Creating data reports is a labor-intensive task involving iterative data exploration, insight extraction, and narrative construction. A key challenge lies in composing the analysis logic, from defining objectives and transforming data to identifying and communicating insights. Manually crafting this logic can be cognitively demanding. While experienced analysts often reuse scripts from past projects, finding a perfect match for a new dataset is rare. Even when similar analyses are available online, they usually share only results or visualizations, not the underlying code, making reuse difficult. To address this, we present ReSpark, a system that leverages large language models (LLMs) to reverse-engineer analysis logic from existing reports and adapt it to new datasets. By generating draft analysis steps, ReSpark provides a warm start for users. It also supports interactive refinement, allowing users to inspect intermediate outputs, insert objectives, and revise content. We evaluate ReSpark through comparative and user studies, demonstrating its effectiveness in lowering the barrier to generating data reports without relying on existing analysis code.
