CL | Aug 7, 2024

PAGED: A Benchmark for Procedural Graphs Extraction from Documents

arXiv:2408.03630v2 | 26 citations | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of creating visual graphs from complex procedures for users who need to skim documents, but it is incremental as it primarily introduces a benchmark and evaluates existing methods.

The authors tackled the problem of automatically extracting procedural graphs from documents by creating a new benchmark called PAGED, which includes a large high-quality dataset and standard evaluations. They found that five state-of-the-art baselines performed poorly due to reliance on hand-written rules and limited data, while three advanced LLMs with a novel self-refine strategy showed advantages in identifying textual elements but gaps in building logical structures.

Automatic extraction of procedural graphs from documents creates a low-cost way for users to easily understand a complex procedure by skimming visual graphs. Despite the progress in recent studies, two questions remain unanswered: whether existing studies have solved this task well (Q1), and whether emerging large language models (LLMs) can bring new opportunities to it (Q2). To this end, we propose a new benchmark, PAGED, equipped with a large, high-quality dataset and standard evaluations. It investigates five state-of-the-art baselines, revealing that they fail to extract procedural graphs well because of their heavy reliance on hand-written rules and limited available data. We further involve three advanced LLMs in PAGED and enhance them with a novel self-refine strategy. The results point out the advantages of LLMs in identifying textual elements and their gaps in building logical structures. We hope PAGED can serve as a major landmark for automatic procedural graph extraction and that the investigations in PAGED can offer insights into research on logical reasoning among non-sequential elements.

Code Implementations: 1 repo