Preacher: Paper-to-Video Agentic System
Preacher addresses the need for accessible video summaries of research papers. It goes beyond what existing video generation models offer, although its improvement over current methods appears incremental.
The authors address the problem of converting research papers into structured video abstracts with Preacher, an agentic system that decomposes and summarizes a paper before generating a coherent video. They report high-quality results across five research fields.
The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation that synthesizes diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models. Code will be released at: https://github.com/Gen-Verse/Paper2Video
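The two-phase flow described in the abstract (top-down decomposition and summarization, then bottom-up assembly of segments) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `KeyScene`, `plan_key_scenes`, and `generate_video` are hypothetical, the blank-line section splitting is an assumption made for illustration, and the `summarize` and `render` callables stand in for whatever language and video models a real system would call. The P-CoT planning step is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyScene:
    """Hypothetical container for one planned scene (field names are assumptions)."""
    title: str
    narration: str


def plan_key_scenes(paper_text: str, summarize: Callable[[str], str]) -> List[KeyScene]:
    """Top-down phase: decompose the paper into sections and summarize each one."""
    scenes = []
    # Assumption: sections are separated by blank lines, with the title on the first line.
    for block in paper_text.strip().split("\n\n"):
        title, _, body = block.partition("\n")
        scenes.append(KeyScene(title=title.strip(), narration=summarize(body.strip())))
    return scenes


def generate_video(scenes: List[KeyScene], render: Callable[[KeyScene], str]) -> List[str]:
    """Bottom-up phase: render each scene into a segment and keep the segments in order."""
    return [render(scene) for scene in scenes]


if __name__ == "__main__":
    paper = "Introduction\nWe study X.\n\nMethod\nWe do Y."
    # Placeholder callables; a real system would invoke models here.
    scenes = plan_key_scenes(paper, summarize=lambda text: text.upper())
    segments = generate_video(scenes, render=lambda s: f"{s.title}: {s.narration}")
    print(segments)
```

Passing the summarizer and renderer as callables keeps the sketch independent of any particular model or library, so each phase can be swapped out or tested on its own.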