DL · AI · LG · MM · Nov 17, 2025

Moving Pictures of Thought: Extracting Visual Knowledge in Charles S. Peirce's Manuscripts with Vision-Language Models

arXiv:2511.13378v1 · h-index: 2 · Anthology of Computers and the Humanities
Originality: Synthesis-oriented
AI Analysis

This paper addresses a challenge for digital humanities researchers: studying the underexplored diagrams found in scholarly documents. Its contribution is incremental, since it is a preliminary study that applies existing VLMs to new data.

The study tackles the analysis of Charles S. Peirce's manuscripts, which combine text with complex diagrams, by using Vision-Language Models (VLMs) to identify and interpret hybrid pages. The result is a workflow that segments page layouts, prompts VLMs for knowledge extraction, and integrates the resulting captions into knowledge graphs.

Diagrams are crucial yet underexplored tools in many disciplines, demonstrating the close connection between visual representation and scholarly reasoning. However, their iconic form poses obstacles to visual studies, intermedial analysis, and text-based digital workflows. In particular, Charles S. Peirce consistently advocated the use of diagrams as essential for reasoning and explanation. His manuscripts, often combining textual content with complex visual artifacts, provide a challenging case for studying documents involving heterogeneous materials. In this preliminary study, we investigate whether Visual Language Models (VLMs) can effectively help us identify and interpret such hybrid pages in context. First, we propose a workflow that (i) segments manuscript page layouts, (ii) reconnects each segment to IIIF-compliant annotations, and (iii) submits fragments containing diagrams to a VLM. In addition, by adopting Peirce's semiotic framework, we designed prompts to extract key knowledge about diagrams and produce concise captions. Finally, we integrated these captions into knowledge graphs, enabling structured representations of diagrammatic content within composite sources.
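
Steps (ii) and (iii) of the workflow can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the function names, and the example.org URLs are hypothetical, and the layout segmentation itself is assumed to have already produced labelled bounding boxes. The sketch only shows how a bounding box maps onto an IIIF Image API region URL (the fragment that could be sent to a VLM) and onto a W3C Web Annotation that ties a caption back to the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A layout region in pixel coordinates (hypothetical type for this sketch)."""
    x: int
    y: int
    w: int
    h: int
    label: str  # e.g. "text" or "diagram", assumed to come from a layout model


def iiif_region_url(image_base: str, seg: Segment) -> str:
    """Build an IIIF Image API URL: {base}/{region}/{size}/{rotation}/{quality}.{format}.

    "max" is the full-size keyword in Image API 3.0; servers that only
    implement 2.0 expect "full" instead.
    """
    region = f"{seg.x},{seg.y},{seg.w},{seg.h}"
    return f"{image_base}/{region}/max/0/default.jpg"


def to_annotation(canvas_id: str, seg: Segment, caption: str) -> dict:
    """Wrap a caption as a W3C Web Annotation targeting the segment's region."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "describing",
        "body": {"type": "TextualBody", "value": caption, "format": "text/plain"},
        "target": {
            "source": canvas_id,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={seg.x},{seg.y},{seg.w},{seg.h}",
            },
        },
    }


def diagram_fragments(image_base: str, segments: list[Segment]) -> list[str]:
    """Keep only diagram segments and return their image-fragment URLs."""
    return [iiif_region_url(image_base, s) for s in segments if s.label == "diagram"]
```

In this sketch, the caption returned by the VLM for each fragment URL would be passed to `to_annotation`, so that the caption stays addressable by canvas and region when it is later loaded into a knowledge graph.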

