CL · AI · Oct 2, 2023

appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit

arXiv:2310.01206v2 · 1 citation · h-index: 9
Originality: Synthesis-oriented
AI Analysis

appjsonify gives researchers and developers who handle academic papers a flexible conversion tool, but the contribution is incremental because it builds on existing PDF parsing methods.

The authors tackled the problem of converting academic paper PDFs to JSON by developing appjsonify, a Python toolkit that uses visual-based layout analysis and rule-based text processing, resulting in a publicly released, configurable tool available on PyPI and GitHub.

We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file using several visual-based document layout analysis models and rule-based text processing approaches. appjsonify is a flexible tool that allows users to easily configure the processing pipeline to handle a specific format of a paper they wish to process. We are publicly releasing appjsonify as an easy-to-install toolkit available via PyPI and GitHub.
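As a rough illustration of what a configurable pipeline of rule-based text processing steps can look like, the sketch below chains user-selected steps over extracted text lines and emits JSON. It is not appjsonify's actual API: every name here (`run_pipeline`, `STEPS`, `dehyphenate`, `drop_page_numbers`, `merge_lines`) is hypothetical, and the hand-written input lines stand in for the output of a document layout analysis model.

```python
import json
import re
from typing import Callable, Dict, List

Lines = List[str]


def dehyphenate(lines: Lines) -> Lines:
    """Join a line that ends in a hyphen with the line that follows it."""
    out: Lines = []
    for line in lines:
        if out and out[-1].endswith("-"):
            out[-1] = out[-1][:-1] + line.lstrip()
        else:
            out.append(line)
    return out


def drop_page_numbers(lines: Lines) -> Lines:
    """Remove lines that consist only of digits (a crude page-number rule)."""
    return [ln for ln in lines if not re.fullmatch(r"\s*\d+\s*", ln)]


def merge_lines(lines: Lines) -> Lines:
    """Collapse all remaining lines into a single paragraph."""
    return [" ".join(ln.strip() for ln in lines)] if lines else []


# Hypothetical registry: the user picks step names to build a pipeline.
STEPS: Dict[str, Callable[[Lines], Lines]] = {
    "dehyphenate": dehyphenate,
    "drop_page_numbers": drop_page_numbers,
    "merge_lines": merge_lines,
}


def run_pipeline(lines: Lines, step_names: List[str]) -> str:
    """Apply the named steps in order and serialize the result as JSON."""
    for name in step_names:
        lines = STEPS[name](lines)
    return json.dumps({"paragraphs": lines})


if __name__ == "__main__":
    extracted = ["We present a conver-", "sion toolkit.", "12"]
    print(run_pipeline(extracted, ["drop_page_numbers", "dehyphenate", "merge_lines"]))
```

Step order is part of the configuration in this sketch: dropping page numbers before dehyphenating keeps a stray page number from landing between the two halves of a hyphenated word.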

Code Implementations: 1 repo
