appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
appjsonify provides a flexible conversion tool for researchers and developers who work with academic papers, though the contribution is incremental because it builds on existing PDF parsing methods.
The authors tackled the problem of converting academic paper PDFs to JSON by developing appjsonify, a Python toolkit that uses visual-based layout analysis and rule-based text processing, resulting in a publicly released, configurable tool available on PyPI and GitHub.
We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file using several visual-based document layout analysis models and rule-based text processing approaches. appjsonify is a flexible tool that lets users easily configure the processing pipeline to handle the specific format of the paper they wish to process. We are publicly releasing appjsonify as an easy-to-install toolkit available via PyPI and GitHub.
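To illustrate what a configurable processing pipeline of this kind might look like, the following is a minimal sketch and not appjsonify's actual API. It assumes that a layout analysis model has already produced labelled text blocks, and it applies a user-chosen list of rule-based steps before serializing to JSON. All names here (`STEP_REGISTRY`, `run_pipeline`, `drop_page_furniture`, `join_hyphenated_lines`, and the block labels) are hypothetical and chosen only for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical block format: one dict per region found by a layout model.
Block = Dict[str, str]  # {"label": "paragraph", "text": "..."}
Step = Callable[[List[Block]], List[Block]]


def drop_page_furniture(blocks: List[Block]) -> List[Block]:
    """Rule: discard blocks labelled as running headers or footers."""
    return [b for b in blocks if b["label"] not in {"header", "footer"}]


def join_hyphenated_lines(blocks: List[Block]) -> List[Block]:
    """Rule: undo end-of-line hyphenation, then flatten line breaks."""
    cleaned = []
    for b in blocks:
        text = b["text"].replace("-\n", "").replace("\n", " ")
        cleaned.append({**b, "text": text})
    return cleaned


# Users pick steps by name, so the pipeline can differ per paper format.
STEP_REGISTRY: Dict[str, Step] = {
    "drop_page_furniture": drop_page_furniture,
    "join_hyphenated_lines": join_hyphenated_lines,
}


def run_pipeline(blocks: List[Block], step_names: List[str]) -> List[Block]:
    """Apply the named steps in the order given."""
    for name in step_names:
        blocks = STEP_REGISTRY[name](blocks)
    return blocks


def to_json(blocks: List[Block]) -> str:
    """Serialize the processed blocks as a JSON document."""
    return json.dumps({"blocks": blocks}, indent=2)


if __name__ == "__main__":
    # Stand-in for layout model output on a single page (made-up content).
    page = [
        {"label": "header", "text": "Running title"},
        {"label": "paragraph", "text": "We present a tool-\nkit for\npapers."},
    ]
    print(to_json(run_pipeline(page, ["drop_page_furniture", "join_hyphenated_lines"])))
```

Selecting steps by name keeps the format-specific choices in configuration instead of code: a paper format without running headers could simply omit `drop_page_furniture` from the list.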