OpenEDGAR: Open Source Software for SEC EDGAR Analysis
This provides a tool for academic and industrial researchers working with financial regulatory data, but it is incremental as it builds on existing frameworks like Django.
The authors tackled the problem of efficiently analyzing SEC EDGAR data by developing OpenEDGAR, an open-source Python framework that enables rapid construction of research databases with features for retrieval, parsing, and search of filing documents.
OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications, and is distributed under MIT License at https://github.com/LexPredict/openedgar.