CL, IR, ML · Jun 10, 2018

LexNLP: Natural language processing and information extraction for legal and regulatory texts

Michael J Bommarito, Daniel Martin Katz, Eric M Detterman

arXiv:1806.03688v1 · 3.6 · 104 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

LexNLP provides a domain-specific tool for researchers and practitioners in legal and regulatory fields, but the contribution is incremental because it builds on existing NLP methods.

The authors address the problem of processing legal and regulatory texts by developing LexNLP, an open-source Python package that provides tools for segmentation, information extraction, and model building. It includes pre-trained models based on thousands of unit tests drawn from real documents, such as those available from the SEC EDGAR database.

LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications, and is distributed at https://github.com/LexPredict/lexpredict-lexnlp.
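
As a rough illustration of items (i) and (iii) in the abstract, the sketch below segments a short contract-style passage into sentences and pulls out dates and distances using only the Python standard library. It does not use LexNLP and is not how LexNLP is implemented; the function names (`split_sentences`, `extract_dates`, `extract_distances`), the regular expressions, and the sample text are all hypothetical and chosen only to show the general shape of rule-based extraction.

```python
import re
from datetime import date

# Month names mapped to month numbers (assumption: English, fully spelled out).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Hypothetical patterns for this sketch only; real legal text needs far more cases.
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})\b")
DISTANCE_RE = re.compile(
    r"\b(\d+(?:\.\d+)?)\s+(miles?|kilometers?|feet|meters?)\b", re.IGNORECASE
)
# Naive boundary: sentence-ending punctuation, whitespace, then a capital letter.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences with a naive punctuation-based rule."""
    return [s.strip() for s in SENTENCE_SPLIT_RE.split(text) if s.strip()]


def extract_dates(text):
    """Return datetime.date objects for 'Month D, YYYY' mentions.

    Raises ValueError on impossible dates such as 'February 30, 2018'.
    """
    return [date(int(y), MONTHS[m], int(d)) for m, d, y in DATE_RE.findall(text)]


def extract_distances(text):
    """Return (value, unit) pairs for simple numeric distance mentions."""
    return [(float(v), u.lower()) for v, u in DISTANCE_RE.findall(text)]


sample = (
    "This Lease is effective as of June 10, 2018. "
    "The Premises are located within 5 miles of the Landlord's office."
)

sentences = split_sentences(sample)
dates = extract_dates(sample)
distances = extract_distances(sample)

print(sentences)
print(dates)
print(distances)
```

Legal text routinely breaks rules this simple (abbreviations such as "Inc." or "U.S.C.", numbered clauses, and dates written in many formats), which is the kind of gap a domain-specific package with models tested against real documents is meant to close.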

