CL · CY · IT · SE · Oct 2, 2021

Simplify Your Law: Using Information Theory to Deduplicate Legal Documents

arXiv:2110.00735v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of making legal texts more comprehensible and maintainable for legal professionals and the public. It is incremental in that it applies existing information-theoretic principles to a new domain.

The paper tackles textual redundancy in legal documents by introducing the duplicated phrase detection problem and proposing the Dupex algorithm, which uses the Minimum Description Length (MDL) principle to identify and compress duplicated phrases. Experiments on the Titles of the United States Code show that it works well in practice.
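For reference, the MDL principle is usually stated as a two-part code: choose the model M (here, a set of duplicated phrases) that minimizes the bits needed to describe the model plus the bits needed to describe the data D (the input text) once it is encoded with that model. This is the generic textbook form; the specific encoding Dupex uses for L(M) and L(D | M) is defined in the paper and may differ.

```latex
M^{*} = \arg\min_{M \in \mathcal{M}} \bigl[\, L(M) + L(D \mid M) \,\bigr]
```

Here \mathcal{M} is the space of candidate pattern sets and L(\cdot) is a code length in bits.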

Textual redundancy is one of the main challenges to ensuring that legal texts remain comprehensible and maintainable. Drawing inspiration from the refactoring literature in software engineering, which has developed methods to expose and eliminate duplicated code, we introduce the duplicated phrase detection problem for legal texts and propose the Dupex algorithm to solve it. Leveraging the Minimum Description Length principle from information theory, Dupex identifies a set of duplicated phrases, called patterns, that together best compress a given input text. Through an extensive set of experiments on the Titles of the United States Code, we confirm that our algorithm works well in practice: Dupex will help you simplify your law.
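As a rough illustration of the compression idea described in the abstract, the sketch below greedily selects repeated word n-grams that shrink a two-part description length. It is not the authors' Dupex algorithm, whose encoding and search strategy are defined in the paper. Assumptions made here purely for illustration: whitespace tokenization, a Shannon code fitted to symbol frequencies for the rewritten text, a flat log2(vocabulary size) bits per word for storing each pattern, left-to-right non-overlapping replacement, and no nested patterns. All function names and the sample sentence are hypothetical and not drawn from the U.S. Code.

```python
# Hypothetical greedy MDL sketch; NOT the Dupex implementation.
import math
from collections import Counter


def encoded_length(tokens):
    """Bits to encode `tokens` with an optimal (Shannon) code fitted to the
    sequence's own symbol frequencies: sum over symbols of -log2 p(symbol)."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c * math.log2(c / total) for c in counts.values())


def total_length(tokens, patterns, vocab_size):
    """Two-part description length L(M) + L(D | M).

    L(M): each pattern is spelled out word by word plus an end marker, at a
    uniform log2(vocab_size) bits per symbol (a simplifying assumption).
    """
    model_bits = sum((len(p) + 1) * math.log2(vocab_size) for p in patterns)
    return model_bits + encoded_length(tokens)


def replace(tokens, pattern, symbol):
    """Replace non-overlapping occurrences of `pattern`, scanning left to right."""
    out, i, n = [], 0, len(pattern)
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == pattern:
            out.append(symbol)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def expand(tokens, patterns):
    """Invert the rewriting: substitute each placeholder with its pattern's words."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):  # placeholder of the form ("PATTERN", k)
            out.extend(patterns[t[1]])
        else:
            out.append(t)
    return out


def greedy_mdl_patterns(text, max_n=6):
    """Repeatedly add the repeated word n-gram (2 <= n <= max_n) that most
    reduces the total description length; stop when none reduces it."""
    tokens = text.split()
    vocab_size = max(2, len(set(tokens)))
    patterns = []
    current = total_length(tokens, patterns, vocab_size)
    while True:
        # Candidate phrases: n-grams made only of original words (no nesting).
        candidates = Counter(
            tuple(tokens[i:i + n])
            for n in range(2, max_n + 1)
            for i in range(len(tokens) - n + 1)
            if all(isinstance(t, str) for t in tokens[i:i + n])
        )
        best_cost, best_pat, best_tokens = current, None, None
        for pat, freq in candidates.items():
            if freq < 2:
                continue
            new_tokens = replace(tokens, pat, ("PATTERN", len(patterns)))
            cost = total_length(new_tokens, patterns + [pat], vocab_size)
            if cost < best_cost:
                best_cost, best_pat, best_tokens = cost, pat, new_tokens
        if best_pat is None:
            return patterns, tokens
        patterns.append(best_pat)
        tokens, current = best_tokens, best_cost


if __name__ == "__main__":
    sample = " ".join(["the Secretary shall submit a report to Congress"] * 5
                      + ["and publish it"])
    found, encoded = greedy_mdl_patterns(sample)
    for k, p in enumerate(found):
        print(k, " ".join(p))
    assert expand(encoded, found) == sample.split()  # the rewriting is lossless
```

Greedy selection keeps the sketch short; it is a common heuristic for MDL-based pattern mining but need not find the globally optimal pattern set. Every accepted pattern strictly lowers the total code length, which is what makes a phrase "worth" factoring out under this objective.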
