A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering
This work addresses a challenge legal professionals face: applying legislation to real-world cases. Its contribution is incremental, since it focuses on dataset creation and benchmarking.
The authors tackle computational statutory reasoning by introducing a dataset for tax law entailment and question answering (QA). They find that standard machine reading models perform poorly, even with fine-tuning to the legal domain, whereas a hand-built Prolog system is designed to fully solve the task.
Legislation can be viewed as a body of prescriptive rules expressed in natural language. We refer to the application of legislation to the facts of a case, where those facts are also expressed in natural language, as statutory reasoning. Computational statutory reasoning is distinct from most existing work in machine reading in that much of the information needed to decide a case is declared exactly once (in the law), whereas the information needed in much of machine reading tends to be learned through distributional language statistics. To investigate the performance of natural language understanding approaches on statutory reasoning, we introduce a dataset, together with a legal-domain text corpus. Machine reading models applied straightforwardly show low out-of-the-box performance on our questions, whether or not they have been fine-tuned to the legal domain. We contrast this with a hand-constructed Prolog-based system designed to fully solve the task. These experiments support a discussion of the challenges facing statutory reasoning moving forward, which we argue is an interesting real-world task that can motivate the development of models able to utilize prescriptive rules specified in natural language.
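To make the contrast between rules "declared exactly once" and distributionally learned knowledge concrete, here is a minimal sketch of the hand-coded approach. It is written in Python rather than Prolog, and everything in it is hypothetical: the three-clause statute, the $5,000 deduction, the 10% rate, and the names `Case`, `taxable_income`, `tax_owed`, and `must_file` are invented for illustration and are not taken from the dataset or from actual tax law. It also assumes the hard step, extracting structured facts from the natural-language case description, has already been done. Once encoded, each rule is applied deterministically: an entailment question reduces to a boolean check and a QA question to a computed amount.

```python
from dataclasses import dataclass

# Hypothetical, simplified statute (NOT actual tax law, NOT from the dataset):
#   (a) An individual's tax is 10% of taxable income (rounded down to whole dollars).
#   (b) Taxable income is gross income minus a standard deduction of $5,000,
#       but never less than zero.
#   (c) An individual with gross income above $5,000 must file a return.
STANDARD_DEDUCTION = 5_000
TAX_RATE_PERCENT = 10


@dataclass
class Case:
    """Facts of a case, assumed to be already extracted from natural language."""
    person: str
    gross_income: int  # whole dollars


def taxable_income(case: Case) -> int:
    # Rule (b): declared once, reused for every case.
    return max(case.gross_income - STANDARD_DEDUCTION, 0)


def tax_owed(case: Case) -> int:
    # Rule (a), defined in terms of rule (b); integer arithmetic avoids float rounding.
    return taxable_income(case) * TAX_RATE_PERCENT // 100


def must_file(case: Case) -> bool:
    # Rule (c).
    return case.gross_income > STANDARD_DEDUCTION


if __name__ == "__main__":
    alice = Case(person="Alice", gross_income=42_000)
    # Entailment: does the statute entail "Alice must file a return"?
    print(must_file(alice))  # True
    # Question answering: how much tax does Alice owe?
    print(tax_owed(alice))  # 3700
```

The design choice mirrors the point made above: no statistics are learned, and correctness follows entirely from faithfully encoding the text of each rule, which is exactly the part that does not scale without manual effort.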