The Legal Argument Reasoning Task in Civil Procedure
This work addresses a complex benchmarking problem for legal language models, but it is incremental: it builds on existing transformer methods and does not report a significant breakthrough.
The authors introduce a new NLP task and dataset for legal argument reasoning in U.S. civil procedure, based on a book for law students. Fine-tuning a legal transformer model offers some improvement over random baselines, but the task remains challenging, and no concrete performance numbers are provided.
We present a new NLP task and dataset from the domain of U.S. civil procedure. Each instance of the dataset consists of a general introduction to the case, a particular question, and a possible solution argument, accompanied by a detailed analysis of why the argument applies in that case. Since the dataset is based on a book aimed at law students, we believe that it represents a truly complex task for benchmarking modern legal language models. Our baseline evaluation shows that fine-tuning a legal transformer provides some advantage over random baseline models, but our analysis reveals that the actual ability to infer legal arguments remains a challenging open research question.
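The instance structure and the random baseline described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the class name, the field names, and in particular the binary `applies` label are hypothetical, since the text does not specify how instances are labeled or which metric is used.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ArgumentInstance:
    """One dataset instance. All field names are hypothetical."""
    introduction: str  # general introduction to the case
    question: str      # the particular question asked
    argument: str      # a possible solution argument
    analysis: str      # detailed analysis accompanying the argument
    applies: bool      # ASSUMPTION: binary gold label, not stated in the text


def random_baseline(instances: Sequence[ArgumentInstance], seed: int = 0) -> List[bool]:
    """Predict a label uniformly at random for each instance (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in instances]


def accuracy(predictions: Sequence[bool], instances: Sequence[ArgumentInstance]) -> float:
    """Fraction of predictions that match the assumed gold label."""
    if not instances:
        raise ValueError("cannot score an empty dataset")
    if len(predictions) != len(instances):
        raise ValueError("predictions and instances must have the same length")
    correct = sum(p == inst.applies for p, inst in zip(predictions, instances))
    return correct / len(instances)
```

A fine-tuned model would be scored with the same `accuracy` function, so that any advantage over the random baseline is measured on identical instances. Accuracy is used here only for simplicity; the text does not say which metric the evaluation reports.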