CL · Jan 11, 2025

ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting

Steven H. Wang, Maksim Zubkov, Kexin Fan, Sarah Harrell, Yuyang Sun, Wei Chen, Andreas Plesner, Roger Wattenhofer

ETH Zurich

arXiv:2501.06582v414.714 citations · h-index: 24 · ACL

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable contract drafting tools for lawyers, though its contribution is incremental because it focuses on creating a new benchmark dataset.

The authors address contract clause retrieval for legal drafting by introducing ACORD, which they describe as the first retrieval benchmark for contract drafting fully annotated by experts. It includes 114 queries and over 126,000 query-clause pairs, each rated on a scale from 1 to 5 stars. Bi-encoder retrievers paired with pointwise LLM re-rankers show promising results, but substantial room for improvement remains.

Information retrieval, specifically contract clause retrieval, is foundational to contract drafting because lawyers rarely draft contracts from scratch; instead, they locate and revise the most relevant precedent. We introduce the Atticus Clause Retrieval Dataset (ACORD), the first retrieval benchmark for contract drafting fully annotated by experts. ACORD focuses on complex contract clauses such as Limitation of Liability, Indemnification, Change of Control, and Most Favored Nation. It includes 114 queries and over 126,000 query-clause pairs, each ranked on a scale from 1 to 5 stars. The task is to find the precedent clauses most relevant to a query. A bi-encoder retriever paired with pointwise LLM re-rankers shows promising results. However, substantial improvements are still needed to effectively manage the complex legal work typically undertaken by lawyers. As the first retrieval benchmark for contract drafting annotated by experts, ACORD can serve as a valuable IR benchmark for the NLP community.
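The retrieve-then-re-rank setup described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' code: `embed` is a toy bag-of-words stand-in for a bi-encoder, `score_fn` is a placeholder for a pointwise LLM re-ranker, the clauses and star ratings are invented, and NDCG@k with a gain of `stars - 1` is simply one common way to score rankings against graded 1-5 star labels (the abstract does not name the metric).

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, clauses, k):
    """Stage 1: embed query and clauses separately, keep the k most similar."""
    q = embed(query)
    ranked = sorted(clauses, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def rerank(query, candidates, score_fn):
    """Stage 2: pointwise re-ranking. score_fn(query, clause) scores each
    candidate on its own; in practice this would wrap an LLM call."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked, stars, k):
    """NDCG@k over 1-5 star labels, using gain = stars - 1 (an assumption)."""
    gains = [stars.get(c, 1) - 1 for c in ranked[:k]]
    ideal = sorted((s - 1 for s in stars.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


# Invented toy data: three clauses with hypothetical expert star ratings.
clauses = [
    "liability of each party is capped at fees paid",
    "supplier shall indemnify customer against third party claims",
    "this agreement is governed by the laws of new york",
]
stars = {clauses[0]: 5, clauses[1]: 2, clauses[2]: 1}
query = "limitation of liability cap"

first_stage = retrieve(query, clauses, k=3)
print("NDCG@3 after retrieval:", ndcg_at_k(first_stage, stars, 3))
```

The two stages are kept as separate functions because they trade off differently: the first-stage scorer must be cheap enough to run over every clause, while the pointwise re-ranker only sees the short candidate list and can therefore afford a more expensive model.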

