CL · May 12, 2023

LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

arXiv:2305.07507v2 · 248 citations
Originality: Synthesis-oriented
AI Analysis

This work provides tools and insights for developing domain-specific language models in the legal field. It is incremental, building on existing methods for analyzing and benchmarking models.

The authors analyzed how pre-trained language models perform in legal contexts, finding that probing performance correlates with upstream performance on related topics, while downstream performance depends on model size and prior legal knowledge. They released a legal corpus (LeXFiles) and a benchmark (LegalLAMA) to support this analysis.

In this work, we conduct a detailed analysis of the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as the upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs. We release two new legal PLMs trained on LeXFiles and evaluate them alongside others on LegalLAMA and LexGLUE. We find that probing performance strongly correlates with upstream performance in related legal topics. Downstream performance, on the other hand, is mainly driven by the model's size and prior legal knowledge, which can be estimated by upstream and probing performance. Based on these findings, we conclude that both dimensions are important for those seeking to develop domain-specific PLMs.
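
The claim that probing performance correlates with upstream performance can be checked with a plain correlation coefficient over per-model scores. The sketch below is only an illustration of that kind of check, not the authors' evaluation code: the `pearson` helper, the model names, and every score are hypothetical placeholders, and Pearson's r is an assumed choice of statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two score lists around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores, NOT results from the paper.
# Keys are made-up model names; values stand in for one upstream score
# and one probing score per model on the same legal topic.
upstream = {"model_a": 0.62, "model_b": 0.71, "model_c": 0.78, "model_d": 0.80}
probing = {"model_a": 0.41, "model_b": 0.55, "model_c": 0.60, "model_d": 0.67}

# Align the two score lists by model name before correlating.
models = sorted(upstream)
r = pearson([upstream[m] for m in models], [probing[m] for m in models])
print(f"Pearson r between upstream and probing scores: {r:.3f}")
```

With real numbers, each dictionary would hold one score per evaluated PLM for a given legal sub-corpus, and the check would be repeated per topic.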

Code Implementations: 1 repo
