CL, AI | Apr 15, 2021

Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain

arXiv:2104.07782v20.2

Originality: Synthesis-oriented

AI Analysis

This paper tackles the problem of inaccurate AI applications in the legal domain, but its contribution is incremental: it adapts an existing pretraining method to a specific domain.

The paper addresses the problem of pretrained models failing to understand legal English, a specialized sublanguage, by introducing BERTLaw, a legal sublanguage pretrained model. Experiments show it outperforms baseline pretrained models, though no concrete numbers are provided.

Legal English is a sublanguage that matters to everyone, yet not everyone can understand it. Pretrained models have become a best practice among current deep learning approaches to a range of problems. It would be wasteful, or even dangerous, to apply these models in practice without knowledge of the sublanguage of the law. In this paper, we raise the issue and propose a trivial solution by introducing BERTLaw, a legal sublanguage pretrained model. The paper's experiments demonstrate the superior effectiveness of the method compared to the baseline pretrained model.
