Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
This paper tackles the issue of inaccurate AI applications in the legal domain, but the contribution is incremental, as it adapts an existing method to a specific domain.
The paper addresses the problem of pretrained models failing to understand legal English, a specialized sublanguage, by introducing BERTLaw, a legal sublanguage pretrained model. Experiments show it outperforms baseline pretrained models, though no concrete numbers are provided.
Legal English is a sublanguage that matters to everyone but that not everyone can understand. Pretrained models have become best practice among current deep learning approaches to a wide range of problems. It would be wasteful, or even dangerous, to apply these models in practice without knowledge of the legal sublanguage. In this paper, we raise the issue and propose a trivial solution by introducing BERTLaw, a legal sublanguage pretrained model. The paper's experiments demonstrate the superior effectiveness of the method compared to the baseline pretrained model.