Threshold-Based Retrieval and Textual Entailment Detection on Legal Bar Exam Questions
This work addresses legal question answering for practitioners who need to navigate international legal domains. It appears incremental, however, since it builds on existing methods such as BM25 and deep learning classifiers.
The paper tackles the problem of automatically retrieving relevant legal texts and determining textual entailment for legal bar exam questions. It combines BM25 scoring with word embeddings and uses threshold-based criteria for both document selection and answer inclusion. The approach is reported to improve on the baseline in the textual entailment task, though no concrete numbers are provided.
Getting an overview of the legal domain has become challenging, especially in a broad, international context. Legal question answering systems have the potential to ease this task by automatically retrieving relevant legal texts for a specific statement and checking whether the meaning of the statement can be inferred from the retrieved documents. We investigate a combination of the BM25 scoring method of Elasticsearch with word embeddings trained on English translations of the German and Japanese civil law. For this, we define criteria that select a dynamic number of relevant documents according to threshold scores. Exploiting two deep learning classifiers and their respective prediction biases with a threshold-based answer inclusion criterion has proven beneficial for the textual entailment task when compared to the baseline.
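To make the idea of selecting a dynamic number of documents by threshold more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that BM25 scores have already been obtained (for example from Elasticsearch) and that query and document vectors are averaged word embeddings. The function name `select_documents`, the mixing weight `alpha`, the relative cutoff `rel_threshold`, and the linear score combination are all hypothetical choices made for illustration; the source does not specify how the two signals are combined or which thresholds are used.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_documents(
    bm25_scores: Dict[str, float],
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    alpha: float = 0.5,          # assumed weight of the BM25 signal
    rel_threshold: float = 0.8,  # assumed cutoff relative to the best score
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs whose combined score is at least
    rel_threshold times the best combined score, best first.

    The number of returned documents is therefore not fixed in advance.
    """
    if not bm25_scores:
        return []
    max_bm25 = max(bm25_scores.values())
    combined: Dict[str, float] = {}
    for doc_id, score in bm25_scores.items():
        # Scale BM25 into [0, 1] so it is comparable to cosine similarity.
        bm25_norm = score / max_bm25 if max_bm25 > 0 else 0.0
        # Clamp negative similarities so combined scores stay non-negative.
        sim = max(0.0, cosine(query_vec, doc_vecs[doc_id]))
        combined[doc_id] = alpha * bm25_norm + (1.0 - alpha) * sim
    best = max(combined.values())
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [(doc_id, s) for doc_id, s in ranked if s >= rel_threshold * best]


if __name__ == "__main__":
    # Toy data: three hypothetical articles with made-up scores and vectors.
    scores = {"art_1": 10.0, "art_2": 9.0, "art_3": 2.0}
    vectors = {"art_1": [1.0, 0.0], "art_2": [1.0, 0.0], "art_3": [0.0, 1.0]}
    for doc_id, s in select_documents(scores, [1.0, 0.0], vectors):
        print(doc_id, round(s, 3))
```

A relative cutoff is used here so that the threshold adapts to each query's score range. An absolute cutoff on the combined score would be an equally plausible reading of "threshold scores".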