Hallucination is the last thing you need
This paper addresses the critical issue of AI hallucination in the legal domain, where accuracy is essential. The contribution appears incremental, however, because it builds on existing ensemble and tokenization methods.
The paper tackles hallucination in generative AI for legal applications by proposing an ensemble of three LLMs, focused respectively on understanding, experience, and facts. It introduces multi-length tokenization to protect key legal information, and it tests the most advanced publicly available models for legal hallucination, reporting some interesting results.
The legal profession necessitates a multidimensional approach that involves synthesizing an in-depth comprehension of a legal issue with insightful commentary based on personal experience, combined with a comprehensive understanding of pertinent legislation, regulation, and case law, in order to deliver an informed legal solution. The present offering with generative AI presents major obstacles in replicating this, as current models struggle to integrate and navigate such a complex interplay of understanding, experience, and fact-checking procedures. It is noteworthy that where generative AI outputs understanding and experience, which reflect the aggregate of various subjective views on similar topics, this often deflects the model's attention from the crucial legal facts, thereby resulting in hallucination. Hence, this paper delves into the feasibility of three independent LLMs, each focused on one of understanding, experience, and facts, synthesising as a single ensemble model to effectively counteract the challenges posed by existing monolithic generative AI models. We introduce the idea of multi-length tokenisation to protect key information assets like common law judgements, and finally we interrogate the most advanced publicly available models for legal hallucination, with some interesting results.
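The abstract does not spell out how multi-length tokenisation works, so the following is only one possible reading, sketched under stated assumptions: protected legal phrases (for example a case name) are kept as single tokens of whatever length they have, while the rest of the text is split into ordinary words and punctuation. The function name `multi_length_tokenise`, the `PROTECTED_SPANS` list, and the example phrases are hypothetical and do not come from the paper.

```python
import re

# Hypothetical protected spans, chosen for illustration only (not from the paper).
# Assumption: each span begins and ends with a word character.
PROTECTED_SPANS = [
    "Donoghue v Stevenson",
    "duty of care",
]


def multi_length_tokenise(text, protected=PROTECTED_SPANS):
    """Split text into tokens, keeping each protected span as one token."""
    if not protected:
        # No protected spans: plain word and punctuation split.
        return re.findall(r"\w+|[^\w\s]", text)
    # Try longer spans first so a long span wins over a shorter one
    # that shares its prefix.
    ordered = sorted(protected, key=len, reverse=True)
    pattern = "|".join(re.escape(span) for span in ordered)
    # Alternation order: protected spans, then words, then single punctuation marks.
    token_re = re.compile(rf"\b(?:{pattern})\b|\w+|[^\w\s]")
    return token_re.findall(text)


tokens = multi_length_tokenise("In Donoghue v Stevenson, a duty of care was found.")
print(tokens)
```

In this sketch the case name and the phrase "duty of care" each survive as one token, so a downstream model could not recombine their parts with unrelated text. A subword tokeniser used in practice would need a comparable mechanism for registering protected spans; that step is outside what the abstract describes.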