Towards Trustable Language Models: Investigating Information Quality of Large Language Models
This work addresses the critical issue of information trustworthiness in LLMs for users and businesses, though it appears incremental as it builds on existing concerns about model reliability.
The paper tackles the problem of untrustworthy information generated by large language models (LLMs) due to issues like unreliable tokenization and bias, which can lead to hallucinations and flawed business decisions. It introduces a novel mathematical framework for evaluating information quality in LLMs and analyzes scaling laws to systematically improve model reliability.
Large language models (LLMs) are generating information at a rapid pace, requiring users to increasingly rely on and trust the generated data. Despite the remarkable advances of LLMs, the information they generate is not completely trustworthy because of challenges in information quality. Specifically, the integrity of information quality decreases due to unreliable, biased tokenization during LLM pre-training. Moreover, decreased information quality leads to hallucination and fabricated information. Unreliable information can lead to flawed business decisions, which affects economic activity. In this work, we introduce a novel mathematical evaluation of information quality for LLMs; we furthermore analyze and highlight information quality challenges and scaling laws for systematically scaling language models.