OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
OpenFactCheck addresses the need for standardized tools to compare and improve factuality evaluation of LLMs, which is crucial for real-world applications. The contribution is incremental: it builds on existing research by consolidating it into a unified framework.
The paper tackles the problem of evaluating the factual accuracy of large language models (LLMs) by introducing OpenFactCheck, a unified framework with three modules: one for customizing fact-checking systems, one for assessing LLM factuality, and one for evaluating the checkers themselves. The result is an open-source tool available as a Python library and as a web service.
The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.
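To make the division of labor between the three modules concrete, here is a minimal Python sketch of the three evaluation roles. It does not use the openfactcheck library or reproduce its API: every name below (toy_checker, split_claims, response_eval, llm_eval, checker_eval, KNOWN_FACTS) is hypothetical, the sentence splitter stands in for real claim decomposition, and the lookup table stands in for retrieval and verification.

```python
from typing import Callable, List, Tuple

# A checker maps one claim to a verdict: True (supported) or False.
Checker = Callable[[str], bool]

# Hypothetical toy knowledge base standing in for retrieval + verification.
KNOWN_FACTS = {
    "paris is the capital of france",
    "water boils at 100 degrees celsius at sea level",
}


def toy_checker(claim: str) -> bool:
    """Judge a claim as factual only if it appears in KNOWN_FACTS."""
    return claim.strip().rstrip(".").lower() in KNOWN_FACTS


def split_claims(document: str) -> List[str]:
    """Naive claim decomposition: treat each period-delimited sentence as a claim."""
    return [part.strip() for part in document.split(".") if part.strip()]


def response_eval(document: str, checker: Checker) -> List[Tuple[str, bool]]:
    """Role (i): label every claim in one document with a pluggable checker."""
    return [(claim, checker(claim)) for claim in split_claims(document)]


def llm_eval(responses: List[str], checker: Checker) -> float:
    """Role (ii): score a model as the fraction of its claims judged factual."""
    verdicts = [ok for r in responses for _, ok in response_eval(r, checker)]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def checker_eval(checker: Checker, labeled: List[Tuple[str, bool]]) -> float:
    """Role (iii): accuracy of a checker against human-labeled claims."""
    correct = sum(checker(claim) == gold for claim, gold in labeled)
    return correct / len(labeled) if labeled else 0.0


doc = "Paris is the capital of France. The Moon is made of cheese."
labels = response_eval(doc, toy_checker)
model_score = llm_eval([doc], toy_checker)
checker_accuracy = checker_eval(
    toy_checker,
    [
        ("Paris is the capital of France", True),
        ("The Moon is made of cheese", False),
    ],
)
```

Passing the checker in as a plain callable mirrors the idea that the same fact-checking system can be customized once and then reused both to score responses and to be scored itself. The actual metrics and interfaces used by OpenFactCheck may differ from this sketch.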