CL · Jul 10, 2025

TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs

arXiv:2507.08203v1 · 4 citations · h-index: 53 · Has Code · EMNLP
Originality: Synthesis-oriented
AI Analysis

This provides a comprehensive tool for researchers and practitioners to assess LLM truthfulness, though it is incremental as it builds on existing methods.

The authors tackled the problem of predicting truthfulness in LLM outputs by introducing TruthTorchLM, an open-source library with over 30 methods, and evaluated it on datasets like TriviaQA and GSM8K, showing competitive performance.

Generative Large Language Models (LLMs) inevitably produce untruthful responses. Accurately predicting the truthfulness of these outputs is critical, especially in high-stakes settings. To accelerate research in this domain and make truthfulness prediction methods more accessible, we introduce TruthTorchLM, an open-source, comprehensive Python library featuring over 30 truthfulness prediction methods, which we refer to as Truth Methods. Unlike existing toolkits such as Guardrails, which focus solely on document-grounded verification, or LM-Polygraph, which is limited to uncertainty-based methods, TruthTorchLM offers a broad and extensible collection of techniques. These methods span diverse tradeoffs in computational cost, access level (e.g., black-box vs. white-box), grounding document requirements, and supervision type (self-supervised or supervised). TruthTorchLM is compatible with both HuggingFace and LiteLLM, enabling support for locally hosted and API-based models. It also provides a unified interface for generation, evaluation, calibration, and long-form truthfulness prediction, along with a flexible framework for extending the library with new methods. We conduct an evaluation of representative truth methods on three datasets: TriviaQA, GSM8K, and FactScore-Bio. The code is available at https://github.com/Ybakman/TruthTorchLM
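
To make the idea of a "truth method" behind a common interface concrete, here is a minimal sketch in plain Python. It does not use or reproduce the TruthTorchLM API: the names `TruthMethod`, `SelfConsistency`, `sampler`, and `auroc` are hypothetical and chosen only for illustration. The example assumes a black-box, self-supervised method that scores an answer by how often resampled answers agree with it, and it assumes AUROC as one possible way to evaluate how well such scores separate correct from incorrect answers.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence


class TruthMethod(ABC):
    """Hypothetical interface: maps a (question, answer) pair to a truth score."""

    @abstractmethod
    def score(self, question: str, answer: str) -> float:
        ...


class SelfConsistency(TruthMethod):
    """Black-box, self-supervised sketch: the score is the fraction of
    resampled answers that match the given answer after normalization."""

    def __init__(self, sampler: Callable[[str, int], List[str]], num_samples: int = 5):
        # `sampler` is an assumed callable that returns `num_samples` model answers.
        self.sampler = sampler
        self.num_samples = num_samples

    def score(self, question: str, answer: str) -> float:
        samples = self.sampler(question, self.num_samples)
        if not samples:
            return 0.0
        target = answer.strip().lower()
        matches = sum(1 for s in samples if s.strip().lower() == target)
        return matches / len(samples)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen correct answer (label 1) receives a
    higher score than a randomly chosen incorrect one (label 0); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this sketch, a new method would be added by subclassing `TruthMethod` and implementing `score`, and any method could be evaluated the same way by passing its scores and correctness labels to `auroc`. A white-box or supervised method would need extra inputs (for example token probabilities or a trained probe), which this simplified interface leaves out.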

