CL Jun 27, 2025
The Consistency Hypothesis in Uncertainty Quantification for Large Language Models
Quan Xiao, Debarun Bhattacharjya, Balaji Ganesan et al.
Estimating the confidence of large language model (LLM) outputs is essential for real-world applications requiring high user trust. Black-box uncertainty quantification (UQ) methods, relying solely on model API access, have gained popularity due to their practical benefits. In this paper, we examine the implicit assumption behind several UQ methods, which use generation consistency as a proxy for confidence, an idea we formalize as the consistency hypothesis. We introduce three mathematical statements with corresponding statistical tests to capture variations of this hypothesis and metrics to evaluate LLM output conformity across tasks. Our empirical investigation, spanning 8 benchmark datasets and 3 tasks (question answering, text summarization, and text-to-SQL), highlights the prevalence of the hypothesis under different settings. Among the statements, we highlight the 'Sim-Any' hypothesis as the most actionable, and demonstrate how it can be leveraged by proposing data-free black-box UQ methods that aggregate similarities between generations for confidence estimation. These approaches can outperform the closest baselines, showcasing the practical value of the empirically observed consistency hypothesis.
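The idea of aggregating similarities between generations can be illustrated with a minimal sketch. It is not the authors' method: the token-level Jaccard similarity, the mean aggregation, and the function names `jaccard` and `consistency_confidence` are all assumptions chosen for illustration. Each sampled generation is scored by its average similarity to the other samples, so an answer that most samples agree with receives a higher confidence.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed, simple similarity metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def consistency_confidence(generations: List[str]) -> List[float]:
    """Score each generation by its mean similarity to the other samples."""
    n = len(generations)
    if n < 2:
        raise ValueError("need at least two sampled generations")
    scores = []
    for i, g in enumerate(generations):
        sims = [jaccard(g, h) for j, h in enumerate(generations) if j != i]
        scores.append(sum(sims) / (n - 1))
    return scores


# Hypothetical samples drawn from one prompt; three agree, one does not.
samples = ["Paris", "Paris", "paris", "Lyon"]
conf = consistency_confidence(samples)
```

In this toy run the three agreeing answers each score 2/3 and the outlier scores 0. No labeled data is needed, which is what makes this style of estimator data-free; for longer outputs such as summaries or SQL, a task-appropriate similarity would replace `jaccard`.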
CL Oct 10, 2025
SIMBA UQ: Similarity-Based Aggregation for Uncertainty Quantification in Large Language Models
Debarun Bhattacharjya, Balaji Ganesan, Junkyu Lee et al.
When does a large language model (LLM) know what it does not know? Uncertainty quantification (UQ) provides measures of uncertainty, such as an estimate of the confidence in an LLM's generated output, and is therefore increasingly recognized as a crucial component of trusted AI systems. Black-box UQ methods do not require access to internal model information from the generating LLM and therefore have numerous real-world advantages, such as robustness to system changes, adaptability to choice of LLM, reduced costs, and computational tractability. In this paper, we investigate the effectiveness of UQ techniques that are primarily but not necessarily entirely black-box, where the consistency between a generated output and other sampled generations is used as a proxy for confidence in its correctness. We propose a high-level non-verbalized similarity-based aggregation framework that subsumes a broad swath of UQ approaches suitable for complex generative tasks, and we introduce specific novel techniques from the framework that train confidence estimation models using small training sets. Through an empirical study with datasets spanning the diverse tasks of question answering, summarization, and text-to-SQL, we demonstrate that our proposed similarity-based methods can yield better calibrated confidences than baselines.
AI May 8, 2021
Business Entity Matching with Siamese Graph Convolutional Networks
Evgeny Krivosheev, Mattia Atzeni, Katsiaryna Mirylenka et al.
Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent developments in machine learning and in particular deep learning have opened the way to more general and efficient solutions to data-integration tasks. In this paper, we demonstrate an approach that allows modeling and integrating entities by leveraging their relations and contextual information. This is achieved by combining siamese and graph neural networks to effectively propagate information between connected entities and support high scalability. We evaluated our approach on the task of integrating data about business entities, demonstrating that it outperforms both traditional rule-based systems and other deep learning approaches.
DB Jan 17, 2020
Siamese Graph Neural Networks for Data Integration
Evgeny Krivosheev, Mattia Atzeni, Katsiaryna Mirylenka et al.
Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent development in machine learning and in particular deep learning has opened the way to more general and more efficient solutions to data integration problems. In this work, we propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles. Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible. This is achieved by combining siamese and graph neural networks to propagate information between connected entities and support high scalability. We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.
DB Jul 19, 2019
Fast Record Linkage for Company Entities
Thomas Gschwind, Christoph Miksovic, Julian Minder et al.
Record linkage is an essential part of nearly all real-world systems that consume structured and unstructured data coming from different sources. Typically no common key is available for connecting records. Massive data cleaning and data integration processes often have to be completed before any data analytics and further processing can be performed. Although record linkage is frequently regarded as a somewhat tedious but necessary step, it reveals valuable insights into the data at hand. These insights guide further analytic approaches to the data and support data visualization. In this work we focus on company entity matching, where company name, location and industry are taken into account. Our contribution is an end-to-end, highly scalable, enterprise-grade system that uses rule-based linkage algorithms extended with a machine learning approach to account for short company names. Linkage time is greatly reduced by efficient decomposition of the search space using MinHash. High linkage accuracy is achieved by the proposed thorough scoring process of the matching candidates. Based on real-world ground truth datasets, we show that our approach reaches a recall of 91% compared to 73% for baseline approaches. These results are achieved while scaling linearly with the number of nodes used in the system.
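How MinHash can decompose the search space is sketched below. This is an illustrative assumption, not the system described in the abstract: the character 3-gram shingling, the BLAKE2b-based hash family, the banding parameters, and the names `shingles`, `minhash_signature`, and `candidate_pairs` are all hypothetical. Records whose signatures agree on at least one band land in the same bucket, and only those pairs are passed on to the more expensive scoring step.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shingles(name: str, k: int = 3) -> Set[str]:
    """Character k-grams of a whitespace-normalized, lower-cased name."""
    s = " ".join(name.lower().split())
    if len(s) <= k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """One member of a seeded hash family (BLAKE2b with the seed as salt)."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens: Set[str], num_hashes: int = 32) -> Tuple[int, ...]:
    """MinHash signature: the minimum hash value under each seeded hash."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(
    names: List[str], num_hashes: int = 32, bands: int = 16
) -> Set[Tuple[int, int]]:
    """Index pairs (i < j) whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[int]] = defaultdict(list)
    for idx, name in enumerate(names):
        sig = minhash_signature(shingles(name), num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs: Set[Tuple[int, int]] = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


# Hypothetical records: the first two differ only in case and spacing.
names = ["IBM Research", "ibm  research", "Acme Corp"]
pairs = candidate_pairs(names)
```

Because only records sharing a bucket are compared, the number of pairwise comparisons grows with bucket sizes rather than with the square of the dataset size. More bands with fewer rows each raise recall at the cost of more candidates to score.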