Robust Knowledge Extraction from Large Language Models using Social Choice Theory
This work addresses robustness issues for LLM applications in critical domains, but the contribution is incremental because it builds on existing social choice methods.
The paper tackles the problem of inconsistent answers from large language models in high-stakes domains like medicine. It proposes a method that issues ranking queries repeatedly and aggregates the results using social choice theory, and it reports improved robustness in diagnostic settings.
Large language models (LLMs) can support a wide range of applications like conversational agents, creative writing or general query answering. However, they are ill-suited for query answering in high-stakes domains like medicine because they are typically not robust: even the same query can result in different answers when prompted multiple times. In order to improve the robustness of LLM queries, we propose issuing ranking queries repeatedly and aggregating the resulting rankings using methods from social choice theory. We study ranking queries in diagnostic settings like medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss some additional interesting properties in our setting and evaluate the robustness of our approach empirically.
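The aggregation step can be illustrated with a minimal sketch. It assumes that each repeated ranking query returns an ordered list covering only some of the candidate diagnoses, that every ranked candidate is preferred to every unranked one, and that unranked candidates are mutually incomparable. The scoring rule used here is one common formulation of a Borda count for partial orders (two points for each candidate ranked strictly below, one point for each incomparable candidate); the paper's exact definition may differ. The function names, the candidate set and the example rankings are hypothetical.

```python
def partial_borda_scores(rankings, candidates):
    """Sum partial Borda scores over several partial rankings.

    Assumption: each ranking is a list of distinct candidates, best first,
    and may omit candidates. Per ranking, a candidate earns 2 points for
    every candidate strictly below it and 1 point for every candidate it
    is incomparable with.
    """
    n = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in rankings:
        ranked = set(ranking)
        if len(ranked) != len(ranking) or not ranked <= set(candidates):
            raise ValueError(f"invalid ranking: {ranking}")
        k = len(ranking)
        # A ranked candidate at position pos beats the k - 1 - pos ranked
        # candidates after it and all n - k unranked ones.
        for pos, c in enumerate(ranking):
            scores[c] += 2 * (n - 1 - pos)
        # An unranked candidate beats nobody and is incomparable with the
        # other n - k - 1 unranked candidates.
        for c in candidates:
            if c not in ranked:
                scores[c] += n - k - 1
    return scores


def aggregate(rankings, candidates):
    """Return candidates ordered by total score, ties broken by name."""
    scores = partial_borda_scores(rankings, candidates)
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical answers from three repetitions of the same ranking query.
candidates = ["allergy", "cold", "covid", "flu"]
rankings = [
    ["flu", "cold"],
    ["flu", "covid", "cold"],
    ["cold", "flu"],
]
print(partial_borda_scores(rankings, candidates))
print(aggregate(rankings, candidates))
```

Although the three answers disagree on order and coverage, the aggregated ranking is a single deterministic list, which is the kind of stability the repeated-query approach aims for. Breaking ties alphabetically is a design choice of this sketch, not something the source specifies.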