CL · AI · Jan 24, 2024

Can AI Assistants Know What They Don't Know?

arXiv:2401.13275v2 · 49 citations · ICML
Originality: Incremental advance
AI Analysis

This paper addresses the risk of untruthful responses from AI assistants. The advance is incremental, since it builds on existing datasets and alignment methods.

The paper tackles the factual errors that AI assistants make in knowledge-intensive tasks by investigating whether they can learn to refuse questions they cannot answer. It finds that, after alignment with a model-specific "I don't know" (Idk) dataset, assistants refuse most of their unknown questions and answer the questions they do attempt with higher accuracy.

Recently, AI assistants based on large language models (LLMs) have shown surprising performance in many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess extensive world knowledge, they still make factual errors on some knowledge-intensive tasks, such as open-domain question answering. These untruthful responses from the AI assistant may cause significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express this through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets. Then we align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with Idk datasets, the assistant can refuse to answer most of its unknown questions. For the questions it attempts to answer, the accuracy is significantly higher than before the alignment.
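The abstract does not spell out how a question is labelled as known or unknown for a given assistant. A minimal sketch of one plausible reading follows: sample several answers per question, treat the question as known when enough of them match the reference answer, and otherwise replace the target with a refusal. The names `build_idk_dataset`, `sample_answers`, `is_correct`, the refusal string, the sample count `k`, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical refusal template; the paper's actual wording is not given in the abstract.
IDK_RESPONSE = "I don't know."


@dataclass
class IdkExample:
    question: str
    target: str   # a correct sampled answer if known, otherwise the refusal
    known: bool


def is_correct(answer: str, gold: str) -> bool:
    """Assumed correctness check: case-insensitive substring match on the gold answer."""
    return gold.strip().lower() in answer.strip().lower()


def build_idk_dataset(
    qa_pairs: List[Tuple[str, str]],
    sample_answers: Callable[[str, int], List[str]],
    k: int = 10,
    threshold: float = 0.5,
) -> List[IdkExample]:
    """Label each question as known or unknown for one specific assistant.

    sample_answers(question, k) stands in for querying the assistant k times.
    A question counts as known when at least `threshold` of the samples are correct.
    """
    dataset = []
    for question, gold in qa_pairs:
        answers = sample_answers(question, k)
        correct = [a for a in answers if is_correct(a, gold)]
        known = bool(correct) and len(correct) / len(answers) >= threshold
        target = correct[0] if known else IDK_RESPONSE
        dataset.append(IdkExample(question, target, known))
    return dataset


def toy_model(question: str, k: int) -> List[str]:
    """Stand-in for a real assistant: knows one fact and guesses wrongly otherwise."""
    canned = {"What is the capital of France?": "Paris is the capital of France."}
    return [canned.get(question, "Possibly Atlantis.")] * k


data = build_idk_dataset(
    [
        ("What is the capital of France?", "Paris"),
        ("What is the capital of Australia?", "Canberra"),
    ],
    toy_model,
)
for example in data:
    print(example.known, "|", example.question, "->", example.target)
```

Because the labels depend on the sampled answers of one assistant, the resulting dataset is model-specific: running the same questions through a different assistant would generally yield a different split between known and unknown questions.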

Code Implementations: 1 repo
