CLMar 18
Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing HallucinationCem Uluoglakci, Tugba Taskaya Temizel
Large language models (LLMs) often hallucinate, producing fluent but false information, partly because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce $\textit{HypoTermInstruct}$, an SFT dataset (31,487 responses for 11,151 questions) designed to teach models epistemological humility-the ability to recognize the limits of their own knowledge and admit uncertainty. This is achieved through questions about non-existent "hypothetical" terms. We also release $\textit{HypoTermQA-Enhanced}$, a benchmark for hallucination tendency strengthened through multiple validations. We conducted 800 controlled LoRA SFT runs across $\textit{Llama3.1-8B}$ and $\textit{Gemma3-4B}$ (base and instruct), testing 100 fine-tuning configurations with paired controls. Our results demonstrate that replacing generic instruction data with $\textit{HypoTermInstruct}$ significantly improves the HypoTerm Score (median increases of 0.19% to 25.91%) and FactScore (+0.39% to +0.86%), while maintaining stable performance on MMLU (minimal decreases of 0.26% to 0.35%). Our work demonstrates that targeted, high-quality SFT data teaching meta-cognitive skills can effectively reduce hallucination without preference/RL pipelines, providing mechanistic insights and a practical path toward more reliable AI systems.
CLFeb 25, 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMsCem Uluoglakci, Tugba Taskaya Temizel
Hallucinations pose a significant challenge to the reliability and alignment of Large Language Models (LLMs), limiting their widespread acceptance beyond chatbot applications. Despite ongoing efforts, hallucinations remain a prevalent challenge in LLMs. The detection of hallucinations itself is also a formidable task, frequently requiring manual labeling or constrained evaluations. This paper introduces an automated scalable framework that combines benchmarking LLMs' hallucination tendencies with efficient hallucination detection. We leverage LLMs to generate challenging tasks related to hypothetical phenomena, subsequently employing them as agents for efficient hallucination detection. The framework is domain-agnostic, allowing the use of any language model for benchmark creation or evaluation in any domain. We introduce the publicly available HypoTermQA Benchmarking Dataset, on which state-of-the-art models' performance ranged between 3% and 11%, and evaluator agents demonstrated a 6% error rate in hallucination prediction. The proposed framework provides opportunities to test and improve LLMs. Additionally, it has the potential to generate benchmarking datasets tailored to specific domains, such as law, health, and finance.
CVMar 28, 2018
The Effects of JPEG and JPEG2000 Compression on Attacks using Adversarial ExamplesAyse Elvan Aydemir, Alptekin Temizel, Tugba Taskaya Temizel
Adversarial examples are known to have a negative effect on the performance of classifiers which have otherwise good performance on undisturbed images. These examples are generated by adding non-random noise to the testing samples in order to make classifier misclassify the given data. Adversarial attacks use these intentionally generated examples and they pose a security risk to the machine learning based systems. To be immune to such attacks, it is desirable to have a pre-processing mechanism which removes these effects causing misclassification while keeping the content of the image. JPEG and JPEG2000 are well-known image compression techniques which suppress the high-frequency content taking the human visual system into account. JPEG has been also shown to be an effective method for reducing adversarial noise. In this paper, we propose applying JPEG2000 compression as an alternative and systematically compare the classification performance of adversarial images compressed using JPEG and JPEG2000 at different target PSNR values and maximum compression levels. Our experiments show that JPEG2000 is more effective in reducing adversarial noise as it allows higher compression rates with less distortion and it does not introduce blocking artifacts.