CLAug 30, 2024
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding DataSpencer Whitehead, Jacob Phillips, Sean Hendryx
Multimodal language models can exhibit hallucinations in their outputs, which limits their reliability. The ability to automatically detect these errors is important for mitigating them, but has been less explored and existing efforts do not localize hallucinations, instead framing this as a classification task. In this work, we first pose multimodal hallucination detection as a sequence labeling task where models must localize hallucinated text spans and present a strong baseline model. Given the high cost of human annotations for this task, we propose an approach to improve the sample efficiency of these models by creating corrupted grounding data, which we use for pre-training. Leveraging phrase grounding data, we generate hallucinations to replace grounded spans and create hallucinated text. Experiments show that pre-training on this data improves sample efficiency when fine-tuning, and that the learning signal from the grounding data plays an important role in these improvements.
CVJan 22, 2024
Out-of-Distribution Detection & Applications With Ablated Learned Temperature EnergyWill LeVine, Benjamin Pikus, Jacob Phillips et al.
As deep neural networks become adopted in high-stakes domains, it is crucial to identify when inference inputs are Out-of-Distribution (OOD) so that users can be alerted of likely drops in performance and calibration despite high confidence -- ultimately to know when networks' decisions (and their uncertainty in those decisions) should be trusted. In this paper we introduce Ablated Learned Temperature Energy (or "AbeT" for short), an OOD detection method which lowers the False Positive Rate at 95\% True Positive Rate (FPR@95) by $43.43\%$ in classification compared to state of the art without training networks in multiple stages or requiring hyperparameters or test-time backward passes. We additionally provide empirical insights as to why our model learns to distinguish between In-Distribution (ID) and OOD samples while only being explicitly trained on ID samples via exposure to misclassified ID examples at training time. Lastly, we show the efficacy of our method in identifying predicted bounding boxes and pixels corresponding to OOD objects in object detection and semantic segmentation, respectively -- with an AUROC increase of $5.15\%$ in object detection and both a decrease in FPR@95 of $41.48\%$ and an increase in AUPRC of $34.20\%$ in semantic segmentation compared to previous state of the art.