HCDec 2, 2025
Humans incorrectly reject confident accusatory AI judgmentsRiccardo Loconte, Merylin Monaro, Pietro Pietrini et al.
Automated verbal deception detection using methods from Artificial Intelligence (AI) has been shown to outperform humans in disentangling lies from truths. Research suggests that transparency and interpretability of computational methods tend to increase human acceptance of using AI to support decisions. However, the extent to which humans accept AI judgments for deception detection remains unclear. We experimentally examined how an AI model's accuracy (i.e., its overall performance in deception detection) and confidence (i.e., the model's uncertainty in single-statements predictions) influence human adoption of the model's judgments. Participants (n=373) were presented with veracity judgments of an AI model with high or low overall accuracy and various degrees of prediction confidence. The results showed that humans followed predictions from a highly accurate model more than from a less accurate one. Interestingly, the more confident the model, the more people deviated from it, especially if the model predicted deception. We also found that human interaction with algorithmic predictions either worsened the machine's performance or was ineffective. While this human aversion to accept highly confident algorithmic predictions was partly explained by participants' tendency to overestimate humans' deception detection abilities, we also discuss how truth-default theory and the social costs of accusing someone of lying help explain the findings.
CLJan 13, 2025
When lies are mostly truthful: automated verbal deception detection for embedded liesRiccardo Loconte, Bennett Kleinberg
Background: Verbal deception detection research relies on narratives and commonly assumes statements as truthful or deceptive. A more realistic perspective acknowledges that the veracity of statements exists on a continuum with truthful and deceptive parts being embedded within the same statement. However, research on embedded lies has been lagging behind. Methods: We collected a novel dataset of 2,088 truthful and deceptive statements with annotated embedded lies. Using a within-subjects design, participants provided a truthful account of an autobiographical event. They then rewrote their statement in a deceptive manner by including embedded lies, which they highlighted afterwards and judged on lie centrality, deceptiveness, and source. Results: We show that a fined-tuned language model (Llama-3-8B) can classify truthful statements and those containing embedded lies with 64% accuracy. Individual differences, linguistic properties and explainability analysis suggest that the challenge of moving the dial towards embedded lies stems from their resemblance to truthful statements. Typical deceptive statements consisted of 2/3 truthful information and 1/3 embedded lies, largely derived from past personal experiences and with minimal linguistic differences with their truthful counterparts. Conclusion: We present this dataset as a novel resource to address this challenge and foster research on embedded lies in verbal deception detection.
CLJan 10, 2025
Effective faking of verbal deception detection with target-aligned adversarial attacksBennett Kleinberg, Riccardo Loconte, Bruno Verschuere
Background: Deception detection through analysing language is a promising avenue using both human judgments and automated machine learning judgments. For both forms of credibility assessment, automated adversarial attacks that rewrite deceptive statements to appear truthful pose a serious threat. Methods: We used a dataset of 243 truthful and 262 fabricated autobiographical stories in a deception detection task for humans and machine learning models. A large language model was tasked to rewrite deceptive statements so that they appear truthful. In Study 1, humans who made a deception judgment or used the detailedness heuristic and two machine learning models (a fine-tuned language model and a simple n-gram model) judged original or adversarial modifications of deceptive statements. In Study 2, we manipulated the target alignment of the modifications, i.e. tailoring the attack to whether the statements would be assessed by humans or computer models. Results: When adversarial modifications were aligned with their target, human (d=-0.07 and d=-0.04) and machine judgments (51% accuracy) dropped to the chance level. When the attack was not aligned with the target, both human heuristics judgments (d=0.30 and d=0.36) and machine learning predictions (63-78%) were significantly better than chance. Conclusions: Easily accessible language models can effectively help anyone fake deception detection efforts both by humans and machine learning models. Robustness against adversarial modifications for humans and machines depends on that target alignment. We close with suggestions on advancing deception research with adversarial attack designs and techniques.