CL · Oct 15, 2025

Assessing Web Search Credibility and Response Groundedness in Chat Assistants

Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Marián Šimko

arXiv:2510.13749v1 · 2.7 · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the risk of misinformation in AI chat assistants for users in high-stakes information environments, providing a systematic evaluation framework.

The paper tackles the problem of misinformation amplification in chat assistants with web search. It evaluates source credibility and response groundedness across four assistants using 100 claims on misinformation-prone topics, finding that Perplexity achieves the highest source credibility while GPT-4o cites more non-credible sources on sensitive topics.

Chat assistants increasingly integrate web search functionality, enabling them to retrieve and cite external sources. While this promises more reliable answers, it also raises the risk of amplifying misinformation from low-credibility sources. In this paper, we introduce a novel methodology for evaluating assistants' web search behavior, focusing on source credibility and the groundedness of responses with respect to cited sources. Using 100 claims across five misinformation-prone topics, we assess GPT-4o, GPT-5, Perplexity, and Qwen Chat. Our findings reveal differences between the assistants: Perplexity achieves the highest source credibility, whereas GPT-4o exhibits elevated citation of non-credible sources on sensitive topics. This work provides the first systematic comparison of the fact-checking behavior of commonly used chat assistants, offering a foundation for evaluating AI systems in high-stakes information environments.
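To make a phrase like "elevated citation of non-credible sources on sensitive topics" concrete, here is a minimal sketch of one way such a rate could be tallied. It assumes each citation has already been labeled credible or non-credible by some external process. The `Citation` record, the `non_credible_share` helper, and the placeholder assistant and topic names are hypothetical illustrations, not the paper's actual methodology or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Citation:
    assistant: str   # which chat assistant produced the citation (placeholder names)
    topic: str       # misinformation-prone topic of the claim (placeholder names)
    domain: str      # cited source's domain
    credible: bool   # ASSUMPTION: label supplied by some external credibility rating


def non_credible_share(citations):
    """Return {(assistant, topic): fraction of citations labeled non-credible}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for c in citations:
        key = (c.assistant, c.topic)
        totals[key] += 1
        if not c.credible:
            flagged[key] += 1
    # Only keys with at least one citation appear, so there is no division by zero.
    return {key: flagged[key] / totals[key] for key in totals}


# Toy, made-up citations for illustration only (not results from the paper).
sample = [
    Citation("assistant_a", "topic_1", "example.org", True),
    Citation("assistant_a", "topic_1", "example.com", False),
    Citation("assistant_b", "topic_1", "example.net", True),
]
shares = non_credible_share(sample)
print(shares)
```

Grouping by `(assistant, topic)` keeps per-topic differences visible instead of averaging them away, which is the kind of contrast the abstract describes.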

