IR · DL · Apr 18

Will AI be overconfident about academic research findings when reliant on abstracts? (v1)

arXiv:2605.27392 · 63.2 · h-index: 10

Predicted impact: top 50% in IR · last 90 days · Originality: Synthesis-oriented

AI Analysis

This highlights a specific risk for users of LLMs in academic knowledge discovery, though the finding is incremental as it extends known issues of LLM reliability.

The study tested whether LLMs relying on abstracts overestimate the strength of academic research claims. It found that claims in abstracts and conclusions are stronger than in discussions, especially outside social sciences and humanities, indicating a risk of overconfidence if LLMs use only abstracts.

Large Language Models (LLMs) like ChatGPT, DeepSeek and Gemini seem to be increasingly used for knowledge discovery, information retrieval, and knowledge summaries, including for academic topics. This can result in users being misled, for example by hallucinations. These problems may be exacerbated for academic knowledge if LLMs base their answers on journal article abstracts when they lack full text access. To test whether the information content of abstracts can be misleading, full text articles were submitted to GPT-OSS 120B, an LLM from OpenAI, asking it to assess separately the strength of the claims for the main result in the abstract, discussion, and conclusion. Outside the social sciences and humanities, claims tended to be stronger in the abstract and conclusions than in the discussion, suggesting that relying on the strength of claims in abstracts would be misleading. Thus, if LLMs ingest abstracts but not full texts, there is a risk that they will be overconfident about the findings and pass this overconfidence on to users in response to relevant prompts. This is another reason to be cautious about using LLMs for academic-related knowledge discovery and summaries.
