HC AI | Oct 1, 2025

From keywords to semantics: Perceptions of large language models in data discovery

Maura E. Halstead, Mark A. Green, Caroline Jay, Richard Kingston, David Topping, Alexander Singleton

arXiv:2510.01473v1 | 4.1 | h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses data discovery challenges for researchers, but its contribution is incremental because it focuses on user perceptions rather than technical innovation.

The study investigated whether researchers would accept large language models (LLMs) as a replacement for keyword-based data discovery. It found that the potential benefits alone are not enough to overcome barriers to adoption, but that transparency features could increase acceptance.

Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, which makes the process difficult and can cause relevant data to be missed. Large Language Models (LLMs) could enhance data discovery by removing this requirement and allowing researchers to ask questions in natural language. However, we do not currently know whether researchers would accept LLMs for data discovery. Taking a human-centered artificial intelligence (HCAI) approach, we ran focus groups (N = 27) to understand researchers' perspectives on LLMs for data discovery. Our conceptual model shows that the potential benefits are not enough for researchers to use LLMs instead of current technology. Barriers prevent researchers from fully accepting LLMs, but features that improve transparency could overcome them. Using our model will allow developers to incorporate features that increase the acceptance of LLMs for data discovery.
