HC, AI, CL, CY · Mar 24, 2025

REALM: A Dataset of Real-World LLM Use Cases

arXiv:2503.18792v2 · 11 citations · h-index: 7 · ACL
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the shortage of real-world data on LLM applications available to researchers and practitioners. The contribution is incremental: it focuses on data collection rather than novel methods.

The authors tackled the lack of comprehensive understanding of real-world LLM applications by introducing REALM, a dataset of over 94,000 use cases from Reddit and news articles, which categorizes applications and explores user demographics to provide insights into adoption across domains.

Large Language Models (LLMs), such as the GPT series, have driven significant industrial applications, leading to economic and societal transformations. However, a comprehensive understanding of their real-world applications remains limited. To address this, we introduce REALM, a dataset of over 94,000 LLM use cases collected from Reddit and news articles. REALM captures two key dimensions: the diverse applications of LLMs and the demographics of their users. It categorizes LLM applications and explores how users' occupations relate to the types of applications they use. By integrating real-world data, REALM offers insights into LLM adoption across different domains, providing a foundation for future research on their evolving societal roles.

