CL · Mar 26, 2024

For those who don't know (how) to ask: Building a dataset of technology questions for digital newcomers

arXiv:2403.18125v1 · 3 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

The paper addresses digital inclusion for marginalized users, but its contribution is incremental: it focuses on dataset creation rather than a new method or solution.

The paper tackles the problem of digital newcomers struggling to ask appropriate technology questions due to lexical or conceptual barriers, and proposes creating a dataset from a decade of tutoring data to study how unclear queries affect LLM outputs.

While the rise of large language models (LLMs) has created rich new opportunities to learn about digital technology, many on the margins of this technology struggle to gain and maintain competency due to lexical or conceptual barriers that prevent them from asking appropriate questions. Although there have been many efforts to understand factuality of LLM-created content and ability of LLMs to answer questions, it is not well understood how unclear or nonstandard language queries affect the model outputs. We propose the creation of a dataset that captures questions of digital newcomers and outsiders, utilizing data we have compiled from a decade's worth of one-on-one tutoring. In this paper we lay out our planned efforts and some potential uses of this dataset.

