CR · AI · CL · HC · Nov 21, 2024

Learned, Lagged, LLM-splained: LLM Responses to End User Security Questions

arXiv:2411.14571v21 citations · h-index: 3 · ACSAC
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of unreliable LLM responses to end users seeking security advice and highlights where incremental improvements are needed.

The study evaluated three popular LLMs on 900 end user security questions. It found that the models show broad knowledge but also exhibit errors such as stale answers and indirect communication, which reduce the quality of the information users receive.

Answering end user security questions is challenging. While large language models (LLMs) like GPT, LLAMA, and Gemini are far from error-free, they have shown promise in answering a variety of questions outside of security. We studied LLM performance in the area of end user security by qualitatively evaluating 3 popular LLMs on 900 systematically collected end user security questions. While LLMs demonstrate broad generalist "knowledge" of end user security information, there are patterns of errors and limitations across LLMs consisting of stale and inaccurate answers, and indirect or unresponsive communication styles, all of which impact the quality of information received. Based on these patterns, we suggest directions for model improvement and recommend user strategies for interacting with LLMs when seeking assistance with security.

