CR, CL, LG | Oct 24, 2025

SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots

arXiv:2510.21459v1 | 1 citation | h-index: 1 | 2025 3rd International Conference on Foundation and Large Language Models (FLLM)
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective and secure honeypots in cybersecurity, though its contribution, a comparison of RAG and prompt-tuning methods, is incremental.

The paper tackles the problem of improving context-awareness in honeypots in order to maximize attacker engagement. It proposes the SBASH framework, which uses lightweight local LLMs to address data-protection issues. It shows that RAG improves accuracy for untuned models, while prompt-tuned models without RAG reach similar accuracy with slightly lower latency.

Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. Maximising attacker engagement is essential to their utility. However, research has highlighted that context-awareness, such as the ability to respond to new attack types, systems and attacker agents, is necessary to increase engagement. Large Language Models (LLMs) have been shown to be one approach to increasing context-awareness, but they suffer from several challenges, including accuracy and timeliness of responses, high operational costs, and data-protection issues due to cloud deployment. We propose the System-Based Attention Shell Honeypot (SBASH) framework, which manages data-protection issues through the use of lightweight local LLMs. We investigate the use of Retrieval Augmented Generation (RAG) supported LLMs and non-RAG LLMs for Linux shell commands and evaluate them using several metrics: response time differences, realism as judged by human testers, and similarity to a real system calculated with Levenshtein distance, SBert, and BertScore. We show that RAG improves accuracy for untuned models, while models tuned via a system prompt that tells the LLM to respond like a Linux system achieve, without RAG, accuracy similar to that of untuned models with RAG, with slightly lower latency.
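
To make the Levenshtein-based similarity metric concrete, here is a minimal Python sketch that compares a honeypot's response to the output of a real system for the same command. The abstract does not say how the distance is normalised, so dividing by the longer string's length is an assumption, and the function names `levenshtein` and `similarity` are hypothetical rather than taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(real_output: str, honeypot_output: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical. Normalisation is an assumption."""
    longest = max(len(real_output), len(honeypot_output))
    if longest == 0:
        return 1.0  # two empty outputs are treated as identical
    return 1.0 - levenshtein(real_output, honeypot_output) / longest
```

A character-level score like this penalises harmless variation such as timestamps or process IDs, which is presumably one reason the evaluation also uses embedding-based measures (SBert, BertScore).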
