CL CR LG | Dec 12, 2023

LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature

Maxime Würsch, Andrei Kucharavy, Dimitri Percia David, Alain Mermoud

arXiv:2312.07110v1 | 19 citations | h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of tracking fast-evolving cybersecurity trends for researchers and organizations, although it is incremental because it builds on existing noun-extraction methods.

The paper tackles the problem of extracting relevant knowledge entities from cybersecurity research literature with large language models (LLMs). It finds that LLMs perform poorly at this task while noun extractors show potential, and it therefore develops a statistically boosted noun extractor that gives promising results for trend monitoring.

The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. It has been demonstrated that standard bibliometric approaches reach their limits in such a fast-evolving domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but noun extractors show some potential. For this reason, we developed a noun extractor boosted with statistical analysis to extract specific and relevant compound nouns from the domain. We then tested our model by using it to identify trends in the LLM domain. We observe some limitations, but the model offers promising results for monitoring the evolution of emergent trends.
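The abstract does not specify how the noun extractor is "boosted with statistical analysis", so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that candidate compound terms are scored by how much more frequent they are in a domain corpus than in a background corpus (a smoothed log-ratio), and that trends are monitored by counting the selected terms per year. A simple stopword filter over word bigrams stands in for a real part-of-speech noun chunker; the stopword list and all function names (`candidate_compounds`, `specificity`, `yearly_counts`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list. A real pipeline would use a POS tagger to keep
# only noun compounds; this filter is only a stand-in for illustration.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "or",
             "with", "we", "is", "are", "this", "that", "by", "from", "as",
             "our", "be", "can", "using", "use"}


def candidate_compounds(text, n=2):
    """Return lowercase word n-grams containing no stopword (stand-in for compound nouns).

    Note: n-grams are taken over the whole token stream, so they may cross sentence boundaries.
    """
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    grams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if not any(t in STOPWORDS for t in window):
            grams.append(" ".join(window))
    return grams


def specificity(domain_docs, background_docs, min_count=2):
    """Score frequent domain candidates by log(domain rel. freq / smoothed background rel. freq).

    Returns a dict ordered from most to least domain-specific.
    """
    dom = Counter(g for d in domain_docs for g in candidate_compounds(d))
    bg = Counter(g for d in background_docs for g in candidate_compounds(d))
    dom_total = sum(dom.values()) or 1
    bg_total = sum(bg.values())
    scores = {}
    for term, count in dom.items():
        if count < min_count:
            continue  # drop rare candidates before scoring
        p_dom = count / dom_total
        # Add-one (Laplace) smoothing, with one extra slot so unseen terms get nonzero mass.
        p_bg = (bg[term] + 1) / (bg_total + len(bg) + 1)
        scores[term] = math.log(p_dom / p_bg)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def yearly_counts(docs_by_year, terms):
    """Count occurrences of each selected term per year, as a simple trend signal."""
    return {
        year: Counter(g for d in docs for g in candidate_compounds(d) if g in terms)
        for year, docs in sorted(docs_by_year.items())
    }


if __name__ == "__main__":
    domain = [
        "Large language models generate phishing emails.",
        "We study prompt injection attacks on large language models.",
        "Prompt injection remains an open problem.",
    ]
    background = ["Language models translate text.", "Large trucks carry goods."]
    scores = specificity(domain, background)
    print(scores)  # "language models" scores 0.0 because it is equally common in the background
    print(yearly_counts({2022: ["Prompt injection attacks."],
                         2023: ["Prompt injection again. Prompt injection everywhere."]},
                        set(scores)))
```

In this toy run, "prompt injection" scores as domain-specific while "language models" scores zero because it is just as frequent in the generic background text; the spurious bigram "large language" also scores highly, which illustrates why a proper noun-phrase chunker and larger corpora would be needed in practice.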
