AI · Nov 9, 2025

Secu-Table: a Comprehensive security table dataset for evaluating semantic table interpretation systems

arXiv:2511.06301v1 · h-index: 5 · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper provides a domain-specific dataset for evaluating STI systems, particularly LLM-based ones, in cybersecurity. The contribution is incremental: it fills a data gap rather than offering a methodological breakthrough.

The authors address the lack of publicly available tabular datasets for evaluating semantic table interpretation (STI) systems in the security domain. They introduce Secu-Table, a dataset of more than 1500 tables and more than 15k entities, constructed from CVE and CWE sources and annotated using Wikidata and SEPSES CSKG. All code is publicly released.

Evaluating semantic table interpretation (STI) systems, particularly those based on Large Language Models (LLMs), depends heavily on the dataset, especially in domain-specific contexts such as the security domain. However, in the security domain, tabular datasets for evaluating state-of-the-art STI systems are not publicly available. In this paper, we introduce the Secu-Table dataset, composed of more than 1500 tables with more than 15k entities, constructed using security data extracted from the Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE) data sources and annotated using Wikidata and the SEmantic Processing of Security Event Streams CyberSecurity Knowledge Graph (SEPSES CSKG). Along with the dataset, all the code is publicly released. The dataset is made available to the research community in the context of the SemTab challenge on Tabular to Knowledge Graph Matching. This challenge aims to evaluate the performance of several STI systems based on open-source LLMs. A preliminary evaluation, serving as a baseline, was conducted using two open-source LLMs, Falcon3-7b-instruct and Mistral-7B-Instruct, and one closed-source LLM, GPT-4o mini.

