Junmei Ding

2 Papers

CL · Aug 13, 2024 · Code
CTISum: A New Benchmark Dataset For Cyber Threat Intelligence Summarization

Wei Peng, Junmei Ding, Wei Wang et al.

Cyber Threat Intelligence (CTI) summarization involves generating concise and accurate highlights from web intelligence data, which is critical for giving decision-makers actionable insights so they can swiftly detect and respond to cyber threats. Despite this, the development of efficient techniques for summarizing CTI reports, which comprise facts, analytical insights, attack processes, and more, has been hindered by the lack of suitable datasets. To address this gap, we introduce CTISum, a new benchmark dataset designed for the CTI summarization task. Recognizing the significance of understanding attack processes, we also propose a novel fine-grained subtask, attack process summarization, which aims to help defenders assess risks, identify security gaps, and uncover vulnerabilities. Specifically, a multi-stage annotation pipeline is designed to collect and annotate CTI data from diverse web sources, and CTISum is comprehensively benchmarked using extractive, abstractive, and LLM-based summarization methods. Experimental results reveal that current state-of-the-art models face significant challenges when applied to CTISum, highlighting that automatic summarization of CTI reports remains an open research problem. The code and example dataset can be made publicly available at https://github.com/pengwei-iie/CTISum.
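To make the benchmarking setup concrete, here is a minimal sketch of what an extractive baseline and an overlap-based score could look like. The abstract does not say which extractive methods or metrics CTISum uses, so everything here is an assumption for illustration: the frequency-based sentence scorer, the unigram-overlap F1 (a ROUGE-1-style measure), and the function names `extractive_summary` and `rouge1_f1` are hypothetical and are not taken from the CTISum code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(report, k=2):
    """Hypothetical baseline: keep the k sentences whose words are, on
    average, most frequent in the report, in their original order."""
    sentences = split_sentences(report)
    freq = Counter(tokenize(report))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore document order
    return " ".join(sentences[i] for i in keep)


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference summary
    (assumed metric; the abstract does not name one)."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    report = ("The actor sent phishing emails. Weather was nice. "
              "The phishing emails dropped a loader.")
    summary = extractive_summary(report, k=2)
    print(summary)
    print(rouge1_f1(summary, "Phishing emails dropped a loader."))
```

A baseline this simple is useful mainly as a floor: abstractive and LLM-based systems would be scored against the same reference summaries with the same metric.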

NI · Apr 16, 2022
A Hierarchical Terminal Recognition Approach based on Network Traffic Analysis

Lingzi Kong, Daoqi Han, Junmei Ding et al.

Recognizing the types of devices connected to a network helps enforce security policies. In smart grids, identifying the massive number of grid metering terminals through network traffic analysis remains largely unexplored, and existing research has not proposed a targeted end-to-end model for the flow classification problem. We therefore propose a hierarchical terminal recognition approach that exploits the details of grid data. By segmenting the grid data, we form a two-level model structure that uses both the statistical characteristics of network traffic and the specific behavior characteristics of grid metering terminals. Moreover, through feature selection and reconstruction, we combine three algorithms to accurately identify the types of terminals that transmit network traffic. We conduct extensive experiments on a real dataset containing three types of grid metering terminals, and the results show that our approach improves performance over common recognition models. The combination of an autoencoder, K-Means, and the GradientBoost algorithm achieved the best recognition rate, with an F1 score of 98.3%.