SE Aug 29, 2018
Use of Source Code Similarity Metrics in Software Defect Prediction
Ahmet Okutan
In recent years, defect prediction has received a great deal of attention in the empirical software engineering world. Predicting software defects before the maintenance phase is very important, not only to decrease maintenance costs but also to increase the overall quality of a software product. Various product, process, and developer based software metrics have been proposed so far to measure the defectiveness of a software system. This paper proposes a novel set of software metrics based on the similarities detected among the source code files in a software project. To find source code similarities among different files of a software system, plagiarism and clone detection techniques are used. Two simple similarity metrics are calculated for each file, considering its overall similarity to the defective and non-defective files in the project. Using these similarity metrics, we predict whether a specific file is defective or not. Our experiments on 10 open source data sets show that, depending on the amount of detected similarity, the proposed metrics could achieve significantly better performance than the existing static code metrics in terms of the area under the curve (AUC).
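The two per-file metrics described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas or name the detection tools, so everything here is an assumption: `pairwise_similarity` uses Python's `difflib.SequenceMatcher` as a stand-in for a plagiarism or clone detector, and the metrics are taken to be the mean similarity of a file to the other defective files and to the other non-defective files. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def pairwise_similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1].

    Assumption: the paper relies on plagiarism and clone detection tools;
    difflib is used here only to keep the sketch self-contained.
    """
    return SequenceMatcher(None, a, b).ratio()


def _mean(values: list[float]) -> float:
    """Mean of a list, defined as 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def similarity_metrics(
    sources: dict[str, str], defective: set[str]
) -> dict[str, tuple[float, float]]:
    """Map each file name to (similarity to defective, similarity to non-defective).

    Assumption: "overall similarity" is modeled as the mean pairwise score
    against every other file in each group, excluding the file itself.
    """
    metrics = {}
    for name, text in sources.items():
        to_defective, to_clean = [], []
        for other, other_text in sources.items():
            if other == name:
                continue  # a file is not compared with itself
            score = pairwise_similarity(text, other_text)
            if other in defective:
                to_defective.append(score)
            else:
                to_clean.append(score)
        metrics[name] = (_mean(to_defective), _mean(to_clean))
    return metrics


if __name__ == "__main__":
    files = {
        "a.py": "x = 1\n",
        "b.py": "x = 1\n",
        "c.py": "print('completely different')\n",
    }
    for name, pair in similarity_metrics(files, defective={"b.py"}).items():
        print(name, pair)
```

The resulting pair of numbers per file could then serve as input features for any binary classifier; which classifier the paper uses is not stated in the abstract.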
SE Dec 2, 2021
A Grounded Theory Based Approach to Characterize Software Attack Surfaces
Sara Moshtari, Ahmet Okutan, Mehdi Mirakhorli
The notion of Attack Surface refers to the critical points on the boundary of a software system which are accessible from outside or contain valuable content for attackers. The ability to identify the attack surface components of a software system plays a significant role in the effectiveness of vulnerability analysis approaches. Most prior works focus on vulnerability techniques that use an approximation of attack surfaces, and there have not been many attempts to create a comprehensive list of attack surface components. Although a limited number of studies have focused on attack surface analysis, they defined attack surface components based on project-specific hypotheses to evaluate the security risk of specific types of software applications. In this study, we leverage a qualitative analysis approach to empirically identify an extensive list of attack surface components. To this end, we conduct a Grounded Theory (GT) analysis on 1444 previously published vulnerability reports and weaknesses with a team of three software developers and security experts. We extract vulnerability information from two publicly available repositories: 1) Common Vulnerabilities and Exposures, and 2) Common Weakness Enumeration. We ask three key questions: where the attacks come from, what they target, and how they emerge. To help answer these questions, we define three core categories for attack surface components: Entry points, Targets, and Mechanisms. We extract attack surface concepts related to each category from the collected vulnerability information using the GT analysis and provide a comprehensive categorization that represents the attack surface components of software systems from various perspectives. The comparison of the proposed attack surface model with the literature shows that, in the best case, previous works cover only 50% of the attack surface components at the network level and only 6.7% of the components at the code level.
CR Mar 25, 2021
Near Real-time Learning and Extraction of Attack Models from Intrusion Alerts
Shanchieh Jay Yang, Ahmet Okutan, Gordon Werner et al.
Critical and sophisticated cyberattacks often involve a multitude of reconnaissance, exploitation, and obfuscation techniques to penetrate well-protected enterprise networks. The discovery and detection of attacks, though needing continuous effort, is no longer sufficient. Security Operation Center (SOC) analysts are overwhelmed by the significant volume of intrusion alerts without being able to extract actionable intelligence. Recognizing this challenge, this paper describes the advances and findings from deploying ASSERT to process intrusion alerts from OmniSOC in collaboration with the Center for Applied Cybersecurity Research (CACR) at Indiana University. ASSERT utilizes information theoretic unsupervised learning to extract and update 'attack models' in near real-time without expert knowledge. It consumes streaming intrusion alerts and generates a small number of statistical models that help SOC analysts comprehend ongoing and emerging attacks in a timely manner. This paper presents the architecture and key processes of ASSERT and discusses a few real-world attack models to highlight the use cases that benefit SOC operations. The research team is developing a lightweight containerized ASSERT that will be shared through a public repository to help the community cope with the overwhelming volume of intrusion alerts.
CR Mar 26, 2018
Forecasting Cyber Attacks with Imbalanced Data Sets and Different Time Granularities
Ahmet Okutan, Shanchieh Jay Yang, Katie McConky
If cyber incidents are predicted a reasonable amount of time before they occur, defensive actions to prevent their destructive effects could be planned. Unfortunately, most of the time we do not have enough observables of the malicious activities until they are already under way. Therefore, this work proposes using unconventional signals extracted from various data sources with different time granularities to predict cyber incidents for target entities. A Bayesian network is used to predict cyber attacks, where the unconventional signals are used as indicative random variables. This work also develops a novel minority class oversampling technique to improve cyber attack prediction on imbalanced data sets. The results show that, depending on the selected time granularity, the unconventional signals are able to predict cyber attacks for the anonymized target organization even though the signals are not explicitly related to that organization. Furthermore, the minority oversampling approach developed achieves better performance than the existing filtering techniques in the literature.
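To make the class imbalance problem concrete, the sketch below shows plain random oversampling of the minority class. This is not the novel technique developed in the paper, whose details the abstract does not give; it is only a common baseline, and the function name and parameters are hypothetical.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal counts.

    Assumption: a binary problem where `minority_label` marks the rare class
    (for example, days on which an attack was observed). This is a generic
    baseline, not the oversampling method proposed in the paper.
    """
    rng = random.Random(seed)  # seeded for reproducible resampling
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    deficit = majority_count - len(minority)
    if not minority or deficit <= 0:
        # Nothing to sample from, or the data is already balanced.
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority_label] * deficit


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]  # a single positive (attack) instance
    X_bal, y_bal = random_oversample(X, y, minority_label=1)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Resampling of this kind should be applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.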