Korey MacVittie

h-index2

5papers

16citations

Novelty10%

AI Score12

Ranked #193,925 of 194,257 authors (top 100%)#40,153 in LG (top 100%)

5 Papers

1.6LGNov 3, 2021

A Survey of Machine Learning Algorithms for Detecting Malware in IoT Firmware

Erik Larsen, Korey MacVittie, John Lilly

This work explores the use of machine learning techniques on an Internet-of-Things firmware dataset to detect malicious attempts to infect edge devices or subsequently corrupt an entire network. Firmware updates are uncommon in IoT devices; hence, they abound with vulnerabilities. Attacks against such devices can go unnoticed, and users can become a weak point in security. Malware can cause DDoS attacks and even spy on sensitive areas like peoples' homes. To help mitigate this threat, this paper employs a number of machine learning algorithms to classify IoT firmware and the best performing models are reported. In a general comparison, the top three algorithms are Gradient Boosting, Logistic Regression, and Random Forest classifiers. Deep learning approaches including Convolutional and Fully Connected Neural Networks with both experimental and proven successful architectures are also explored.

1.6LGNov 3, 2021

Intrusion Detection: Machine Learning Baseline Calculations for Image Classification

Erik Larsen, Korey MacVittie, John Lilly

Cyber security can be enhanced through application of machine learning by recasting network attack data into an image format, then applying supervised computer vision and other machine learning techniques to detect malicious specimens. Exploratory data analysis reveals little correlation and few distinguishing characteristics between the ten classes of malware used in this study. A general model comparison demonstrates that the most promising candidates for consideration are Light Gradient Boosting Machine, Random Forest Classifier, and Extra Trees Classifier. Convolutional networks fail to deliver their outstanding classification ability, being surpassed by a simple, fully connected architecture. Most tests fail to break 80% categorical accuracy and present low F1 scores, indicating more sophisticated approaches (e.g., bootstrapping, random samples, and feature selection) may be required to maximize performance.

1.6LGNov 3, 2021

Virus-MNIST: Machine Learning Baseline Calculations for Image Classification

Erik Larsen, Korey MacVittie, John Lilly

The Virus-MNIST data set is a collection of thumbnail images that is similar in style to the ubiquitous MNIST hand-written digits. These, however, are cast by reshaping possible malware code into an image array. Naturally, it is poised to take on a role in benchmarking progress of virus classifier model training. Ten types are present: nine classified as malware and one benign. Cursory examination reveals unequal class populations and other key aspects that must be considered when selecting classification and pre-processing methods. Exploratory analyses show possible identifiable characteristics from aggregate metrics (e.g., the pixel median values), and ways to reduce the number of features by identifying strong correlations. A model comparison shows that Light Gradient Boosting Machine, Gradient Boosting Classifier, and Random Forest algorithms produced the highest accuracy scores, thus showing promise for deeper scrutiny.

3.1LGOct 14, 2021

A Survey of Machine Learning Algorithms for Detecting Ransomware Encryption Activity

Erik Larsen, David Noever, Korey MacVittie

A survey of machine learning techniques trained to detect ransomware is presented. This work builds upon the efforts of Taylor et al. in using sensor-based methods that utilize data collected from built-in instruments like CPU power and temperature monitors to identify encryption activity. Exploratory data analysis (EDA) shows the features most useful from this simulated data are clock speed, temperature, and CPU load. These features are used in training multiple algorithms to determine an optimal detection approach. Performance is evaluated with accuracy, F1 score, and false-negative rate metrics. The Multilayer Perceptron with three hidden layers achieves scores of 97% in accuracy and F1 and robust data preparation. A random forest model produces scores of 93% accuracy and 92% F1, showing that sensor-based detection is currently a viable option to detect even zero-day ransomware attacks before the code fully executes.

1.4CVJul 1, 2021

Overhead-MNIST: Machine Learning Baselines for Image Classification

Erik Larsen, David Noever, Korey MacVittie et al.

Twenty-three machine learning algorithms were trained then scored to establish baseline comparison metrics and to select an image classification algorithm worthy of embedding into mission-critical satellite imaging systems. The Overhead-MNIST dataset is a collection of satellite images similar in style to the ubiquitous MNIST hand-written digits found in the machine learning literature. The CatBoost classifier, Light Gradient Boosting Machine, and Extreme Gradient Boosting models produced the highest accuracies, Areas Under the Curve (AUC), and F1 scores in a PyCaret general comparison. Separate evaluations showed that a deep convolutional architecture was the most promising. We present results for the overall best performing algorithm as a baseline for edge deployability and future performance improvement: a convolutional neural network (CNN) scoring 0.965 categorical accuracy on unseen test data.