CR, LG | Apr 18, 2025

OpCode-Based Malware Classification Using Machine Learning and Deep Learning Techniques

arXiv:2504.13408v1 | 4 citations | h-index: 1

Originality: Synthesis-oriented

AI Analysis

This is an incremental study for cybersecurity researchers, applying existing methods to malware classification with OpCode data.

The paper compares traditional machine learning classifiers (SVM, KNN, Decision Tree) with a deep learning CNN for malware classification on OpCode sequences. SVM outperformed the other traditional techniques, while the CNN showed competitive performance and extracted its features automatically.

This technical report presents a comprehensive analysis of malware classification using OpCode sequences. Two distinct approaches are evaluated: traditional machine learning using n-gram analysis with Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree classifiers; and a deep learning approach employing a Convolutional Neural Network (CNN). The traditional machine learning approach establishes a baseline using handcrafted 1-gram and 2-gram features from disassembled malware samples. The deep learning methodology builds upon the work proposed in "Deep Android Malware Detection" by McLaughlin et al. and evaluates the performance of a CNN model trained to automatically extract features from raw OpCode data. Empirical results are compared using standard performance metrics (accuracy, precision, recall, and F1-score). While the SVM classifier outperforms other traditional techniques, the CNN model demonstrates competitive performance with the added benefit of automated feature extraction.
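
As a rough illustration of the handcrafted 1-gram and 2-gram features and the evaluation metrics mentioned in the abstract, the sketch below counts OpCode n-grams and computes accuracy, precision, recall, and F1 for a binary labeling. It is not the authors' code: the function names (`opcode_ngrams`, `build_vocabulary`, `vectorize`, `binary_metrics`), the toy mnemonics, and the binary-label setup are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams in a single opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def build_vocabulary(samples: Sequence[Sequence[str]],
                     n_values: Sequence[int] = (1, 2)) -> List[NGram]:
    """Collect every n-gram seen in the samples, in a fixed sorted order."""
    vocab = set()
    for opcodes in samples:
        for n in n_values:
            vocab.update(opcode_ngrams(opcodes, n))
    return sorted(vocab)


def vectorize(opcodes: Sequence[str], vocab: Sequence[NGram],
              n_values: Sequence[int] = (1, 2)) -> List[int]:
    """Turn one opcode sequence into a count vector aligned with vocab."""
    counts: Counter = Counter()
    for n in n_values:
        counts.update(opcode_ngrams(opcodes, n))
    return [counts[gram] for gram in vocab]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class (assumed setup)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy opcode sequences standing in for disassembled samples (made up).
samples = [["mov", "add", "mov"], ["push", "call"]]
vocab = build_vocabulary(samples)
features = [vectorize(s, vocab) for s in samples]
```

Count vectors like `features` are the kind of input an SVM, KNN, or Decision Tree classifier could be trained on; the CNN approach described above instead learns its features from the raw OpCode sequence.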

