CV LG ML Aug 27, 2017

Imbalanced Malware Images Classification: a CNN based Approach

arXiv:1708.08042v2 118 citations
AI Analysis

This paper addresses data imbalance in malware detection for cybersecurity applications, but the contribution is incremental: it modifies an existing loss function.

The paper addresses the degraded performance of malware image classification caused by imbalanced class sizes. It proposes a weighted softmax loss function and reports promising results, with improved classification performance across typical CNNs.

Deep convolutional neural networks (CNNs) can be applied to malware binary detection via image classification. The performance, however, is degraded due to the imbalance of malware families (classes). To mitigate this issue, we propose a simple yet effective weighted softmax loss which can be employed as the final layer of deep CNNs. The original softmax loss is weighted, and the weight value can be determined according to class size. A scaling parameter is also included in computing the weight. Proper selection of this parameter is studied and an empirical option is suggested. The weighted loss aims at alleviating the impact of data imbalance in an end-to-end learning fashion. To validate the efficacy, we deploy the proposed weighted loss in a pre-trained deep CNN model and fine-tune it to achieve promising results on malware images classification. Extensive experiments also demonstrate that the new loss function can well fit other typical CNNs, yielding an improved classification performance.
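
The abstract describes the weighted loss only in words: the softmax loss is scaled by a per-class weight that depends on class size and on a scaling parameter. The sketch below illustrates that general idea in plain Python. The exact weight formula is not given in the abstract, so the form used here, `1 + beta * (1 - n_k / n_max)`, is an assumption chosen for illustration, and the names `class_weights`, `weighted_softmax_loss`, and `beta` are hypothetical rather than taken from the paper.

```python
import math


def class_weights(class_sizes, beta=1.0):
    """Assumed weighting (not from the paper): the largest class gets
    weight 1, and smaller classes get weights approaching 1 + beta."""
    n_max = max(class_sizes)
    return [1.0 + beta * (1.0 - n / n_max) for n in class_sizes]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_softmax_loss(batch_logits, labels, weights):
    """Mean over the batch of weights[y] * -log(softmax(logits)[y])."""
    losses = []
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-weights[y] * math.log(probs[y]))
    return sum(losses) / len(losses)


# Toy example: three malware families with very different sample counts.
class_sizes = [2000, 500, 100]
weights = class_weights(class_sizes, beta=1.0)

# Two samples: one from the largest family, one from the smallest.
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 1.5]]
labels = [0, 2]

loss = weighted_softmax_loss(batch_logits, labels, weights)
unweighted = weighted_softmax_loss(batch_logits, labels, [1.0, 1.0, 1.0])
```

With `beta = 0` every weight equals 1 and the loss reduces to the ordinary softmax loss. Larger values of `beta` penalize errors on small families more heavily, which is the kind of trade-off the abstract's scaling parameter is said to control.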
