CR AI LG · Sep 12, 2021

DRo: A data-scarce mechanism to revolutionize the performance of Deep Learning based Security Systems

Mohit Sewak, Sanjay K. Sahay, Hemant Rathore

arXiv:2109.05470v13.8

Originality: Incremental advance

AI Analysis

This work addresses the challenge of sparsely labeled data in security systems and offers significant performance gains in a domain-specific context.

The paper tackles the fact that supervised deep learning needs large labeled datasets by proposing DRo, a mechanism for data-scarce domains such as security. In malware detection using only low-information features, DRo reduces false alarms by 67.9% and boosts accuracy by 11.3%.

Supervised deep learning requires plenty of labeled data to converge, and hence to perform optimally, on task-specific learning. We therefore propose a novel mechanism named DRo (for Deep Routing) for data-scarce domains such as security. The DRo approach builds on recent developments in Deep Clustering; in particular, it exploits the self-augmented training mechanism using synthetically generated local perturbations. DRo not only alleviates the challenges of sparsely labeled data but also offers many unique advantages. We also developed a system named DRoID that uses the DRo mechanism to enhance the performance of an existing malware detection system whose only features are Android implicit Intents, a low-information feature type. We conducted experiments on DRoID using a popular, standardized Android malware dataset and found that the DRo mechanism reduced the false alarms generated by the downstream classifier by 67.9% while simultaneously boosting its accuracy by 11.3%. This is significant not only because the gains achieved are unparalleled but also because these features were never considered rich enough to train a classifier on; hence, no malware classification system to date had reported decent performance using these features in isolation. Owing to these results, the DRo mechanism claims a dominant position among all known systems that aim to enhance the classification performance of deep learning models with sparsely labeled data.
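The abstract names self-augmented training with synthetically generated local perturbations as the core idea DRo borrows from deep clustering. The sketch below is not DRo's or DRoID's implementation; it is a minimal illustration of that general technique as used in the deep-clustering method IMSAT, under stated assumptions: a linear softmax model stands in for the deep network, each local perturbation is a random Gaussian direction scaled to L2 norm `eps`, the mutual-information term weights both entropies equally (a simplification), and the toy multi-hot inputs only mimic binary "Intent present/absent" features. All function names and parameters (`predict`, `sat_penalty`, `mutual_information`, `objective`, `eps`, `lam`) are hypothetical.

```python
import numpy as np

EPS_LOG = 1e-12  # guards against log(0)


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical linear model standing in for a deep network: p(y|x)."""
    return softmax(x @ W + b)


def sat_penalty(x, W, b, eps=0.1, rng=None) -> float:
    """Self-augmented training penalty: mean KL(p(y|x) || p(y|x + r)),
    where r is a random local perturbation with L2 norm eps per sample
    (an assumed perturbation scheme, not necessarily DRo's)."""
    rng = np.random.default_rng(0) if rng is None else rng
    r = rng.normal(size=x.shape)
    r = eps * r / np.linalg.norm(r, axis=1, keepdims=True)
    p = predict(x, W, b)       # predictions on the original samples
    q = predict(x + r, W, b)   # predictions on the perturbed samples
    kl = np.sum(p * (np.log(p + EPS_LOG) - np.log(q + EPS_LOG)), axis=1)
    return float(kl.mean())


def mutual_information(p: np.ndarray) -> float:
    """I(X; Y) = H(Y) - H(Y|X), estimated from a batch of cluster probabilities."""
    marginal = p.mean(axis=0)
    h_y = -np.sum(marginal * np.log(marginal + EPS_LOG))
    h_y_given_x = -np.mean(np.sum(p * np.log(p + EPS_LOG), axis=1))
    return float(h_y - h_y_given_x)


def objective(x, W, b, lam=0.1, eps=0.1) -> float:
    """Simplified IMSAT-style loss to minimize: SAT penalty minus lam * I(X; Y)."""
    return sat_penalty(x, W, b, eps) - lam * mutual_information(predict(x, W, b))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy multi-hot vectors: 8 apps x 5 hypothetical implicit-Intent features.
    x = rng.integers(0, 2, size=(8, 5)).astype(float)
    W = rng.normal(size=(5, 3))  # 3 hypothetical clusters
    b = np.zeros(3)
    print("SAT penalty:", sat_penalty(x, W, b))
    print("mutual information:", mutual_information(predict(x, W, b)))
    print("objective:", objective(x, W, b))
```

Minimizing the SAT term pushes the model to assign the same cluster to a sample and its small perturbations, while maximizing mutual information favors confident yet balanced cluster assignments; in this sketch no training loop is shown, and the gradient step would come from an autodiff framework in practice.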

