New Directions in Automated Traffic Analysis
This work addresses the labor-intensive machine learning pipeline that researchers and practitioners face in network security. It is an incremental improvement that integrates existing AutoML with a new packet representation tool.
The paper tackles the manual and painstaking aspects of machine learning pipelines in network traffic analysis by introducing nPrintML, a system that automates feature extraction and model tuning. The system is evaluated on eight traffic analysis tasks and is intended to make machine learning techniques easier to apply more broadly.
Despite the use of machine learning for many network traffic analysis tasks in security, from application identification to intrusion detection, the aspects of the machine learning pipeline that ultimately determine the performance of the model -- feature selection and representation, model selection, and parameter tuning -- remain manual and painstaking. This paper presents a method to automate many aspects of traffic analysis, making it easier to apply machine learning techniques to a wider variety of traffic analysis tasks. We introduce nPrint, a tool that generates a unified packet representation that is amenable to representation learning and model training. We integrate nPrint with automated machine learning (AutoML), resulting in nPrintML, a public system that largely eliminates feature extraction and model tuning for a wide variety of traffic analysis tasks. We have evaluated nPrintML on eight separate traffic analysis tasks and released nPrint and nPrintML to enable future work to extend these methods.
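To make the idea of a unified packet representation concrete, the following is a minimal sketch of one possible encoding. The abstract does not specify the encoding, so everything here is an assumption for illustration: each header type gets a fixed-size region, present headers are expanded bit by bit, and absent headers are filled with -1 so that every packet yields a vector of the same length. The names `HEADER_SIZES`, `bytes_to_bits`, and `packet_to_vector` are hypothetical and are not nPrint's actual interface.

```python
from typing import Dict, List

# Hypothetical fixed region sizes (in bytes) per header type.
# These values are assumptions chosen for illustration only.
HEADER_SIZES: Dict[str, int] = {"ipv4": 20, "tcp": 20, "udp": 8}

ABSENT = -1  # filler for bits of a header the packet does not carry


def bytes_to_bits(data: bytes, size: int) -> List[int]:
    """Expand `data` into 0/1 bits (most significant bit first),
    truncated to `size` bytes and padded with ABSENT up to `size * 8` entries."""
    bits = [(byte >> shift) & 1 for byte in data[:size] for shift in range(7, -1, -1)]
    bits += [ABSENT] * (size * 8 - len(bits))
    return bits


def packet_to_vector(headers: Dict[str, bytes]) -> List[int]:
    """Concatenate one fixed-size bit region per header type, in a fixed order,
    so every packet maps to a vector of identical length."""
    vector: List[int] = []
    for name, size in HEADER_SIZES.items():
        vector.extend(bytes_to_bits(headers.get(name, b""), size))
    return vector


# A toy UDP packet: an IPv4 header starting with byte 0x45 and an all-zero UDP header.
udp_packet = {"ipv4": bytes([0x45]) + bytes(19), "udp": bytes(8)}
vec = packet_to_vector(udp_packet)
```

Because every packet produces a vector of the same length regardless of which protocols it carries, such vectors could be stacked into a matrix and passed to a generic model search without per-task feature engineering, which is the kind of workflow the abstract describes.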