Multi-Task Hierarchical Learning Based Network Traffic Analytics
This work addresses the lack of available datasets and reproducible results for researchers in network traffic analytics, though its contribution is incremental, combining existing methods with new data.
The authors tackled the lack of reproducible datasets in network traffic analytics by introducing the (Net)² database, with three open datasets containing nearly 1.3M labeled flows. They also developed a Multi-Task Hierarchical Learning (MTHL) model that accurately performs multiple tasks with a dramatic reduction in training time.
Classifying network traffic is the basis for important network applications. Prior research in this area has faced challenges with the availability of representative datasets, and many of the results cannot be readily reproduced. This problem is exacerbated by emerging data-driven, machine-learning-based approaches. To address this issue, we present the (Net)² database, with three open datasets containing nearly 1.3M labeled flows in total and a comprehensive list of flow features, for the research community. We focus on broad aspects of network traffic analysis, including both malware detection and application classification. As we continue to grow the datasets, we expect them to serve as a common ground for AI-driven, reproducible research on network flow analytics. We release the datasets publicly and also introduce a Multi-Task Hierarchical Learning (MTHL) model to perform all tasks in a single model. Our results show that MTHL is capable of accurately performing multiple tasks with hierarchical labeling, with a dramatic reduction in training time.
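The abstract does not describe MTHL's internals. As a hedged illustration only, the sketch below shows one common way to build a multi-task hierarchical classifier: hard parameter sharing, where a single shared encoder feeds one softmax head per label level and the objective is the sum of the per-level cross-entropy losses. The two-level label hierarchy, the class names, the layer sizes, and the names `MultiTaskHierarchicalModel`, `HIERARCHY`, `forward`, `loss`, and `predict` are hypothetical assumptions, not the authors' design. The gradient-based training loop is omitted.

```python
import math
import random

# Hypothetical two-level label hierarchy (an assumption, not the datasets' labels):
# each fine-grained class maps to exactly one top-level class.
HIERARCHY = {"benign": "benign", "adware": "malware", "trojan": "malware"}
TOP = sorted(set(HIERARCHY.values()))   # ['benign', 'malware']
FINE = sorted(HIERARCHY)                # ['adware', 'benign', 'trojan']


def matvec(w, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_matrix(rng, rows, cols):
    """Small random weights for a rows x cols matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskHierarchicalModel:
    """Hypothetical sketch: shared encoder plus one softmax head per label level."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        # Shared by all tasks, so one forward pass serves every label level.
        self.shared = init_matrix(rng, n_hidden, n_features)
        # Task-specific heads, one per level of the hierarchy.
        self.heads = {
            "top": init_matrix(rng, len(TOP), n_hidden),
            "fine": init_matrix(rng, len(FINE), n_hidden),
        }

    def forward(self, x):
        """Return per-level class probabilities for one flow feature vector."""
        h = [max(0.0, v) for v in matvec(self.shared, x)]  # ReLU hidden features
        return {task: softmax(matvec(w, h)) for task, w in self.heads.items()}

    def loss(self, x, fine_label):
        """Joint loss: sum of per-level cross-entropies.

        The top-level target is derived from the fine label via HIERARCHY,
        so each flow needs only its most specific label.
        """
        probs = self.forward(x)
        top_label = HIERARCHY[fine_label]
        return (-math.log(probs["top"][TOP.index(top_label)])
                - math.log(probs["fine"][FINE.index(fine_label)]))

    def predict(self, x):
        """Pick the most probable class independently at each level."""
        probs = self.forward(x)
        names = {"top": TOP, "fine": FINE}
        return {task: names[task][p.index(max(p))] for task, p in probs.items()}
```

For example, `MultiTaskHierarchicalModel(n_features=4, n_hidden=8).loss([0.2, 1.0, 0.0, 3.5], "trojan")` scores a single (made-up) flow against both label levels at once. Because the encoder is shared, the model is trained once rather than once per task. The heads here are independent, so `predict` can return a fine label whose parent disagrees with the top-level prediction; deriving the top-level prediction from the fine-grained probabilities via `HIERARCHY` is one way to enforce consistency.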