Predicting Process Name from Network Data
This work offers a tool for cyber defense by enabling application identification from network data, but its contribution is incremental: it applies existing machine learning methods to a specific domain.
The paper addresses the problem of identifying applications from network traffic using netflow-like features, and reports high classification accuracy on browser vs. non-browser identification, browser fingerprinting, and process name prediction.
The ability to identify applications based on the network data they generate could be a valuable tool for cyber defense. We report on a machine learning technique capable of using netflow-like features to predict the application that generated the traffic. In our experiments, we used ground-truth labels obtained from host-based sensors deployed in a large enterprise environment; we applied random forests and multilayer perceptrons to the tasks of browser vs. non-browser identification, browser fingerprinting, and process name prediction. For each of these tasks, we demonstrate how machine learning models can achieve high classification accuracy using only netflow-like features as the basis for classification.
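To make the browser vs. non-browser task concrete, the following is a minimal sketch of training a random forest on flow-level features with scikit-learn. It is an illustration under stated assumptions, not the authors' pipeline: the feature columns (duration, bytes out, bytes in, packet count, destination port), the synthetic data generator, and the function names make_synthetic_flows and train_and_evaluate are all hypothetical, and the toy data is deliberately easy to separate, so the resulting accuracy says nothing about the results reported in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_synthetic_flows(n_per_class=500, seed=0):
    """Generate toy flow records (assumed features, not the paper's data).

    Columns: duration_s, bytes_out, bytes_in, packets, dst_port.
    Label 1 = browser-like flow, label 0 = non-browser flow.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical browser-like flows: short, download-heavy, web ports.
    browser = np.column_stack([
        rng.exponential(2.0, n_per_class),
        rng.lognormal(7.0, 1.0, n_per_class),
        rng.lognormal(10.0, 1.0, n_per_class),
        rng.poisson(40, n_per_class),
        rng.choice([80, 443], n_per_class),
    ])
    # Hypothetical non-browser flows: longer-lived, other service ports.
    other = np.column_stack([
        rng.exponential(20.0, n_per_class),
        rng.lognormal(9.0, 1.0, n_per_class),
        rng.lognormal(8.0, 1.0, n_per_class),
        rng.poisson(15, n_per_class),
        rng.choice([22, 53, 445, 3389], n_per_class),
    ])
    X = np.vstack([browser, other])
    y = np.array([1] * n_per_class + [0] * n_per_class)
    return X, y


def train_and_evaluate(seed=0):
    """Fit a random forest on a stratified split and return (model, test accuracy)."""
    X, y = make_synthetic_flows(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    return clf, acc


if __name__ == "__main__":
    model, accuracy = train_and_evaluate()
    print(f"held-out accuracy on toy data: {accuracy:.3f}")
```

In a realistic setting, the labels would come from host-based sensors as the abstract describes, and process name prediction would use the same structure with a multi-class label in place of the binary one.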