CRAug 6, 2020
New Directions in Automated Traffic AnalysisJordan Holland, Paul Schmitt, Nick Feamster et al.
Despite the use of machine learning for many network traffic analysis tasks in security, from application identification to intrusion detection, the aspects of the machine learning pipeline that ultimately determine the performance of the model -- feature selection and representation, model selection, and parameter tuning -- remain manual and painstaking. This paper presents a method to automate many aspects of traffic analysis, making it easier to apply machine learning techniques to a wider variety of traffic analysis tasks. We introduce nPrint, a tool that generates a unified packet representation that is amenable for representation learning and model training. We integrate nPrint with automated machine learning (AutoML), resulting in nPrintML, a public system that largely eliminates feature extraction and model tuning for a wide variety of traffic analysis tasks. We have evaluated nPrintML on eight separate traffic analysis tasks and released nPrint and nPrintML to enable future work to extend these methods.
CRJul 23, 2020
Evaluating Snowflake as an Indistinguishable Censorship Circumvention ToolKyle MacMillan, Jordan Holland, Prateek Mittal
Tor is the most well-known tool for circumventing censorship. Unfortunately, Tor traffic has been shown to be detectable using deep-packet inspection. WebRTC is a popular web frame-work that enables browser-to-browser connections. Snowflake is a novel pluggable transport that leverages WebRTC to connect Tor clients to the Tor network. In theory, Snowflake was created to be indistinguishable from other WebRTC services. In this paper, we evaluate the indistinguishability of Snowflake. We collect over 6,500 DTLS handshakes from Snowflake, Facebook Messenger, Google Hangouts, and Discord WebRTC connections and show that Snowflake is identifiable among these applications with 100% accuracy. We show that several features, including the extensions offered and the number of packets in the handshake, distinguish Snowflake among these services. Finally, we suggest recommendations for improving identification resistance in Snowflake. We have made the dataset publicly available.
NIJun 23, 2020
Classifying Network Vendors at Internet ScaleJordan Holland, Ross Teixeira, Paul Schmitt et al.
In this paper, we develop a method to create a large, labeled dataset of visible network device vendors across the Internet by mapping network-visible IP addresses to device vendors. We use Internet-wide scanning, banner grabs of network-visible devices across the IPv4 address space, and clustering techniques to assign labels to more than 160,000 devices. We subsequently probe these devices and use features extracted from the responses to train a classifier that can accurately classify device vendors. Finally, we demonstrate how this method can be used to understand broader trends across the Internet by predicting device vendors in traceroutes from CAIDA's Archipelago measurement system and subsequently examining vendor distributions across these traceroutes.
NIJul 18, 2019
Comparing the Effects of DNS, DoT, and DoH on Web PerformanceAustin Hounsel, Kevin Borgolte, Paul Schmitt et al.
Nearly every service on the Internet relies on the Domain Name System (DNS), which translates a human-readable name to an IP address before two endpoints can communicate. Today, DNS traffic is unencrypted, leaving users vulnerable to eavesdropping and tampering. Past work has demonstrated that DNS queries can reveal a user's browsing history and even what smart devices they are using at home. In response to these privacy concerns, two new protocols have been proposed: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). Instead of sending DNS queries and responses in the clear, DoH and DoT establish encrypted connections between users and resolvers. By doing so, these protocols provide privacy and security guarantees that traditional DNS (Do53) lacks. In this paper, we measure the effect of Do53, DoT, and DoH on query response times and page load times from five global vantage points. We find that although DoH and DoT response times are generally higher than Do53, both protocols can perform better than Do53 in terms of page load times. However, as throughput decreases and substantial packet loss and latency are introduced, web pages load fastest with Do53. Additionally, web pages successfully load more often with Do53 and DoT than DoH. Based on these results, we provide several recommendations to improve DNS performance, such as opportunistic partial responses and wire format caching.
NIApr 20, 2019
Measuring Irregular Geographic Exposure on the InternetJordan Holland, Jared Smith, Max Schuchard
We examine the extent of needless traffic exposure by the routing infrastructure to nations geographically irrelevant to packet transmission. We quantify what countries are geographically logical to observe on a network path traveling between two nations through the use of convex hulls circumscribing major population centers. We then compare that to the nation states observed in over 2.5 billion measured paths. We examine both the entire geographic topology of the Internet and a subset of the topology that a Tor user would typically interact with. We find that 44% of paths across the entire geographic topology of the Internet and 33% of paths in the user experience subset unnecessarily expose traffic to one or more nations. Finally, we consider the scenario where countries exercise both legal and physical control over autonomous systems, gaining access to traffic outside of their geographic borders, but carried by organizations that fall under the AS's registered country's legal jurisdiction. At least 49% of paths in both measurements expose traffic to a geographically irrelevant country when considering both the physical and legal countries that a path traverses.