Hassan Habibi Gharakheili

h-index: 27

Papers: 14

Citations: 3,816

Novelty: 39%

AI Score: 43

Ranked #55,421 of 194,257 authors (top 29%) | #1,263 in CR (top 19%)

14 Papers

2.3 | CR | Apr 11, 2023 | Code

Detecting Anomalous Microflows in IoT Volumetric Attacks via Dynamic Monitoring of MUD Activity

Ayyoob Hamza, Hassan Habibi Gharakheili, Theophilus A. Benson et al.

IoT networks are increasingly becoming targets of sophisticated new cyber-attacks. Anomaly-based detection methods are promising in finding new attacks, but they face practical challenges: false-positive alarms, poor explainability, and difficulty scaling cost-effectively. The recent IETF standard called Manufacturer Usage Description (MUD) seems promising to limit the attack surface on IoT devices by formally specifying their intended network behavior. In this paper, we use SDN to enforce and monitor the expected behaviors of each IoT device, and train one-class classifier models to detect volumetric attacks. Our specific contributions are fourfold. (1) We develop a multi-level inferencing model to dynamically detect anomalous patterns in network activity of MUD-compliant traffic flows via SDN telemetry, followed by packet inspection of anomalous flows. This provides enhanced fine-grained visibility into distributed and direct attacks, allowing us to precisely isolate volumetric attacks with microflow (5-tuple) resolution. (2) We collect traffic traces (benign and a variety of volumetric attacks) from network behavior of IoT devices in our lab, generate labeled datasets, and make them available to the public. (3) We prototype a full working system (modules are released as open source), demonstrate its efficacy in detecting volumetric attacks on several consumer IoT devices with high accuracy while maintaining low false positives, and provide insights into the cost and performance of our system. (4) We demonstrate how our models scale in environments with a large number of connected IoT devices (with datasets collected from a network of IP cameras in our university campus) by considering various training strategies (per device unit versus per device type), and balancing the accuracy of prediction against the cost of models in terms of size and training time.
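To make "microflow (5-tuple) resolution" concrete, here is a minimal Python sketch that aggregates packet sizes per 5-tuple. It only illustrates the flow key; it is not the paper's detection pipeline, and the names `FiveTuple` and `bytes_per_microflow` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Microflow key: the classic 5-tuple (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def bytes_per_microflow(packets):
    """Sum packet sizes per 5-tuple.

    packets: iterable of (FiveTuple, size_in_bytes) pairs.
    Returns a Counter mapping each microflow to its total byte count.
    """
    totals = Counter()
    for key, size in packets:
        totals[key] += size
    return totals
```

Two packets that differ only in source port land in separate microflows, which is what allows a single offending flow to be isolated rather than a whole device.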

1.8 | LG | Mar 18, 2022

AdIoTack: Quantifying and Refining Resilience of Decision Tree Ensemble Inference Models against Adversarial Volumetric Attacks on IoT Networks

Arman Pashamokhtari, Gustavo Batista, Hassan Habibi Gharakheili

Machine learning-based techniques have shown success in cyber intelligence. However, they are increasingly becoming targets of sophisticated data-driven adversarial attacks resulting in misprediction, eroding their ability to detect threats on network devices. In this paper, we present AdIoTack, a system that highlights vulnerabilities of decision trees against adversarial attacks, helping cybersecurity teams quantify and refine the resilience of their trained models for monitoring IoT networks. To assess the model for the worst-case scenario, AdIoTack performs white-box adversarial learning to launch successful volumetric attacks that decision tree ensemble models cannot flag. Our first contribution is to develop a white-box algorithm that takes a trained decision tree ensemble model and the profile of an intended network-based attack on a victim class as inputs. It then automatically generates recipes that specify certain packets on top of the intended attack packets (less than 15% overhead) that together can bypass the inference model unnoticed. We ensure that the generated attack instances are feasible for launching on IP networks and effective in their volumetric impact. Our second contribution develops a method to monitor the network behavior of connected devices actively, inject adversarial traffic (when feasible) on behalf of a victim IoT device, and successfully launch the intended attack. Our third contribution prototypes AdIoTack and validates its efficacy on a testbed consisting of a handful of real IoT devices monitored by a trained inference model. We demonstrate how the model detects all non-adversarial volumetric attacks on IoT devices while missing many adversarial ones. The fourth contribution develops systematic methods for applying patches to trained decision tree ensemble models, improving their resilience against adversarial volumetric attacks.

2.0 | LG | Jan 17, 2023

Quantifying and Managing Impacts of Concept Drifts on IoT Traffic Inference in Residential ISP Networks

Arman Pashamokhtari, Norihiro Okui, Masataka Nakahara et al.

Millions of vulnerable consumer IoT devices in home networks enable cyber crimes, putting user privacy and Internet security at risk. Internet service providers (ISPs) are best poised to play key roles in mitigating risks by automatically inferring active IoT devices per household and notifying users of vulnerable ones. Developing a scalable inference method that can perform robustly across thousands of home networks is a non-trivial task. This paper focuses on the challenges of developing and applying data-driven inference models when labeled data of device behaviors is limited and the distribution of data changes (concept drift) across time and space domains. Our contributions are three-fold: (1) We collect and analyze network traffic of 24 types of consumer IoT devices from 12 real homes over six weeks to highlight the challenge of temporal and spatial concept drifts in network behavior of IoT devices; (2) We analyze the performance of two inference strategies, namely "global inference" (a model trained on a combined set of all labeled data from training homes) and "contextualized inference" (several models each trained on the labeled data from a training home) in the presence of concept drifts; and (3) To manage concept drifts, we develop a method that dynamically applies the "closest" model (from a set) to network traffic of unseen homes during the testing phase, yielding better performance in 20% of scenarios.
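The abstract does not say how the "closest" model is chosen, so the following Python sketch is only one plausible reading: it assumes each training home is summarized by the centroid of its traffic feature vectors and that closeness is Euclidean distance between centroids. The function names and the distance measure are assumptions, not the paper's method.

```python
import math


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def closest_home(train_features, unseen_rows):
    """Pick the training home whose feature centroid is nearest to the unseen home.

    train_features: dict mapping home id -> list of feature vectors.
    unseen_rows: list of (unlabeled) feature vectors from the unseen home.
    Assumption: Euclidean distance between centroids stands in for "closest".
    """
    target = centroid(unseen_rows)
    return min(
        train_features,
        key=lambda home: math.dist(centroid(train_features[home]), target),
    )
```

The model trained on the returned home would then be applied to the unseen home's traffic; a real system might compare full distributions rather than centroids.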

6.0 | NI | Apr 21

Assessing Resilience in Authoritative DNS Infrastructure Supporting Government Services

Agung Septiadi, Minzhao Lyu, Hassan Habibi Gharakheili et al.

Online government services are increasingly regarded as critical national infrastructure. Because these services directly influence public trust, any disruption can have significant societal and political consequences. Yet their supporting infrastructures remain vulnerable to outages from natural disasters, geopolitical tensions, and targeted attacks. Central to their operation is the authoritative Domain Name System (DNS) infrastructure, the single source of truth that maps government domain names to service endpoints. While indispensable, this infrastructure also represents a potential and critical point of system failure. In this paper, we introduce a comprehensive assessment framework with purpose-designed mechanisms to systematically evaluate the operational resilience of authoritative DNS infrastructure supporting government services. Complementing prior studies on website hosting, recursive resolution, and DNS record integrity, our work provides a holistic view of authoritative DNS operation. Our first contribution develops a multi-sourced data schema that characterizes a (government) domain's authoritative DNS infrastructure across four hierarchical levels: physical hosting infrastructure, server functionality, name servers, and individual hosting instances. Using data collected from six representative countries, our second contribution identifies resilience attributes at their finest applicable hierarchy across three operational phases: infrastructure placement, service configuration, and DNS record dispatch. Our method assigns numerical scores to each attribute and aggregates them algorithmically to enable consistent and cross-domain comparisons. We apply our method to government domains in the six countries, highlighting their strengths and weaknesses in authoritative DNS resilience and pinpointing operational practices that require improvement.

2.7 | LG | Jan 27

Generalizable IoT Traffic Representations for Cross-Network Device Identification

Arunan Sivanathan, David Warren, Deepak Mishra et al.

Machine learning models have demonstrated strong performance in classifying network traffic and identifying Internet-of-Things (IoT) devices, enabling operators to discover and manage IoT assets at scale. However, many existing approaches rely on end-to-end supervised pipelines or task-specific fine-tuning, resulting in traffic representations that are tightly coupled to labeled datasets and deployment environments, which can limit generalizability. In this paper, we study the problem of learning generalizable traffic representations for IoT device identification. We design compact encoder architectures that learn per-flow embeddings from unlabeled IoT traffic and evaluate them using a frozen-encoder protocol with a simple supervised classifier. Our specific contributions are threefold. (1) We develop unsupervised encoder-decoder models that learn compact traffic representations from unlabeled IoT network flows and assess their quality through reconstruction-based analysis. (2) We show that these learned representations can be used effectively for IoT device-type classification using simple, lightweight classifiers trained on frozen embeddings. (3) We provide a systematic benchmarking study against state-of-the-art pretrained traffic encoders, showing that larger models do not necessarily yield more robust representations for IoT traffic. Using more than 18 million real IoT traffic flows collected across multiple years and deployment environments, we learn traffic representations from unlabeled data and evaluate device-type classification on disjoint labeled subsets, achieving macro F1-scores exceeding 0.9 for device-type classification and demonstrating robustness under cross-environment deployment.

12.4 | CR | Apr 12, 2018 | Code

Clear as MUD: Generating, Validating and Applying IoT Behaviorial Profiles (Technical Report)

Ayyoob Hamza, Dinesha Ranathunga, H. Habibi Gharakheili et al.

IoT devices are increasingly being implicated in cyber-attacks, driving community concern about the risks they pose to critical infrastructure, corporations, and citizens. In order to reduce this risk, the IETF is pushing IoT vendors to develop formal specifications of the intended purpose of their IoT devices, in the form of a Manufacturer Usage Description (MUD), so that their network behavior in any operating environment can be locked down and verified rigorously. This paper aims to assist IoT manufacturers in developing and verifying MUD profiles, while also helping adopters of these devices to ensure they are compatible with their organizational policies. Our first contribution is to develop a tool that takes the traffic trace of an arbitrary IoT device as input and automatically generates a MUD profile for it. We contribute our tool as open source, apply it to 28 consumer IoT devices, and highlight insights and challenges encountered in the process. Our second contribution is to apply a formal semantic framework that not only validates a given MUD profile for consistency, but also checks its compatibility with a given organizational policy. Finally, we apply our framework to representative organizations and selected devices, to demonstrate how MUD can reduce the effort needed for IoT acceptance testing.

10.0 | CR | Jun 11

Semantic Identification of IoT Devices from Behavioral Primitives

Samuel Witt, Hassan Habibi Gharakheili

Accurate identification of IoT devices is important for security management and policy enforcement. Existing approaches typically learn device signatures from packets or flow records. These methods operate on low-level communication observations whose traffic patterns may vary across deployments, software versions, and user interactions. This paper studies device identification using Manufacturer Usage Description (MUD) profiles. MUD profiles describe device behavior using Access Control Entries (ACEs), where each ACE represents a behavioral primitive consisting of protocol, endpoint, direction, and port semantics derived from device communication policy. Our contributions are threefold. First, using 28 publicly available MUD profiles containing 1,023 ACE instances, we construct ACE-level semantic representations from compact behavioral text and analyze their geometric properties. ACE-level representations preserve device-level behavioral distinctions more effectively than whole-profile embeddings and remain effective after whitening calibration. Second, we evaluate semantic ACE matching under controlled runtime variations, including unseen ACEs, drifted hostnames, and partial runtime observation. Exact ACE matching performs well when the overlap with the canonical MUD profile remains high, but degrades sharply when the overlap becomes sparse or disappears. In contrast, semantic ACE matching preserves useful identification evidence across these conditions. Third, we evaluate the same approach on real IoT traffic traces comprising more than 800,000 observed flows. Exact overlap remains the strongest signal when stable overlap exists, while semantic ACE matching provides stronger identification evidence during the early stages of observation, frequently retains the correct device among the highest-ranked candidates, and remains effective under sparse-overlap runtime traffic.

2.3 | NI | Jan 18, 2022

Analyzing Enterprise DNS Traffic to Classify Assets and Track Cyber-Health

Minzhao Lyu, Hassan Habibi Gharakheili, Craig Russell et al.

The Domain Name System (DNS) is a critical service that enables domain names to be converted to IP addresses (or vice versa); consequently, it is generally permitted through enterprise security systems (e.g., firewalls) with little restriction. This has exposed organizational networks to DDoS, exfiltration, and reflection attacks, inflicting significant financial and reputational damage. Large organizations with loosely federated IT departments (e.g., Universities and Research Institutes) are often not even fully aware of all their DNS assets and vulnerabilities, let alone the attack surface they expose to the outside world. In this paper, we address the "DNS blind spot" by developing methods to passively analyze live DNS traffic, identify organizational DNS assets, and monitor their health on a continuous basis. Our contributions are threefold. First, we perform a comprehensive analysis of all DNS traffic in two large organizations (a University Campus and a Government Research Institute) for over a month, and identify key behavioral profiles for various asset types such as recursive resolvers, authoritative name servers, and mixed DNS servers. Second, we develop an unsupervised clustering method that classifies enterprise DNS assets using the behavioral attributes identified, and demonstrate that our method successfully classifies over 100 DNS assets across the two organizations. Third, our method continuously tracks various health metrics across the organizational DNS assets and identifies several instances of improper configuration, data exfiltration, DDoS, and reflection attacks. We believe the passive analysis methods in this paper can help enterprises monitor organizational DNS health in an automated and risk-free manner.

7.0 | CR | Jan 3, 2022

A Survey on DNS Encryption: Current Development, Malware Misuse, and Inference Techniques

Minzhao Lyu, Hassan Habibi Gharakheili, Vijay Sivaraman

The domain name system (DNS) that maps alphabetic names to numeric Internet Protocol (IP) addresses plays a foundational role for Internet communications. By default, DNS queries and responses are exchanged in unencrypted plaintext, and hence, can be read and/or hijacked by third parties. To protect user privacy, the networking community has proposed standard encryption technologies such as DNS over TLS (DoT), DNS over HTTPS (DoH), and DNS over QUIC (DoQ) for DNS communications, enabling clients to perform secure and private domain name lookups. We survey the DNS encryption literature published since 2016, focusing on its current landscape and how it is misused by malware, and highlighting the existing techniques developed to make inferences from encrypted DNS traffic. First, we provide an overview of various standards developed in the space of DNS encryption and their adoption status, performance, benefits, and security issues. Second, we highlight ways that various malware families can exploit DNS encryption to their advantage for botnet communications and/or data exfiltration. Third, we discuss existing inference methods for profiling normal patterns and/or detecting malicious encrypted DNS traffic. Several directions are presented to motivate future research in enhancing the performance and security of DNS encryption.

1.2 | NI | Dec 5, 2021

Modeling Live Video Streaming: Real-Time Classification, QoE Inference, and Field Evaluation

Sharat Chandra Madanapalli, Alex Mathai, Hassan Habibi Gharakheili et al.

Social media, professional sports, and video games are driving rapid growth in live video streaming, on platforms such as Twitch and YouTube Live. Live streaming experience is very susceptible to short-time-scale network congestion since client playback buffers are often no more than a few seconds. Unfortunately, identifying such streams and measuring their QoE for network management is challenging, since content providers largely use the same delivery infrastructure for live and video-on-demand (VoD) streaming, and packet inspection techniques (including SNI/DNS query monitoring) cannot always distinguish between the two. In this paper, we design, build, and deploy ReCLive: a machine learning method for live video detection and QoE measurement based on network-level behavioral characteristics. Our contributions are four-fold: (1) We analyze about 23,000 video streams from Twitch and YouTube, and identify key features in their traffic profile that differentiate live and on-demand streaming. We release our traffic traces as open data to the public; (2) We develop an LSTM-based binary classifier model that distinguishes live from on-demand streams in real-time with over 95% accuracy across providers; (3) We develop a method that estimates QoE metrics of live streaming flows in terms of resolution and buffer stall events with overall accuracies of 93% and 90%, respectively; and (4) Finally, we prototype our solution, train it in the lab, and deploy it in a live ISP network serving more than 7,000 subscribers. Our method provides ISPs with fine-grained visibility into live video streams, enabling them to measure and improve user experience.

1.2 | SP | Apr 19, 2021

Modeling Classroom Occupancy using Data of WiFi Infrastructure in a University Campus

Iresha Pasquel Mohottige, Hassan Habibi Gharakheili, Vijay Sivaraman et al.

Universities worldwide are experiencing a surge in enrollments; therefore, campus estate managers are seeking continuous data on attendance patterns to optimize the usage of classroom space. As a result, there is an increasing trend to measure classroom attendance by employing various sensing technologies, among which pervasive WiFi infrastructure is seen as a low-cost method. In a dense campus environment, the number of connected WiFi users does not estimate room occupancy well, since connection counts are polluted by adjoining rooms, outdoor walkways, and network load balancing. In this paper, we develop machine-learning-based models to infer classroom occupancy from WiFi sensing infrastructure. Our contributions are three-fold: (1) We analyze metadata from a dense and dynamic wireless network comprising thousands of access points (APs) to draw insights into coverage of APs, behavior of WiFi connected users, and challenges of estimating room occupancy; (2) We propose a method to automatically map APs to classrooms using unsupervised clustering algorithms; and (3) We model classroom occupancy using a combination of classification and regression methods of varying algorithms. We achieve 84.6% accuracy in mapping APs to classrooms, while the accuracy of our estimation for room occupancy is comparable to beam counter sensors, with a symmetric Mean Absolute Percentage Error (sMAPE) of 13.10%.
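For readers unfamiliar with sMAPE, the Python sketch below computes one common variant, in which each absolute error is divided by the mean of the absolute actual and predicted values. Several sMAPE definitions exist and the abstract does not state which one the authors use, so the exact formula here is an assumption.

```python
def smape(actual, predicted):
    """Symmetric MAPE in percent.

    Assumed variant: mean over samples of |p - a| / ((|a| + |p|) / 2), times 100.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # An empty room predicted as empty has zero error; avoid dividing by zero.
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100.0 * total / len(actual)
```

Under this variant, a room with 100 occupants estimated as 110 contributes an error of 10 / 105, or about 9.5%, before averaging over all rooms.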

2.9 | CR | Aug 21, 2020

IoT Network Security: Requirements, Threats, and Countermeasures

Ayyoob Hamza, Hassan Habibi Gharakheili, Vijay Sivaraman

IoT devices are increasingly utilized in critical infrastructure, enterprises, and households. Several sophisticated cyber-attacks have been reported, and many networks have proven vulnerable to both active and passive attacks by leaking private information, allowing unauthorized access, and being open to denial-of-service attacks. This paper aims, first, to help network operators understand the need for an IoT network security solution and, second, to survey IoT network attack vectors, cyber threats, and countermeasures with a focus on improving the robustness of existing security solutions. Our first contribution highlights viewpoints on IoT security from the perspective of stakeholders such as manufacturers, service providers, consumers, and authorities. We discuss the differences between IoT and IT systems and the need for IoT security solutions, and we highlight the key components required for an IoT network security system architecture. For our second contribution, we survey the types of IoT attacks by grouping them based on their impact. We discuss various attack techniques, threats, and shortfalls of existing countermeasures with the intention of enabling future research into improving IoT network security.

5.2 | CR | Jul 7, 2020

Optimal Witnessing of Healthcare IoT Data Using Blockchain Logging Contract

Mohammad Hossein Chinaei, Hassan Habibi Gharakheili, Vijay Sivaraman

Verification of data generated by wearable sensors is increasingly becoming of concern to health service providers and insurance companies. There is a need for a verification framework through which various authorities can request a verification service for the local network data of a target IoT device. In this paper, we leverage blockchain as a distributed platform to realize an on-demand verification scheme. This allows authorities to automatically transact with connected devices for witnessing services. A public request is made for witness statements on the data of a target IoT device that is transmitted on its local network, and subsequently, devices (in close vicinity of the target IoT device) offer a witnessing service. Our contributions are threefold: (1) We develop a system architecture based on blockchain and smart contracts that enables authorities to dynamically avail a verification service for data of a subject device from a distributed set of witnesses which are willing to provide (in a privacy-preserving manner) their local wireless measurement in exchange for monetary return; (2) We then develop a method to optimally select witnesses in such a way that the verification error is minimized subject to monetary cost constraints; (3) Lastly, we evaluate the efficacy of our scheme using real Wi-Fi session traces collected from a five-storeyed building with more than thirty access points, representative of a hospital. According to the current pricing schedule of the Ethereum public blockchain, our scheme enables healthcare authorities to verify data transmitted from a typical wearable device with a verification error on the order of 0.01% at a cost of less than two dollars for a one-hour witnessing service.

1.2 | CY | May 28, 2020

HazeDose: Design and Analysis of a Personal Air Pollution Inhaled Dose Estimation System using Wearable Sensors

Ke Hu, Ashfaqur Rahman, Hassan Habibi Gharakheili et al.

Air pollution has become one of the world's biggest issues in both developing and developed countries. To help individuals understand their air pollution exposure and health risks, the traditional approach is for government agencies to use data from static monitoring stations to estimate air quality over a large area. Data from such sensing systems are very sparse and cannot reflect real personal exposure. In recent years, several research groups have developed participatory air pollution sensing systems which use wearable or portable units coupled with smartphones to crowd-source urban air pollution data. These systems have shown remarkable improvement in spatial granularity over government-operated fixed monitoring systems. In this paper, we extend the paradigm to the HazeDose system, which can personalize individuals' air pollution exposure. Specifically, we combine the pollution concentrations obtained from an air pollution estimation system with the activity data from the individual's on-body activity monitors to estimate the personal inhalation dosage of air pollution. Users can visualize their personalized air pollution exposure information via a mobile application. We show that different activities, such as walking, cycling, or driving, impact their dosage, and commuting patterns contribute to a significant proportion of an individual's daily air pollution dosage. Moreover, we propose a dosage minimization algorithm, with trial results showing that a biker's daily exposure can be reduced by up to 14.1%, while a driver using alternative routes can inhale 25.9% less than usual. A heuristic algorithm is also introduced to balance execution time and dosage reduction in alternative-route scenarios. The results show that up to 20.3% dosage reduction can be achieved when the execution time is almost one-seventieth of the original.
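The combination of pollution concentration and activity data can be pictured with the Python sketch below, which sums concentration x ventilation rate x duration over the segments of a day. This is a generic inhaled-dose calculation, not necessarily the HazeDose model; the ventilation rates are placeholder values chosen for illustration, and all names are hypothetical.

```python
# Hypothetical ventilation rates in cubic metres per minute.
# Placeholder values for illustration only; not taken from the paper.
VENTILATION_M3_PER_MIN = {"driving": 0.010, "walking": 0.025, "cycling": 0.050}


def inhaled_dose_ug(segments):
    """Estimate inhaled dose in micrograms.

    segments: iterable of (activity, concentration_ug_per_m3, minutes).
    Each segment contributes concentration * ventilation rate * duration.
    """
    return sum(
        conc * VENTILATION_M3_PER_MIN[activity] * minutes
        for activity, conc, minutes in segments
    )
```

With these placeholder rates, ten minutes of walking at 40 µg/m³ gives about 10 µg, while ten minutes of cycling at the same concentration gives about 20 µg, which shows why the activity matters as much as the route.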