CYNov 3, 2023Code
Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language ModelsJake Chanenson, Madison Pickering, Noah Apthorpe
Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 ground truth GKC-CI annotations from 16 privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.
39.3HCApr 7
Learning Password Best Practices Through In-Task InstructionQian Ma, Yingfan Zhou, Shubhang Kaushik et al.
Users often make security- and privacy-relevant decisions without a clear understanding of the rules that govern safe behavior. We introduce pedagogical friction, a design approach that inserts brief, instructional interactions at the moment of action. We evaluate this approach in the context of password creation, a familiar task with clear quality criteria. We conducted a randomized study with 128 participants across four interface conditions that varied the depth and interactivity of guidance. We assessed three outcomes: (1) rule compliance in a subsequent password task without guidance, (2) accuracy on survey questions tied to password rules, and (3) behavior-knowledge alignment, which captures whether participants who correctly followed a rule also recognized it on the survey. Across the guided conditions, participants corrected most rule violations in the follow-up task and showed high behavior-knowledge alignment. Survey results suggested clearer advantages for some rule types, especially symbol related questions. These results position pedagogical friction as a lightweight intervention for security- and privacy-critical interfaces.
CRSep 21, 2019Code
IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at ScaleDanny Yuxing Huang, Noah Apthorpe, Gunes Acar et al.
The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices from within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Since April 2019, 4,322 users have installed IoT Inspector, allowing us to collect labeled network traffic from 44,956 smart home devices across 13 categories and 53 vendors. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors use outdated TLS versions and advertise weak ciphers. Second, we discover about 350 distinct third-party advertiser and tracking domains on smart TVs. We also highlight other research areas, such as network management and healthcare, that can take advantage of IoT Inspector's dataset. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.
NISep 30, 2021
Automating Internet of Things Network Traffic Collection with Robotic Arm InteractionsXi Jiang, Noah Apthorpe
Consumer Internet of things research often involves collecting network traffic sent or received by IoT devices. These data are typically collected via crowdsourcing or while researchers manually interact with IoT devices in a laboratory setting. However, manual interactions and crowdsourcing are often tedious, expensive, inaccurate, or do not provide comprehensive coverage of possible IoT device behaviors. We present a new method for generating IoT network traffic using a robotic arm to automate user interactions with devices. This eliminates manual button pressing and enables permutation-based interaction sequences that rigorously explore the range of possible device behaviors. We test this approach with an Arduino-controlled robotic arm, a smart speaker, and a smart thermostat, using machine learning to demonstrate that collected network traffic contains information about device interactions that could be useful for network, security, or privacy analyses. We also provide source code and documentation allowing researchers to easily automate IoT device interactions and network traffic collection in future studies.
MAFeb 5, 2021
SkillBot: Identifying Risky Content for Children in Alexa SkillsTu Le, Danny Yuxing Huang, Noah Apthorpe et al.
Many households include children who use voice personal assistants (VPA) such as Amazon Alexa. Children benefit from the rich functionalities of VPAs and third-party apps but are also exposed to new risks in the VPA ecosystem. In this paper, we first investigate "risky" child-directed voice apps that contain inappropriate content or ask for personal information through voice interactions. We build SkillBot - a natural language processing (NLP)-based system to automatically interact with VPA apps and analyze the resulting conversations. We find 28 risky child-directed apps and maintain a growing dataset of 31,966 non-overlapping app behaviors collected from 3,434 Alexa apps. Our findings suggest that although child-directed VPA apps are subject to stricter policy requirements and more intensive vetting, children remain vulnerable to inappropriate content and privacy violations. We then conduct a user study showing that parents are concerned about the identified risky apps. Many parents do not believe that these apps are available and designed for families/kids, although these apps are actually published in Amazon's "Kids" product category. We also find that parents often neglect basic precautions such as enabling parental controls on Alexa devices. Finally, we identify a novel risk in the VPA ecosystem: confounding utterances, or voice commands shared by multiple apps that may cause a user to interact with a different app than intended. We identify 4,487 confounding utterances, including 581 shared by child-directed and non-child-directed apps. We find that 27% of these confounding utterances prioritize invoking a non-child-directed app over a child-directed app. This indicates that children are at real risk of accidentally invoking non-child-directed apps due to confounding utterances.
HCJan 28, 2020
You, Me, and IoT: How Internet-Connected Consumer Devices Affect Interpersonal RelationshipsNoah Apthorpe, Pardis Emami-Naeini, Arunesh Mathur et al.
Internet-connected consumer devices have rapidly increased in popularity; however, relatively little is known about how these technologies are affecting interpersonal relationships in multi-occupant households. In this study, we conduct 13 semi-structured interviews and survey 508 individuals from a variety of backgrounds to discover and categorize how consumer IoT devices are affecting interpersonal relationships in the United States. We highlight several themes, providing exploratory data about the pervasiveness of interpersonal costs and benefits of consumer IoT devices. These results inform follow-up studies and design priorities for future IoT technologies to amplify positive and reduce negative interpersonal effects.
CRDec 3, 2018
Keeping the Smart Home Private with Smart(er) IoT Traffic ShapingNoah Apthorpe, Danny Yuxing Huang, Dillon Reisman et al.
The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home devices even when the devices use end-to-end transport-layer encryption. We evaluate common approaches for defending against these types of traffic analysis attacks, including firewalls, virtual private networks, and independent link padding, and find that none sufficiently conceal user activities with reasonable data overhead. We develop a new defense, "stochastic traffic padding" (STP), that makes it difficult for a passive network adversary to reliably distinguish genuine user activities from generated traffic patterns designed to look like user interactions. Our analysis provides a theoretical bound on an adversary's ability to accurately detect genuine user activities as a function of the amount of additional cover traffic generated by the defense technique.
CRAug 22, 2018
A Developer-Friendly Library for Smart Home IoT Privacy-Preserving Traffic ObfuscationTrisha Datta, Noah Apthorpe, Nick Feamster
The number and variety of Internet-connected devices have grown enormously in the past few years, presenting new challenges to security and privacy. Research has shown that network adversaries can use traffic rate metadata from consumer IoT devices to infer sensitive user activities. Shaping traffic flows to fit distributions independent of user activities can protect privacy, but this approach has seen little adoption due to required developer effort and overhead bandwidth costs. Here, we present a Python library for IoT developers to easily integrate privacy-preserving traffic shaping into their products. The library replaces standard networking functions with versions that automatically obfuscate device traffic patterns through a combination of payload padding, fragmentation, and randomized cover traffic. Our library successfully preserves user privacy and requires approximately 4 KB/s overhead bandwidth for IoT devices with low send rates or high latency tolerances. This overhead is reasonable given normal Internet speeds in American homes and is an improvement on the bandwidth requirements of existing solutions.
CYMay 15, 2018
Discovering Smart Home Internet of Things Privacy Norms Using Contextual IntegrityNoah Apthorpe, Yan Shvartzshnaider, Arunesh Mathur et al.
The proliferation of Internet of Things (IoT) devices for consumer "smart" homes raises concerns about user privacy. We present a survey method based on the Contextual Integrity (CI) privacy framework that can quickly and efficiently discover privacy norms at scale. We apply the method to discover privacy norms in the smart home context, surveying 1,731 American adults on Amazon Mechanical Turk. For $2,800 and in less than six hours, we measured the acceptability of 3,840 information flows representing a combinatorial space of smart home devices sending consumer information to first and third-party recipients under various conditions. Our results provide actionable recommendations for IoT device manufacturers, including design best practices and instructions for adopting our method for further research.
CRMay 7, 2018
Security and Privacy Analyses of Internet of Things Children's ToysGordon Chu, Noah Apthorpe, Nick Feamster
This paper investigates the security and privacy of Internet-connected children's smart toys through case studies of three commercially-available products. We conduct network and application vulnerability analyses of each toy using static and dynamic analysis techniques, including application binary decompilation and network monitoring. We discover several publicly undisclosed vulnerabilities that violate the Children's Online Privacy Protection Rule (COPPA) as well as the toys' individual privacy policies. These vulnerabilities, especially security flaws in network communications with first-party servers, are indicative of a disconnect between many IoT toy developers and security and privacy best practices despite increased attention to Internet-connected toy hacking risks.
CRMay 7, 2018
Detecting Compressed Cleartext Traffic from Consumer Internet of Things DevicesDaniel Hahn, Noah Apthorpe, Nick Feamster
Data encryption is the primary method of protecting the privacy of consumer device Internet communications from network observers. The ability to automatically detect unencrypted data in network traffic is therefore an essential tool for auditing Internet-connected devices. Existing methods identify network packets containing cleartext but cannot differentiate packets containing encrypted data from packets containing compressed unencrypted data, which can be easily recovered by reversing the compression algorithm. This makes it difficult for consumer protection advocates to identify devices that risk user privacy by sending sensitive data in a compressed unencrypted format. Here, we present the first technique to automatically distinguish encrypted from compressed unencrypted network transmissions on a per-packet basis. We apply three machine learning models and achieve a maximum 66.9% accuracy with a convolutional neural network trained on raw packet data. This result is a baseline for this previously unstudied machine learning problem, which we hope will motivate further attention and accuracy improvements. To facilitate continuing research on this topic, we have made our training and test datasets available to the public.
CRApr 11, 2018
Machine Learning DDoS Detection for Consumer Internet of Things DevicesRohan Doshi, Noah Apthorpe, Nick Feamster
An increasing number of Internet of Things (IoT) devices are connecting to the Internet, yet many of these devices are fundamentally insecure, exposing the Internet to a variety of attacks. Botnets such as Mirai have used insecure consumer IoT devices to conduct distributed denial of service (DDoS) attacks on critical Internet infrastructure. This motivates the development of new techniques to automatically detect consumer IoT attack traffic. In this paper, we demonstrate that using IoT-specific network behaviors (e.g. limited number of endpoints and regular time intervals between packets) to inform feature selection can result in high accuracy DDoS detection in IoT network traffic with a variety of machine learning algorithms, including neural networks. These results indicate that home gateway routers or other network middleboxes could automatically detect local IoT device sources of DDoS attacks using low-cost machine learning algorithms and traffic data that is flow-based and protocol-agnostic.
CRMar 27, 2018
Cleartext Data Transmissions in Consumer IoT Medical DevicesDaniel Wood, Noah Apthorpe, Nick Feamster
This paper introduces a method to capture network traffic from medical IoT devices and automatically detect cleartext information that may reveal sensitive medical conditions and behaviors. The research follows a three-step approach involving traffic collection, cleartext detection, and metadata analysis. We analyze four popular consumer medical IoT devices, including one smart medical device that leaks sensitive health information in cleartext. We also present a traffic capture and analysis system that seamlessly integrates with a home network and offers a user-friendly interface for consumers to monitor and visualize data transmissions of IoT devices in their homes.
HCFeb 22, 2018
User Perceptions of Smart Home IoT PrivacySerena Zheng, Noah Apthorpe, Marshini Chetty et al.
Smart home Internet of Things (IoT) devices are rapidly increasing in popularity, with more households including Internet-connected devices that continuously monitor user activities. In this study, we conduct eleven semi-structured interviews with smart home owners, investigating their reasons for purchasing IoT devices, perceptions of smart home privacy risks, and actions taken to protect their privacy from those external to the home who create, manage, track, or regulate IoT devices and/or their data. We note several recurring themes. First, users' desires for convenience and connectedness dictate their privacy-related behaviors for dealing with external entities, such as device manufacturers, Internet Service Providers, governments, and advertisers. Second, user opinions about external entities collecting smart home data depend on perceived benefit from these entities. Third, users trust IoT device manufacturers to protect their privacy but do not verify that these protections are in place. Fourth, users are unaware of privacy risks from inference algorithms operating on data from non-audio/visual devices. These findings motivate several recommendations for device designers, researchers, and industry standards to better match device privacy features to the expectations and preferences of smart home owners.
CRAug 16, 2017
Spying on the Smart Home: Privacy Attacks and Defenses on Encrypted IoT TrafficNoah Apthorpe, Dillon Reisman, Srikanth Sundaresan et al.
The growing market for smart home IoT devices promises new conveniences for consumers while presenting new challenges for preserving privacy within the home. Many smart home devices have always-on sensors that capture users' offline activities in their living spaces and transmit information about these activities on the Internet. In this paper, we demonstrate that an ISP or other network observer can infer privacy sensitive in-home activities by analyzing Internet traffic from smart homes containing commercially-available IoT devices even when the devices use encryption. We evaluate several strategies for mitigating the privacy risks associated with smart home device traffic, including blocking, tunneling, and rate-shaping. Our experiments show that traffic shaping can effectively and practically mitigate many privacy risks associated with smart home IoT devices. We find that 40KB/s extra bandwidth usage is enough to protect user activities from a passive network adversary. This bandwidth cost is well within the Internet speed limits and data caps for many smart homes.
CRMay 18, 2017
Closing the Blinds: Four Strategies for Protecting Smart Home Privacy from Network ObserversNoah Apthorpe, Dillon Reisman, Nick Feamster
The growing market for smart home IoT devices promises new conveniences for consumers while presenting novel challenges for preserving privacy within the home. Specifically, Internet service providers or neighborhood WiFi eavesdroppers can measure Internet traffic rates from smart home devices and infer consumers' private in-home behaviors. Here we propose four strategies that device manufacturers and third parties can take to protect consumers from side-channel traffic rate privacy threats: 1) blocking traffic, 2) concealing DNS, 3) tunneling traffic, and 4) shaping and injecting traffic. We hope that these strategies, and the implementation nuances we discuss, will provide a foundation for the future development of privacy-sensitive smart homes.
CRMay 18, 2017
A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT TrafficNoah Apthorpe, Dillon Reisman, Nick Feamster
The increasing popularity of specialized Internet-connected devices and appliances, dubbed the Internet-of-Things (IoT), promises both new conveniences and new privacy concerns. Unlike traditional web browsers, many IoT devices have always-on sensors that constantly monitor fine-grained details of users' physical environments and influence the devices' network communications. Passive network observers, such as Internet service providers, could potentially analyze IoT network traffic to infer sensitive details about users. Here, we examine four IoT smart home devices (a Sense sleep monitor, a Nest Cam Indoor security camera, a WeMo switch, and an Amazon Echo) and find that their network traffic rates can reveal potentially sensitive user interactions even when the traffic is encrypted. These results indicate that a technological solution is needed to protect IoT device owner privacy, and that IoT-specific concerns must be considered in the ongoing policy debate around ISP data collection and usage.