17.4CYMay 27
Local Privacy Laws in a Globalized WorldShantanu Sharma, Ethan Myers, Lorenzo De Carli et al.
Personal data has emerged as a highly valuable yet sensitive asset that drives business decisions, enables targeted advertising, and generates substantial revenue for companies, while simultaneously facilitating invasive monitoring of users. In recent years, research on digital privacy violations, including undue access, collection, and sharing of user data, has grown significantly. Much of this research adopts the European General Data Protection Regulation (GDPR) as the primary reference framework. This is reasonable, as GDPR was a pioneering legislation, and many of its stipulations are clear and unambiguous. However, we argue that focusing solely on GDPR (and a small set of other Western regulatory frameworks) ignores privacy-related concerns, attitudes, and problems faced by users from other locales, creating a significant research blind spot. This work systematically normalizes the heterogeneous legal requirements of multiple data protection laws into a unified abstraction aligned with the data lifecycle, which forms the foundation for the implementation of such regulations. We further investigate the implications of these laws on different stakeholders, including users, organizations, and governments. Overall, this work aims to broaden the digital privacy research community's perspective and to serve as a set of guiding principles for developing technological privacy solutions spanning multiple countries.
CRJan 26, 2023
Minerva: A File-Based Ransomware DetectorDorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari et al.
Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles to identify malicious behaviors. However, with an increasing body of literature highlighting the vulnerability of such approaches to evasion attacks, a comprehensive solution to the ransomware problem remains elusive. This paper presents Minerva, a novel, robust approach to ransomware detection. Minerva is engineered to be robust by design against evasion attacks, with architectural and feature selection choices informed by their resilience to adversarial manipulation. We conduct a comprehensive analysis of Minerva across a diverse spectrum of ransomware types, encompassing unseen ransomware as well as variants designed specifically to evade Minerva. Our evaluation showcases the ability of Minerva to accurately identify ransomware, generalize to unseen threats, and withstand evasion attacks. Furthermore, over 99% of detected ransomware are identified within 0.52sec of activity, enabling the adoption of data loss prevention techniques with near-zero overhead.
CRMar 31, 2021
Reliable Detection of Compressed and Encrypted DataFabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta et al.
Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. However, modern content types use compression techniques which alter data distribution pushing it closer to the uniform distribution. The result is that current approaches exhibit unreliable encryption detection performance when compressed data appears in the dataset. Furthermore, proposed approaches are typically evaluated over few data types and fragment sizes, making it hard to assess their practical applicability. This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data on both small and large fragment sizes. We address these shortcomings and design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data. We evaluate EnCoD on a dataset of 16 different file types and fragment sizes ranging from 512B to 8KB. Our results highlight that EnCoD outperforms current approaches by a wide margin, with accuracy ranging from ~82 for 512B fragments up to ~92 for 8KB data fragments. Moreover, EnCoD can pinpoint the exact format of a given data fragment, rather than performing only binary classification like previous approaches.
CROct 15, 2020
EnCoD: Distinguishing Compressed and Encrypted File FragmentsFabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta et al.
Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, select data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
SEMar 6, 2020
SpellBound: Defending Against Package TyposquattingMatthew Taylor, Ruturaj K. Vaidya, Drew Davidson et al.
Package managers for software repositories based on a single programming language are very common. Examples include npm (JavaScript), and PyPI (Python). These tools encourage code reuse, making it trivial for developers to import external packages. Unfortunately, repositories' size and the ease with which packages can be published facilitates the practice of typosquatting: the uploading of a package with name similar to that of a highly popular package, typically with the aim of capturing some of the popular package's installs. Typosquatting has serious negative implications, resulting in developers importing malicious packages, or -- as we show -- code clones which do not incorporate recent security updates. In order to tackle this problem, we present SpellBound, a tool for identifying and reporting potentially erroneous imports to developers. SpellBound implements a novel typosquatting detection technique, based on an in-depth analysis of npm and PyPI. Our technique leverages a model of lexical similarity between names, and further incorporates the notion of package popularity. This approach flags cases where unknown/scarcely used packages would be installed in place of popular ones with similar names, before installation occurs. We evaluated SpellBound on both npm and PyPI, with encouraging results: SpellBound flags typosquatting cases while generating limited warnings (0.5% of total package installs), and low overhead (only 2.5% of package install time). Furthermore, SpellBound allowed us to confirm known cases of typosquatting and discover one high-profile, unknown case of typosquatting that resulted in a package takedown by the npm security team.
CRNov 6, 2019
The Naked Sun: Malicious Cooperation Between Benign-Looking ProcessesFabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta et al.
Recent progress in machine learning has generated promising results in behavioral malware detection. Behavioral modeling identifies malicious processes via features derived by their runtime behavior. Behavioral features hold great promise as they are intrinsically related to the functioning of each malware, and are therefore considered difficult to evade. Indeed, while a significant amount of results exists on evasion of static malware features, evasion of dynamic features has seen limited work. This paper thoroughly examines the robustness of behavioral malware detectors to evasion, focusing particularly on anti-ransomware evasion. We choose ransomware as its behavior tends to differ significantly from that of benign processes, making it a low-hanging fruit for behavioral detection (and a difficult candidate for evasion). Our analysis identifies a set of novel attacks that distribute the overall malware workload across a small set of cooperating processes to avoid the generation of significant behavioral features. Our most effective attack decreases the accuracy of a state-of-the-art classifier from 98.6% to 0% using only 18 cooperating processes. Furthermore, we show our attacks to be effective against commercial ransomware detectors even in a black-box setting.
CRMar 6, 2019
Security Issues in Language-based Software EcosystemsRuturaj K. Vaidya, Lorenzo De Carli, Drew Davidson et al.
Language-based ecosystems (LBE), i.e., software ecosystems based on a single programming language, are very common. Examples include the npm ecosystem for JavaScript, and PyPI for Python. These environments encourage code reuse between packages, and incorporate utilities - package managers - for automatically resolving dependencies. However, the same aspects that make these systems popular - ease of publishing code and importing external code - also create novel security issues, which have so far seen little study. We present an a systematic study of security issues that plague LBEs. These issues are inherent to the ways these ecosystems work and cannot be resolved by fixing software vulnerabilities in either the packages or the utilities, e.g., package manager tools, that build these ecosystems. We systematically characterize recent security attacks from various aspects, including attack strategies, vectors, and goals. Our characterization and in-depth analysis of npm and PyPI ecosystems, which represent the largest LBEs, covering nearly one million packages indicates that these ecosystems make an opportune environment for attackers to incorporate stealthy attacks. Overall, we argue that (i) fully automated detection of malicious packages is likely to be unfeasible; however (ii) tools and metrics that help developers assess the risk of including external dependencies would go a long way toward preventing attacks.
CRFeb 26, 2016
Towards Least Privilege Containers with CimplifierVaibhav Rastogi, Drew Davidson, Lorenzo De Carli et al.
Application containers, such as Docker containers, have recently gained popularity as a solution for agile and seamless deployment of applications. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. However, these software ecosystems are not conducive to the true and tried security principles of privilege separation (PS) and principle of least privilege (PLP). We propose algorithms and a tool Cimplifier, which address these concerns in the context of containers. Specifically, given a container our tool partitions them into simpler containers, which are only provided enough resources to perform their functionality. As part our solution, we develop techniques for analyzing resource usage, for performing partitioning, and gluing the containers together to preserve functionality. Our evaluation on real-world containers demonstrates that Cimplifier can preserve the original functionality, leads to reduction in image size of 58-95%, and processes even large containers in under thirty seconds.