CYAug 7, 2024
Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI AssistantsBeatriz Borges, Negar Foroutan, Deniz Bayazit et al.
AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4 can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
CRFeb 12, 2016Code
Control-Flow Integrity: Precision, Security, and PerformanceNathan Burow, Scott A. Carr, Joseph Nash et al.
Memory corruption errors in C/C++ programs remain the most common source of security vulnerabilities in today's systems. Control-flow hijacking attacks exploit memory corruption vulnerabilities to divert program execution away from the intended control flow. Researchers have spent more than a decade studying and refining defenses based on Control-Flow Integrity (CFI), and this technique is now integrated into several production compilers. However, so far no study has systematically compared the various proposed CFI mechanisms, nor is there any protocol on how to compare such mechanisms. We compare a broad range of CFI mechanisms using a unified nomenclature based on (i) a qualitative discussion of the conceptual security guarantees, (ii) a quantitative security evaluation, and (iii) an empirical evaluation of their performance in the same test environment. For each mechanism, we evaluate (i) protected types of control-flow transfers, (ii) the precision of the protection for forward and backward edges. For open-source compiler-based implementations, we additionally evaluate (iii) the generated equivalence classes and target sets, and (iv) the runtime performance.
CRFeb 8, 2022
PACSan: Enforcing Memory Safety Based on ARM PAYuan Li, Wende Tan, Zhizheng Lv et al.
Memory safety is a key security property that stops memory corruption vulnerabilities. Existing sanitizers enforce checks and catch such bugs during development and testing. However, they either provide partial memory safety or have overwhelmingly high performance overheads. Our novel sanitizer PACSan enforces spatial and temporal memory safety with no false positives at low performance overheads. PACSan removes the majority of the overheads involved in pointer tracking by sealing metadata in pointers through ARM PA (Pointer Authentication), and performing the memory safety checks when pointers are dereferenced. We have developed a prototype of PACSan and systematically evaluated its security and performance on the Magma, Juliet, Nginx, and SPEC CPU2017 test suites, respectively. In our evaluation, PACSan shows no false positives together with negligible false negatives, while introducing stronger security guarantees and lower performance overheads than state-of-the-art sanitizers, including HWASan, ASan, SoftBound+CETS, Memcheck, LowFat, and PTAuth. Specifically, PACSan has 0.84x runtime overhead and 1.92x memory overhead on average. Compared to the widely deployed ASan, PACSan has no false positives and much fewer false negatives and reduces 7.172% runtime overheads and 89.063%memory overheads.
CRSep 24, 2020
BLURtooth: Exploiting Cross-Transport Key Derivation in Bluetooth Classic and Bluetooth Low EnergyDaniele Antonioli, Nils Ole Tippenhauer, Kasper Rasmussen et al.
The Bluetooth standard specifies two transports: Bluetooth Classic (BT) for high-throughput wireless services and Bluetooth Low Energy (BLE) for very low-power scenarios. BT and BLE have dedicated pairing protocols and devices have to pair over BT and BLE to use both securely. In 2014, the Bluetooth standard (v4.2) addressed this usability issue by introducing Cross-Transport Key Derivation (CTKD). CTKD allows establishing BT and BLE pairing keys just by pairing over one of the two transports. While CTKD crosses the security boundary between BT and BLE, little is known about the internals of CTKD and its security implications. In this work, we present the first complete description of CTKD obtained by merging the scattered information from the Bluetooth standard with the results from our reverse-engineering experiments. Then, we perform a security evaluation of CTKD and uncover four cross-transport issues in its specification. We leverage these issues to design four standard-compliant attacks on CTKD enabling new ways to exploit Bluetooth (e.g., exploiting BT and BLE by targeting only one of the two). Our attacks work even if the strongest security mechanism for BT and BLE are in place, including Numeric Comparison and Secure Connections. They allow to impersonate, man-in-the-middle, and establish unintended sessions with arbitrary devices. We refer to our attacks as BLUR attacks, as they blur the security boundary between BT and BLE. We provide a low-cost implementation of the BLUR attacks and we successfully evaluate them on 14 devices with 16 unique Bluetooth chips from popular vendors. We discuss the attacks' root causes and present effective countermeasures to fix them. We disclosed our findings and countermeasures to the Bluetooth SIG in May 2020 (CVE-2020-15802), and we reported additional unmitigated issues in May 2021.
CRSep 2, 2020
Magma: A Ground-Truth Fuzzing BenchmarkAhmad Hazimeh, Adrian Herrera, Mathias Payer
High scalability and low running costs have made fuzz testing the de facto standard for discovering software bugs. Fuzzing techniques are constantly being improved in a race to build the ultimate bug-finding tool. However, while fuzzing excels at finding bugs in the wild, evaluating and comparing fuzzer performance is challenging due to the lack of metrics and benchmarks. For example, crash count, perhaps the most commonly-used performance metric, is inaccurate due to imperfections in deduplication techniques. Additionally, the lack of a unified set of targets results in ad hoc evaluations that hinder fair comparison. We tackle these problems by developing Magma, a ground-truth fuzzing benchmark that enables uniform fuzzer evaluation and comparison. By introducing real bugs into real software, Magma allows for the realistic evaluation of fuzzers against a broad set of targets. By instrumenting these bugs, Magma also enables the collection of bug-centric performance metrics independent of the fuzzer. Magma is an open benchmark consisting of seven targets that perform a variety of input manipulations and complex computations, presenting a challenge to state-of-the-art fuzzers. We evaluate seven widely-used mutation-based fuzzers (AFL, AFLFast, AFL++, FairFuzz, MOpt-AFL, honggfuzz, and SymCC-AFL) against Magma over 200,000 CPU-hours. Based on the number of bugs reached, triggered, and detected, we draw conclusions about the fuzzers' exploration and detection capabilities. This provides insight into fuzzer performance evaluation, highlighting the importance of ground truth in performing more accurate and meaningful evaluations.
CRMay 25, 2020
Decentralized Privacy-Preserving Proximity TracingCarmela Troncoso, Mathias Payer, Jean-Pierre Hubaux et al.
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contact's identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the user's phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the user's phone. Other users' apps can use data from the server to locally estimate whether the device's owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.
CRNov 21, 2019
Too Quiet in the Library: An Empirical Study of Security Updates in Android Apps' Native CodeSumaya Almanee, Arda Unal, Mathias Payer et al.
Android apps include third-party native libraries to increase performance and to reuse functionality. Native code is directly executed from apps through the Java Native Interface or the Android Native Development Kit. Android developers add precompiled native libraries to their projects, enabling their use. Unfortunately, developers often struggle or simply neglect to update these libraries in a timely manner. This results in the continuous use of outdated native libraries with unpatched security vulnerabilities years after patches became available. To further understand such phenomena, we study the security updates in native libraries in the most popular 200 free apps on Google Play from Sept. 2013 to May 2020. A core difficulty we face in this study is the identification of libraries and their versions. Developers often rename or modify libraries, making their identification challenging. We create an approach called LibRARIAN (LibRAry veRsion IdentificAtioN) that accurately identifies native libraries and their versions as found in Android apps based on our novel similarity metric bin2sim. LibRARIAN leverages different features extracted from libraries based on their metadata and identifying strings in read-only sections. We discovered 53/200 popular apps (26.5%) with vulnerable versions with known CVEs between Sept. 2013 and May 2020, with 14 of those apps remaining vulnerable. We find that app developers took, on average, 528.71 days to apply security patches, while library developers release a security patch after 54.59 days - a 10 times slower rate of update.
CRJun 7, 2019
Software Ethology: An Accurate, Resilient, and Cross-Architecture Binary Analysis FrameworkDerrick McKee, Nathan Burow, Mathias Payer
When reverse engineering a binary, the analyst must first understand the semantics of the binary's functions through either manual or automatic analysis. Manual semantic analysis is time-consuming, because abstractions provided by high level languages, such as type information, variable scope, or comments are lost, and past analyses cannot apply to the current analysis task. Existing automated binary analysis tools currently suffer from low accuracy in determining semantic function identification in the presence of diverse compilation environments. We introduce Software Ethology, a binary analysis approach for determining the semantic similarity of functions. Software Ethology abstracts semantic behavior as classification vectors of program state changes resulting from a function executing with a specified input state, and uses these vectors as a unique fingerprint for identification. All existing semantic identifiers determine function similarity via code measurements, and suffer from high inaccuracy when classifying functions from compilation environments different from their ground truth source. Since Software Ethology does not rely on code measurements, its accuracy is resilient to changes in compiler, compiler version, optimization level, or even different source implementing equivalent functionality. Tinbergen, our prototype Software Ethology implementation, leverages a virtual execution environment and a fuzzer to generate the classification vectors. In evaluating Tinbergen's feasibility as a semantic function identifier by identifying functions in coreutils-8.30, we achieve a high .805 average accuracy. Compared to the state-of-the-art, Tinbergen is 1.5 orders of magnitude faster when training, 50% faster in answering queries, and, when identifying functions in binaries generated from differing compilation environments, is 30%-61% more accurate.
CRMar 5, 2019
SMoTherSpectre: exploiting speculative execution through port contentionAtri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner et al.
Spectre, Meltdown, and related attacks have demonstrated that kernels, hypervisors, trusted execution environments, and browsers are prone to information disclosure through micro-architectural weaknesses. However, it remains unclear as to what extent other applications, in particular those that do not load attacker-provided code, may be impacted. It also remains unclear as to what extent these attacks are reliant on cache-based side channels. We introduce SMoTherSpectre, a speculative code-reuse attack that leverages port-contention in simultaneously multi-threaded processors (SMoTher) as a side channel to leak information from a victim process. SMoTher is a fine-grained side channel that detects contention based on a single victim instruction. To discover real-world gadgets, we describe a methodology and build a tool that locates SMoTher-gadgets in popular libraries. In an evaluation on glibc, we found hundreds of gadgets that can be used to leak information. Finally, we demonstrate proof-of-concept attacks against the OpenSSH server, creating oracles for determining four host key bits, and against an application performing encryption using the OpenSSL library, creating an oracle which can differentiate a bit of the plaintext through gadgets in libcrypto and glibc.
CRNov 7, 2018
Shining Light On Shadow StacksNathan Burow, Xinping Zhang, Mathias Payer
Control-Flow Hijacking attacks are the dominant attack vector against C/C++ programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the forward edge,i.e., indirect calls through function pointers and virtual calls. Protecting the backward edge is left to stack canaries, which are easily bypassed through information leaks. Shadow Stacks are a fully precise mechanism for protecting backwards edges, and should be deployed with CFI mitigations. We present a comprehensive analysis of all possible shadow stack mechanisms along three axes: performance, compatibility, and security. For performance comparisons we use SPEC CPU2006, while security and compatibility are qualitatively analyzed. Based on our study, we renew calls for a shadow stack design that leverages a dedicated register, resulting in low performance overhead, and minimal memory overhead, but sacrifices compatibility. We present case studies of our implementation of such a design, Shadesmar, on Phoronix and Apache to demonstrate the feasibility of dedicating a general purpose register to a security monitor on modern architectures, and the deployability of Shadesmar. Our comprehensive analysis, including detailed case studies for our novel design, allows compiler designers and practitioners to select the correct shadow stack design for different usage scenarios.
CRMay 12, 2018
Block Oriented Programming: Automating Data-Only AttacksKyriakos Ispoglou, Bader AlBassam, Trent Jaeger et al.
With the widespread deployment of Control-Flow Integrity (CFI), control-flow hijacking attacks, and consequently code reuse attacks, are significantly more difficult. CFI limits control flow to well-known locations, severely restricting arbitrary code execution. Assessing the remaining attack surface of an application under advanced control-flow hijack defenses such as CFI and shadow stacks remains an open problem. We introduce BOPC, a mechanism to automatically assess whether an attacker can execute arbitrary code on a binary hardened with CFI/shadow stack defenses. BOPC computes exploits for a target program from payload specifications written in a Turing-complete, high-level language called SPL that abstracts away architecture and program-specific details. SPL payloads are compiled into a program trace that executes the desired behavior on top of the target binary. The input for BOPC is an SPL payload, a starting point (e.g., from a fuzzer crash) and an arbitrary memory write primitive that allows application state corruption. To map SPL payloads to a program trace, BOPC introduces Block Oriented Programming (BOP), a new code reuse technique that utilizes entire basic blocks as gadgets along valid execution paths in the program, i.e., without violating CFI or shadow stack policies. We find that the problem of mapping payloads to program traces is NP-hard, so BOPC first reduces the search space by pruning infeasible paths and then uses heuristics to guide the search to probable paths. BOPC encodes the BOP payload as a set of memory writes. We execute 13 SPL payloads applied to 10 popular applications. BOPC successfully finds payloads and complex execution traces -- which would likely not have been found through manual analysis -- while following the target's Control-Flow Graph under an ideal CFI policy in 81% of the cases.
CRApr 17, 2017
CUP: Comprehensive User-Space Protection for C/C++Nathan Burow, Derrick McKee, Scott A. Carr et al.
Memory corruption vulnerabilities in C/C++ applications enable attackers to execute code, change data, and leak information. Current memory sanitizers do no provide comprehensive coverage of a program's data. In particular, existing tools focus primarily on heap allocations with limited support for stack allocations and globals. Additionally, existing tools focus on the main executable with limited support for system libraries. Further, they suffer from both false positives and false negatives. We present Comprehensive User-Space Protection for C/C++, CUP, an LLVM sanitizer that provides complete spatial and probabilistic temporal memory safety for C/C++ program on 64-bit architectures (with a prototype implementation for x86_64). CUP uses a hybrid metadata scheme that supports all program data including globals, heap, or stack and maintains the ABI. Compared to existing approaches with the NIST Juliet test suite, CUP reduces false negatives by 10x (0.1%) compared to the state of the art LLVM sanitizers, and produces no false positives. CUP instruments all user-space code, including libc and other system libraries, removing them from the trusted code base.
CRJun 7, 2015
Forgery-Resistant Touch-based Authentication on Mobile DevicesNeil Zhenqiang Gong, Mathias Payer, Reza Moazzezi et al.
Mobile devices store a diverse set of private user data and have gradually become a hub to control users' other personal Internet-of-Things devices. Access control on mobile devices is therefore highly important. The widely accepted solution is to protect access by asking for a password. However, password authentication is tedious, e.g., a user needs to input a password every time she wants to use the device. Moreover, existing biometrics such as face, fingerprint, and touch behaviors are vulnerable to forgery attacks. We propose a new touch-based biometric authentication system that is passive and secure against forgery attacks. In our touch-based authentication, a user's touch behaviors are a function of some random "secret". The user can subconsciously know the secret while touching the device's screen. However, an attacker cannot know the secret at the time of attack, which makes it challenging to perform forgery attacks even if the attacker has already obtained the user's touch behaviors. We evaluate our touch-based authentication system by collecting data from 25 subjects. Results are promising: the random secrets do not influence user experience and, for targeted forgery attacks, our system achieves 0.18 smaller Equal Error Rates (EERs) than previous touch-based authentication.
CRSep 27, 2014
Similarity-based matching meets Malware DiversityMathias Payer, Stephen Crane, Per Larsen et al.
Similarity metrics, e.g., signatures as used by anti-virus products, are the dominant technique to detect if a given binary is malware. The underlying assumption of this approach is that all instances of a malware (or even malware family) will be similar to each other. Software diversification is a probabilistic technique that uses code and data randomization and expressiveness in the target instruction set to generate large amounts of functionally equivalent but different binaries. Malware diversity builds on software diversity and ensures that any two diversified instances of the same malware have low similarity (according to a set of similarity metrics). An LLVM-based prototype implementation diversifies both code and data of binaries and our evaluation shows that signatures based on similarity only match one or few instances in a pool of diversified binaries generated from the same source code.
CRJul 2, 2014
Lockdown: Dynamic Control-Flow IntegrityMathias Payer, Antonio Barresi, Thomas R. Gross
Applications written in low-level languages without type or memory safety are especially prone to memory corruption. Attackers gain code execution capabilities through such applications despite all currently deployed defenses by exploiting memory corruption vulnerabilities. Control-Flow Integrity (CFI) is a promising defense mechanism that restricts open control-flow transfers to a static set of well-known locations. We present Lockdown, an approach to dynamic CFI that protects legacy, binary-only executables and libraries. Lockdown adaptively learns the control-flow graph of a running process using information from a trusted dynamic loader. The sandbox component of Lockdown restricts interactions between different shared objects to imported and exported functions by enforcing fine-grained CFI checks. Our prototype implementation shows that dynamic CFI results in low performance overhead.