Taesoo Kim

CR
h-index12
31papers
803citations
Novelty52%
AI Score60

31 Papers

66.9CRMay 24Code
SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

Cen Zhang, Younggi Park, Fabian Fleischer et al.

DARPA's AI Cyber Challenge (AIxCC, 2023--2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI -- particularly large language models (LLMs) -- to discover and remediate vulnerabilities in real-world open-source software. This paper presents the first systematic analysis of AIxCC. Drawing on design documents, source code, execution traces, and discussions with organizers and competing teams, we examine the competition's structure and key design decisions, characterize the architectural approaches of finalist CRSs, and analyze competition results beyond the final scoreboard. Our analysis reveals the factors that truly drove CRS performance, identifies genuine technical advances achieved by teams, and exposes limitations that remain open for future research. We conclude with lessons for organizing future competitions and broader insights toward deploying autonomous CRSs in practice.

LGJul 18, 2023Code
Selective Generation for Controllable Language Models

Minjae Lee, Kyungmin Kim, Taesoo Kim et al. · gatech

Trustworthiness of generative language models (GLMs) is crucial in their deployment to critical decision making systems. Hence, certified risk control methods such as selective prediction and conformal prediction have been applied to mitigating the hallucination problem in various supervised downstream tasks. However, the lack of appropriate correctness metric hinders applying such principled methods to language generation tasks. In this paper, we circumvent this problem by leveraging the concept of textual entailment to evaluate the correctness of the generated sequence, and propose two selective generation algorithms which control the false discovery rate with respect to the textual entailment relation (FDR-E) with a theoretical guarantee: $\texttt{SGen}^{\texttt{Sup}}$ and $\texttt{SGen}^{\texttt{Semi}}$. $\texttt{SGen}^{\texttt{Sup}}$, a direct modification of the selective prediction, is a supervised learning algorithm which exploits entailment-labeled data, annotated by humans. Since human annotation is costly, we further propose a semi-supervised version, $\texttt{SGen}^{\texttt{Semi}}$, which fully utilizes the unlabeled data by pseudo-labeling, leveraging an entailment set function learned via conformal prediction. Furthermore, $\texttt{SGen}^{\texttt{Semi}}$ enables to use more general class of selection functions, neuro-selection functions, and provides users with an optimal selection function class given multiple candidates. Finally, we demonstrate the efficacy of the $\texttt{SGen}$ family in achieving a desired FDR-E level with comparable selection efficiency to those from baselines on both open and closed source GLMs. Code and datasets are provided at https://github.com/ml-postech/selective-generation.

CRNov 17, 2022
ACon$^2$: Adaptive Conformal Consensus for Provable Blockchain Oracles

Sangdon Park, Osbert Bastani, Taesoo Kim · gatech

Blockchains with smart contracts are distributed ledger systems that achieve block-state consistency among distributed nodes by only allowing deterministic operations of smart contracts. However, the power of smart contracts is enabled by interacting with stochastic off-chain data, which in turn opens the possibility to undermine the block-state consistency. To address this issue, an oracle smart contract is used to provide a single consistent source of external data; but, simultaneously, this introduces a single point of failure, which is called the oracle problem. To address the oracle problem, we propose an adaptive conformal consensus (ACon$^2$) algorithm that derives a consensus set of data from multiple oracle contracts via the recent advance in online uncertainty quantification learning. Interesting, the consensus set provides a desired correctness guarantee under distribution shift and Byzantine adversaries. We demonstrate the efficacy of the proposed algorithm on two price datasets and an Ethereum case study. In particular, the Solidity implementation of the proposed algorithm shows the potential practicality of the proposed algorithm, implying that online machine learning algorithms are applicable to address security issues in blockchains.

CROct 31, 2022
Unsafe's Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering via Machine Learning

Sangdon Park, Xiang Cheng, Taesoo Kim · gatech

Memory-safety bugs introduce critical software-security issues. Rust provides memory-safe mechanisms to avoid memory-safety bugs in programming, while still allowing unsafe escape hatches via unsafe code. However, the unsafe code that enhances the usability of Rust provides clear spots for finding memory-safety bugs in Rust source code. In this paper, we claim that these unsafe spots can still be identifiable in Rust binary code via machine learning and be leveraged for finding memory-safety bugs. To support our claim, we propose the tool textttrustspot, that enables reverse engineering to learn an unsafe classifier that proposes a list of functions in Rust binaries for downstream analysis. We empirically show that the function proposals by textttrustspot can recall $92.92\%$ of memory-safety bugs, while it covers only $16.79\%$ of the entire binary code. As an application, we demonstrate that the function proposals are used in targeted fuzzing on Rust packages, which contribute to reducing the fuzzing time compared to non-targeted fuzzing.

CVMar 28, 2023
Enhancing Breast Cancer Risk Prediction by Incorporating Prior Images

Hyeonsoo Lee, Junha Kim, Eunkyung Park et al.

Recently, deep learning models have shown the potential to predict breast cancer risk and enable targeted screening strategies, but current models do not consider the change in the breast over time. In this paper, we present a new method, PRIME+, for breast cancer risk prediction that leverages prior mammograms using a transformer decoder, outperforming a state-of-the-art risk prediction method that only uses mammograms from a single time point. We validate our approach on a dataset with 16,113 exams and further demonstrate that it effectively captures patterns of changes from prior mammograms, such as changes in breast density, resulting in improved short-term and long-term breast cancer risk prediction. Experimental results show that our model achieves a statistically significant improvement in performance over the state-of-the-art based model, with a C-index increase from 0.68 to 0.73 (p < 0.05) on held-out test sets.

CVOct 13, 2022
OOOE: Only-One-Object-Exists Assumption to Find Very Small Objects in Chest Radiographs

Gunhee Nam, Taesoo Kim, Sanghyup Lee et al.

The accurate localization of inserted medical tubes and parts of human anatomy is a common problem when analyzing chest radiographs and something deep neural networks could potentially automate. However, many foreign objects like tubes and various anatomical structures are small in comparison to the entire chest X-ray, which leads to severely unbalanced data and makes training deep neural networks difficult. In this paper, we present a simple yet effective `Only-One-Object-Exists' (OOOE) assumption to improve the deep network's ability to localize small landmarks in chest radiographs. The OOOE enables us to recast the localization problem as a classification problem and we can replace commonly used continuous regression techniques with a multi-class discrete objective. We validate our approach using a large scale proprietary dataset of over 100K radiographs as well as publicly available RANZCR-CLiP Kaggle Challenge dataset and show that our method consistently outperforms commonly used regression-based detection models as well as commonly used pixel-wise classification methods. Additionally, we find that the method using the OOOE assumption generalizes to multiple detection problems in chest X-rays and the resulting model shows state-of-the-art performance on detecting various tube tips inserted to the patient as well as patient anatomy.

85.8CRMar 25Code
OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

Andrew Chin, Dongkwan Kim, Yu-Fu Fu et al.

DARPA's AI Cyber Challenge (AIxCC) showed that cyber reasoning systems (CRSs) can go beyond vulnerability discovery to autonomously confirm and patch bugs: seven teams built such systems and open-sourced them after the competition. Yet all seven open-sourced CRSs remain largely unusable outside their original teams, each bound to the competition cloud infrastructure that no longer exists. We present OSS-CRS, an open, locally deployable framework for running and combining CRS techniques against real-world open-source projects, with budget-aware resource management. We ported the first-place system (Atlantis) and discovered 10 previously unknown bugs (three of high severity) across 8 OSS-Fuzz projects. OSS-CRS is publicly available.

51.0CRApr 2Code
Contextualizing Sink Knowledge for Java Vulnerability Discovery

Fabian Fleischer, Cen Zhang, Joonun Jang et al.

Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness. We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, GONDAR also demonstrated strong performance in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to improve the security of open-source software.

LGJun 20, 2023
Understanding Contrastive Learning Through the Lens of Margins

Daniel Rho, TaeSoo Kim, Sooill Park et al.

Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. Contrastive learning measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enhancing face and speaker recognition tasks. Interestingly, despite the shared reliance on the same similarity metrics and objective functions, contrastive learning has not actively adopted margins. Furthermore, decision-boundary-based explanations are the only ones that have been used to explain the effect of margins in contrastive learning. In this work, we propose a new perspective to understand the role of margins based on gradient analysis. Based on the new perspective, we analyze how margins affect gradients of contrastive learning and separate the effect into more elemental levels. We separately analyze each and provide possible directions for improving contrastive learning. Our experimental results demonstrate that emphasizing positive samples and scaling gradients depending on positive sample angles and logits are the keys to improving the generalization performance of contrastive learning in both seen and unseen datasets, and other factors can only marginally improve performance.

65.1CRApr 23Code
Position Paper: Denial-of-Service Against Multi-Round Transaction Simulation

Yuzhe Tang, Yibo Wang, Wanning Ding et al.

In Ethereum, transaction-bundling services are a critical component of block builders, such as Flashbots Bundles, and are widely used by MEV searchers. Disrupting bundling services can degrade searcher experience and reduce builder revenue. Despite the extensive studies, the existing denial-of-service attack designs are ineffective against bundling services due to their unique multi-round execution model. This paper studies the open problem of asymmetric denial-of-service against bundling services. We develop evasive, risk-free, and low-cost DoS attacks on Flashbots' bundling service, the only open-source bundling service known to us. Our attacks exploit inter-transaction dependencies through contract state to achieve evasiveness, and abuse bundling-specific features, such as atomic block inclusion, to significantly reduce both capital and operational costs of the attack. Experimental results show that our attacks achieve high success rates, substantially reduce builders' revenue, and slow block production. We further propose mitigation strategies for the identified risks.

HCSep 18, 2025Code
Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech

Taesoo Kim, Yongsik Jo, Hyunmin Song et al.

Human conversation involves language, speech, and visual cues, with each medium providing complementary information. For instance, speech conveys a vibe or tone not fully captured by text alone. While multimodal LLMs focus on generating text responses from diverse inputs, less attention has been paid to generating natural and engaging speech. We propose a human-like agent that generates speech responses based on conversation mood and responsive style information. To achieve this, we build a novel MultiSensory Conversation dataset focused on speech to enable agents to generate natural speech. We then propose a multimodal LLM-based model for generating text responses and voice descriptions, which are used to generate speech covering paralinguistic information. Experimental results demonstrate the effectiveness of utilizing both visual and audio modalities in conversation to generate engaging speech. The source code is available in https://github.com/kimtaesu24/MSenC

CRDec 14, 2021Code
In-Kernel Control-Flow Integrity on Commodity OSes using ARM Pointer Authentication

Sungbae Yoo, Jinbum Park, Seolheui Kim et al.

This paper presents an in-kernel, hardware-based control-flow integrity (CFI) protection, called PAL, that utilizes ARM's Pointer Authentication (PA). It provides three important benefits over commercial, state-of-the-art PA-based CFIs like iOS's: 1) enhancing CFI precision via automated refinement techniques, 2) addressing hindsight problems of PA for in kernel uses such as preemptive hijacking and brute-forcing attacks, and 3) assuring the algorithmic or implementation correctness via post validation. PAL achieves these goals in an OS-agnostic manner, so could be applied to commodity OSes like Linux and FreeBSD. The precision of the CFI protection can be adjusted for better performance or improved for better security with minimal engineering efforts if a user opts in to. Our evaluation shows that PAL incurs negligible performance overhead: e.g., <1% overhead for Apache benchmark and 3~5% overhead for Linux perf benchmark on the latest Mac mini (M1). Our post-validation approach helps us ensure the security invariant required for the safe uses of PA inside the kernel, which also reveals new attack vectors on the iOS kernel. PAL as well as the CFI-protected kernels will be open sourced.

CRNov 18, 2018Code
libmpk: Software Abstraction for Intel Memory Protection Keys

Soyeon Park, Sangho Lee, Wen Xu et al.

Intel memory protection keys (MPK) is a new hardware feature to support thread-local permission control on groups of pages without requiring modification of page tables. Unfortunately, its current hardware implementation and software supports suffer from security, scalability, and semantic-gap problems: (1) MPK is vulnerable to protection-key-use-after-free and protection-key corruption; (2) MPK does not scale due to hardware limitations; and (3) MPK is not perfectly compatible with mprotect() because it does not support permission synchronization across threads. In this paper, we propose libmpk, a software abstraction for MPK. libmpk virtualizes protection keys to eliminate the protection-key-use-after-free and protection-key corruption problems while supporting a tremendous number of memory page groups. libmpk also prevents unauthorized writes to its metadata and supports inter-thread key synchronization. We apply libmpk to three real-world applications: OpenSSL, JavaScript JIT compiler, and Memcached for memory protection and isolation. An evaluation shows that libmpk introduces negligible performance overhead (<1%) compared with insecure versions, and improves their performance by 8.1x over secure equivalents using mprotect(). The source code of libmpk will be publicly available and maintained as an open source project.

CVDec 31, 2023
RainSD: Rain Style Diversification Module for Image Synthesis Enhancement using Feature-Level Style Distribution

Hyeonjae Jeon, Junghyun Seo, Taesoo Kim et al.

Autonomous driving technology nowadays targets to level 4 or beyond, but the researchers are faced with some limitations for developing reliable driving algorithms in diverse challenges. To promote the autonomous vehicles to spread widely, it is important to address safety issues on this technology. Among various safety concerns, the sensor blockage problem by severe weather conditions can be one of the most frequent threats for multi-task learning based perception algorithms during autonomous driving. To handle this problem, the importance of the generation of proper datasets is becoming more significant. In this paper, a synthetic road dataset with sensor blockage generated from real road dataset BDD100K is suggested in the format of BDD100K annotation. Rain streaks for each frame were made by an experimentally established equation and translated utilizing the image-to-image translation network based on style transfer. Using this dataset, the degradation of the diverse multi-task networks for autonomous driving, such as lane detection, driving area segmentation, and traffic object detection, has been thoroughly evaluated and analyzed. The tendency of the performance degradation of deep neural network-based perception systems for autonomous vehicle has been analyzed in depth. Finally, we discuss the limitation and the future directions of the deep neural network-based perception algorithms and autonomous driving dataset generation based on image-to-image translation.

41.7CVApr 9
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video

Chanhyuk Choi, Taesoo Kim, Donggyu Lee et al.

Talking face generation has gained significant attention as a core application of generative models. To enhance the expressiveness and realism of synthesized videos, emotion editing in talking face video plays a crucial role. However, existing approaches often limit expressive flexibility and struggle to generate extended emotions. Label-based methods represent emotions with discrete categories, which fail to capture a wide range of emotions. Audio-based methods can leverage emotionally rich speech signals - and even benefit from expressive text-to-speech (TTS) synthesis - but they fail to express the target emotions because emotions and linguistic contents are entangled in emotional speeches. Images-based methods, on the other hand, rely on target reference images to guide emotion transfer, yet they require high-quality frontal views and face challenges in acquiring reference data for extended emotions (e.g., sarcasm). To address these limitations, we propose Cross-Modal Emotion Transfer (C-MET), a novel approach that generates facial expressions based on speeches by modeling emotion semantic vectors between speech and visual feature spaces. C-MET leverages a large-scale pretrained audio encoder and a disentangled facial expression encoder to learn emotion semantic vectors that represent the difference between two different emotional embeddings across modalities. Extensive experiments on the MEAD and CREMA-D datasets demonstrate that our method improves emotion accuracy by 14% over state-of-the-art methods, while generating expressive talking face videos - even for unseen extended emotions. Code, checkpoint, and demo are available at https://chanhyeok-choi.github.io/C-MET/

CRSep 18, 2025
ATLANTIS: AI-driven Threat Localization, Analysis, and Triage Intelligence System

Taesoo Kim, HyungSeok Han, Soyeon Park et al.

We present ATLANTIS, the cyber reasoning system developed by Team Atlanta that won 1st place in the Final Competition of DARPA's AI Cyber Challenge (AIxCC) at DEF CON 33 (August 2025). AIxCC (2023-2025) challenged teams to build autonomous cyber reasoning systems capable of discovering and patching vulnerabilities at the speed and scale of modern software. ATLANTIS integrates large language models (LLMs) with program analysis -- combining symbolic execution, directed fuzzing, and static analysis -- to address limitations in automated vulnerability discovery and program repair. Developed by researchers at Georgia Institute of Technology, Samsung Research, KAIST, and POSTECH, the system addresses core challenges: scaling across diverse codebases from C to Java, achieving high precision while maintaining broad coverage, and producing semantically correct patches that preserve intended behavior. We detail the design philosophy, architectural decisions, and implementation strategies behind ATLANTIS, share lessons learned from pushing the boundaries of automated security when program analysis meets modern AI, and release artifacts to support reproducibility and future research.

SDJul 27, 2025
Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech

Taesoo Kim, Jinju Kim, Dongchan Kim et al.

The rapid advancement of Zero-Shot Text-to-Speech (ZS-TTS) technology has enabled high-fidelity voice synthesis from minimal audio cues, raising significant privacy and ethical concerns. Despite the threats to voice privacy, research to selectively remove the knowledge to replicate unwanted individual voices from pre-trained model parameters has not been explored. In this paper, we address the new challenge of speaker identity unlearning for ZS-TTS systems. To meet this goal, we propose the first machine unlearning frameworks for ZS-TTS, especially Teacher-Guided Unlearning (TGU), designed to ensure the model forgets designated speaker identities while retaining its ability to generate accurate speech for other speakers. Our proposed methods incorporate randomness to prevent consistent replication of forget speakers' voices, assuring unlearned identities remain untraceable. Additionally, we propose a new evaluation metric, speaker-Zero Retrain Forgetting (spk-ZRF). This assesses the model's ability to disregard prompts associated with forgotten speakers, effectively neutralizing its knowledge of these voices. The experiments conducted on the state-of-the-art model demonstrate that TGU prevents the model from replicating forget speakers' voices while maintaining high quality for other speakers. The demo is available at https://speechunlearn.github.io/

LGJan 25, 2025
ReInc: Scaling Training of Dynamic Graph Neural Networks

Mingyu Guan, Saumia Singhal, Taesoo Kim et al.

Dynamic Graph Neural Networks (DGNNs) have gained widespread attention due to their applicability in diverse domains such as traffic network prediction, epidemiological forecasting, and social network analysis. In this paper, we present ReInc, a system designed to enable efficient and scalable training of DGNNs on large-scale graphs. ReInc introduces key innovations that capitalize on the unique combination of Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) inherent in DGNNs. By reusing intermediate results and incrementally computing aggregations across consecutive graph snapshots, ReInc significantly enhances computational efficiency. To support these optimizations, ReInc incorporates a novel two-level caching mechanism with a specialized caching policy aligned to the DGNN execution workflow. Additionally, ReInc addresses the challenges of managing structural and temporal dependencies in dynamic graphs through a new distributed training strategy. This approach eliminates communication overheads associated with accessing remote features and redistributing intermediate results. Experimental results demonstrate that ReInc achieves up to an order of magnitude speedup compared to state-of-the-art frameworks, tested across various dynamic GNN architectures and real-world graph datasets.

SESep 29, 2025
Agentic Specification Generator for Move Programs

Yu-Fu Fu, Meng Xu, Taesoo Kim

While LLM-based specification generation is gaining traction, existing tools primarily focus on mainstream programming languages like C, Java, and even Solidity, leaving emerging and yet verification-oriented languages like Move underexplored. In this paper, we introduce MSG, an automated specification generation tool designed for Move smart contracts. MSG aims to highlight key insights that uniquely present when applying LLM-based specification generation to a new ecosystem. Specifically, MSG demonstrates that LLMs exhibit robust code comprehension and generation capabilities even for non-mainstream languages. MSG successfully generates verifiable specifications for 84% of tested Move functions and even identifies clauses previously overlooked by experts. Additionally, MSG shows that explicitly leveraging specification language features through an agentic, modular design improves specification quality substantially (generating 57% more verifiable clauses than conventional designs). Incorporating feedback from the verification toolchain further enhances the effectiveness of MSG, leading to a 30% increase in generated verifiable specifications.

CLJun 1, 2025
TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment

Taesoo Kim, Jong Hwan Ko

Recent advances in speech-enabled language models have shown promising results in building intelligent voice assistants. However, most existing approaches rely on large-scale paired speech-text data and extensive computational resources, which pose challenges in terms of scalability and accessibility. In this paper, we present \textbf{TESU-LLM}, a novel framework that enables training speech-capable language models using only text data. Our key insight is to leverage a unified encoder that maps semantically equivalent text and speech inputs to a shared latent space. By aligning the encoder output with the embedding space of a LLM via a lightweight projection network, we enable the model to generalize from text-only supervision to speech-based inference. Despite being trained exclusively on text, TESU-LLM achieves strong performance on various speech-related benchmarks, comparable to baseline methods trained with large-scale multimodal datasets and substantial computational resources. These results highlight the effectiveness and efficiency of our approach, offering a scalable path toward building speech LLMs without speech data.

SEMay 19, 2025
Ensuring Functional Correctness of Large Code Models with Selective Generation

Jaewoo Jeong, Taesoo Kim, Sangdon Park

The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficulty of identifying the functional correctness of generated code, due to its unnatural form. We address this core bottleneck by automatically generating unit tests using dynamic code analysis tools, leveraging the \emph{executable nature} of code. Accordingly, we propose \emph{selective code generator} that abstains from uncertain generations -- based on the functional correctness evaluated by generated unit tests -- to theoretically control the correctness among non-abstained answers, \ie the false discovery rate. Finally, we propose to use generated unit tests in evaluation as well as in learning for precise code evaluation, calling this paradigm \emph{FuzzEval}. We demonstrate the efficacy of our method along with the controllability of code hallucination and reasonable selection efficiency.

LGFeb 21, 2024
Heterogeneous Graph Neural Network on Semantic Tree

Mingyu Guan, Jack W. Stokes, Qinlong Luo et al.

The recent past has seen an increasing interest in Heterogeneous Graph Neural Networks (HGNNs), since many real-world graphs are heterogeneous in nature, from citation graphs to email graphs. However, existing methods ignore a tree hierarchy among metapaths, naturally constituted by different node types and relation types. In this paper, we present HetTree, a novel HGNN that models both the graph structure and heterogeneous aspects in a scalable and effective manner. Specifically, HetTree builds a semantic tree data structure to capture the hierarchy among metapaths. To effectively encode the semantic tree, HetTree uses a novel subtree attention mechanism to emphasize metapaths that are more helpful in encoding parent-child relationships. Moreover, HetTree proposes carefully matching pre-computed features and labels correspondingly, constituting a complete metapath representation. Our evaluation of HetTree on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks and efficiently scales to large real-world graphs with millions of nodes and edges.

CRJan 21, 2022
Attack of the Clones: Measuring the Maintainability, Originality and Security of Bitcoin 'Forks' in the Wild

Jusop Choi, Wonseok Choi, William Aiken et al.

Since Bitcoin appeared in 2009, over 6,000 different cryptocurrency projects have followed. The cryptocurrency world may be the only technology where a massive number of competitors offer similar services yet claim unique benefits, including scalability, fast transactions, and security. But are these projects really offering unique features and significant enhancements over their competitors? To answer this question, we conducted a large-scale empirical analysis of code maintenance activities, originality and security across 592 crypto projects. We found that about half of these projects have not been updated for the last six months; over two years, about three-quarters of them disappeared, or were reported as scams or inactive. We also investigated whether 11 security vulnerabilities patched in Bitcoin were also patched in other projects. We found that about 80% of 510 C-language-based cryptocurrency projects have at least one unpatched vulnerability, and the mean time taken to fix the vulnerability is 237.8 days. Among those 510 altcoins, we found that at least 157 altcoins are likely to have been forked from Bitcoin, about a third of them containing only slight changes from the Bitcoin version from which they were forked. As case studies, we did a deep dive into 20 altcoins (e.g., Litecoin, FujiCoin, and Feathercoin) similar to the version of Bitcoin used for the fork. About half of them did not make any technically meaningful change - failing to comply with the promises (e.g., about using Proof of Stake) made in their whitepapers.

CRDec 31, 2021
SOK: On the Analysis of Web Browser Security

Jungwon Lim, Yonghwi Jin, Mansour Alharthi et al.

Web browsers are integral parts of everyone's daily life. They are commonly used for security-critical and privacy sensitive tasks, like banking transactions and checking medical records. Unfortunately, modern web browsers are too complex to be bug free (e.g., 25 million lines of code in Chrome), and their role as an interface to the cyberspace makes them an attractive target for attacks. Accordingly, web browsers naturally become an arena for demonstrating advanced exploitation techniques by attackers and state-of-the-art defenses by browser vendors. Web browsers, arguably, are the most exciting place to learn the latest security issues and techniques, but remain as a black art to most security researchers because of their fast-changing characteristics and complex code bases. To bridge this gap, this paper attempts to systematize the security landscape of modern web browsers by studying the popular classes of security bugs, their exploitation techniques, and deployed defenses. More specifically, we first introduce a unified architecture that faithfully represents the security design of four major web browsers. Second, we share insights from a 10-year longitudinal study on browser bugs. Third, we present a timeline and context of mitigation schemes and their effectiveness. Fourth, we share our lessons from a full-chain exploit used in 2020 Pwn2Own competition. and the implication of bug bounty programs to web browser security. We believe that the key takeaways from this systematization can shed light on how to advance the status quo of modern web browsers, and, importantly, how to create secure yet complex software in the future.

CRJun 10, 2021
Semantic-aware Binary Code Representation with BERT

Hyungjoon Koo, Soyeon Park, Daejin Choi et al.

A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code. Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary instead of manually crafting specifics of the analysis algorithm. However, the existing approaches utilizing machine learning are still specialized to solve one domain of problems, rendering recreation of models for different types of binary analysis. In this paper, we propose DeepSemantic utilizing BERT in producing the semantic-aware code representation of a binary code. To this end, we introduce well-balanced instruction normalization that holds rich information for each of instructions yet minimizing an out-of-vocabulary (OOV) problem. DeepSemantic has been carefully designed based on our study with large swaths of binaries. Besides, DeepSemantic leverages the essence of the BERT architecture into re-purposing a pre-trained generic model that is readily available as a one-time processing, followed by quickly applying specific downstream tasks with a fine-tuning process. We demonstrate DeepSemantic with two downstream tasks, namely, binary similarity comparison and compiler provenance (i.e., compiler and optimization level) prediction. Our experimental results show that the binary similarity model outperforms two state-of-the-art binary similarity tools, DeepBinDiff and SAFE, 49.84% and 15.83% on average, respectively.

CRSep 24, 2019
P2FAAS: Toward Privacy-Preserving Fuzzing as a Service

Fan Sang, Daehee Jang, Ming-Wei Shih et al.

Global corporations (e.g., Google and Microsoft) have recently introduced a new model of cloud services, fuzzing-as-a-service (FaaS). Despite effectively alleviating the cost of fuzzing, the model comes with privacy concerns. For example, the end user has to trust both cloud and service providers who have access to the application to be fuzzed. Such concerns are due to the platform is under the control of its provider and the application and the fuzzer are highly coupled. In this paper, we propose P2FaaS, a new ecosystem that preserves end user's privacy while providing FaaS in the cloud. The key idea of P2FaaS is to utilize Intel SGX for preventing cloud and service providers from learning information about the application. Our preliminary evaluation shows that P2FaaS imposes 45% runtime overhead to the fuzzing compared to the baseline. In addition, P2FaaS demonstrates that, with recently introduced hardware, Intel SGX Card, the fuzzing service can be scaled up to multiple servers without native SGX support.

CRMar 1, 2019
Automatic Techniques to Systematically Discover New Heap Exploitation Primitives

Insu Yun, Dhaval Kapil, Taesoo Kim

Heap exploitation techniques to abuse the metadata of allocators have been widely studied since they are application independent and can be used in restricted environments that corrupt only metadata. Although prior work has found several interesting exploitation techniques, they are ad-hoc and manual, which cannot effectively handle changes or a variety of allocators. In this paper, we present a new naming scheme for heap exploitation techniques that systematically organizes them to discover the unexplored space in finding the techniques and ArcHeap, the tool that finds heap exploitation techniques automatically and systematically regardless of their underlying implementations. For that, ArcHeap generates a set of heap actions (e.g. allocation or deallocation) by leveraging fuzzing, which exploits common designs of modern heap allocators. Then, ArcHeap checks whether the actions result in impact of exploitations such as arbitrary write or overlapped chunks that efficiently determine if the actions can be converted into the exploitation technique. Finally, from these actions, ArcHeap generates Proof-of-Concept code automatically for an exploitation technique. We evaluated ArcHeap with real-world allocators --- ptmalloc, jemalloc, and tcmalloc --- and custom allocators from the DARPA Cyber Grand Challenge. ArcHeap successfully found 14 out of 16 known exploitation techniques and found five new exploitation techniques in ptmalloc. Moreover, ArcHeap found several exploitation techniques for jemalloc, tcmalloc, and even for the custom allocators. Further, ArcHeap can automatically show changes in exploitation techniques along with version change in ptmalloc using differential testing.

CRSep 24, 2018
SPX: Preserving End-to-End Security for Edge Computing

Ketan Bhardwaj, Ming-Wei Shih, Ada Gavrilovska et al.

Beyond point solutions, the vision of edge computing is to enable web services to deploy their edge functions in a multi-tenant infrastructure present at the edge of mobile networks. However, edge functions can be rendered useless because of one critical issue: Web services are delivered over end-to-end encrypted connections, so edge functions cannot operate on encrypted traffic without compromising security or degrading performance. Any solution to this problem must interoperate with existing protocols like TLS, as well as with new emerging security protocols for client and IoT devices. The edge functions must remain invisible to client-side endpoints but may require explicit control from their service-side web services. Finally, a solution must operate within overhead margins which do not obviate the benefits of the edge. To address this problem, this paper presents SPX - a solution for edge-ready and end-to-end secure protocol extensions, which can efficiently maintain end-to-edge-to-end ($E^3$) security semantics. Using our SPX prototype, we allow edge functions to operate on encrypted traffic, while ensuring that security semantics of secure protocols still hold. SPX uses Intel SGX to bind the communication channel with remote attestation and to provide a solution that not only defends against potential attacks but also results in low performance overheads, and neither mandates any changes on the end-user side nor breaks interoperability with existing protocols.

CROct 25, 2017
Leaking Uninitialized Secure Enclave Memory via Structure Padding (Extended Abstract)

Sangho Lee, Taesoo Kim

Intel software guard extensions (SGX) aims to provide an isolated execution environment, known as an enclave, for a user-level process to maximize its confidentiality and integrity. In this paper, we study how uninitialized data inside a secure enclave can be leaked via structure padding. We found that, during ECALL and OCALL, proxy functions that are automatically generated by the Intel SGX Software Development Kit (SDK) fully copy structure variables from an enclave to the normal memory to return the result of an ECALL function and to pass input parameters to an OCALL function. If the structure variables contain padding bytes, uninitialized enclave memory, which might contain confidential data like a private key, can be copied to the normal memory through the padding bytes. We also consider potential countermeasures against these security threats.

CRMay 25, 2017
Bunshin: Compositing Security Mechanisms through Diversification (with Appendix)

Meng Xu, Kangjie Lu, Taesoo Kim et al.

A number of security mechanisms have been proposed to harden programs written in unsafe languages, each of which mitigates a specific type of memory error. Intuitively, enforcing multiple security mechanisms on a target program will improve its overall security. However, this is not yet a viable approach in practice because the execution slowdown caused by various security mechanisms is often non-linearly accumulated, making the combined protection prohibitively expensive; further, most security mechanisms are designed for independent or isolated uses and thus are often in conflict with each other, making it impossible to fuse them in a straightforward way. In this paper, we present Bunshin, an N-version-based system that enables different and even conflicting security mechanisms to be combined to secure a program while at the same time reducing the execution slowdown. In particular, we propose an automated mechanism to distribute runtime security checks in multiple program variants in such a way that conflicts between security checks are inherently eliminated and execution slowdown is minimized with parallel execution. We also present an N-version execution engine to seamlessly synchronize these variants so that all distributed security checks work together to guarantee the security of a target program.

CRNov 21, 2016
Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing

Sangho Lee, Ming-Wei Shih, Prasun Gera et al.

In this paper, we explore a new, yet critical, side-channel attack against Intel Software Guard Extension (SGX), called a branch shadowing attack, which can reveal fine-grained control flows (i.e., each branch) of an enclave program running on real SGX hardware. The root cause of this attack is that Intel SGX does not clear the branch history when switching from enclave mode to non-enclave mode, leaving the fine-grained traces to the outside world through a branch-prediction side channel. However, exploiting the channel is not so straightforward in practice because 1) measuring branch prediction/misprediction penalties based on timing is too inaccurate to distinguish fine-grained control-flow changes and 2) it requires sophisticated control over the enclave execution to force its execution to the interesting code blocks. To overcome these challenges, we developed two novel exploitation techniques: 1) Intel PT- and LBR-based history-inferring techniques and 2) APIC-based technique to control the execution of enclave programs in a fine-grained manner. As a result, we could demonstrate our attack by breaking recent security constructs, including ORAM schemes, Sanctum, SGX-Shield, and T-SGX. Not limiting our work to the attack itself, we thoroughly studied the feasibility of hardware-based solutions (e.g., branch history clearing) and also proposed a software-based countermeasure, called Zigzagger, to mitigate the branch shadowing attack in practice.