Hakjoo Oh

h-index23

5papers

277citations

Novelty60%

AI Score30

Ranked #139,054 of 194,257 authors (top 72%)#1,569 in SE (top 52%)

5 Papers

12.0SEFeb 11, 2021

CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse

Seunghoon Woo, Sunghan Park, Seulbae Kim et al.

Open-source software (OSS) is widely reused as it provides convenience and efficiency in software development. Despite evident benefits, unmanaged OSS components can introduce threats, such as vulnerability propagation and license violation. Unfortunately, however, identifying reused OSS components is a challenge as the reused OSS is predominantly modified and nested. In this paper, we propose CENTRIS, a precise and scalable approach for identifying modified OSS reuse. By segmenting an OSS code base and detecting the reuse of a unique part of the OSS only, CENTRIS is capable of precisely identifying modified OSS reuse in the presence of nested OSS components. For scalability, CENTRIS eliminates redundant code comparisons and accelerates the search using hash functions. When we applied CENTRIS on 10,241 widely-employed GitHub projects, comprising 229,326 versions and 80 billion lines of code, we observed that modified OSS reuse is a norm in software development, occurring 20 times more frequently than exact reuse. Nonetheless, CENTRIS identified reused OSS components with 91% precision and 94% recall in less than a minute per application on average, whereas a recent clone detection technique, which does not take into account modified and nested OSS reuse, hardly reached 10% precision and 40% recall.

13.8PLAug 29, 2019

VeriSmart: A Highly Precise Safety Verifier for Ethereum Smart Contracts

Sunbeom So, Myungho Lee, Jisu Park et al.

We present VeriSmart, a highly precise verifier for ensuring arithmetic safety of Ethereum smart contracts. Writing safe smart contracts without unintended behavior is critically important because smart contracts are immutable and even a single flaw can cause huge financial damage. In particular, ensuring that arithmetic operations are safe is one of the most important and common security concerns of Ethereum smart contracts nowadays. In response, several safety analyzers have been proposed over the past few years, but state-of-the-art is still unsatisfactory; no existing tools achieve high precision and recall at the same time, inherently limited to producing annoying false alarms or missing critical bugs. By contrast, VeriSmart aims for an uncompromising analyzer that performs exhaustive verification without compromising precision or scalability, thereby greatly reducing the burden of manually checking undiscovered or incorrectly-reported issues. To achieve this goal, we present a new domain-specific algorithm for verifying smart contracts, which is able to automatically discover and leverage transaction invariants that are essential for precisely analyzing smart contracts. Evaluation with real-world smart contracts shows that VeriSmart can detect all arithmetic bugs with a negligible number of false alarms, far outperforming existing analyzers.

6.9SEJul 23, 2019

Enhancing Dynamic Symbolic Execution by Automatically Learning Search Heuristics

Sooyoung Cha, Seongjoon Hong, Jingyoung Kim et al.

We present a technique to automatically generate search heuristics for dynamic symbolic execution. A key challenge in dynamic symbolic execution is how to effectively explore the program's execution paths to achieve high code coverage in a limited time budget. Dynamic symbolic execution employs a search heuristic to address this challenge, which favors exploring particular types of paths that are most likely to maximize the final coverage. However, manually designing a good search heuristic is nontrivial and typically ends up with suboptimal and unstable outcomes. The goal of this paper is to overcome this shortcoming of dynamic symbolic execution by automatically learning search heuristics. We define a class of search heuristics, namely a parametric search heuristic, and present an algorithm that efficiently finds an optimal heuristic for each subject program. Experimental results with industrial-strength symbolic execution tools (e.g., KLEE) show that our technique can successfully generate search heuristics that significantly outperform existing manually-crafted heuristics in terms of branch coverage and bug-finding.

18.3SEMar 7, 2017Code

End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks

Min-je Choi, Sehun Jeong, Hakjoo Oh et al.

Detecting buffer overruns from a source code is one of the most common and yet challenging tasks in program analysis. Current approaches have mainly relied on rigid rules and handcrafted features devised by a few experts, limiting themselves in terms of flexible applicability and robustness due to diverse bug patterns and characteristics existing in sophisticated real-world software programs. In this paper, we propose a novel, data-driven approach that is completely end-to-end without requiring any hand-crafted features, thus free from any program language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks that have shown the state-of-the-art performances mainly in question-answering tasks. Our experimental results using source codes demonstrate that our proposed model is capable of accurately detecting simple buffer overruns. We also present in-depth analyses on how a memory network can learn to understand the semantics in programming languages solely from raw source codes, such as tracing variables of interest, identifying numerical values, and performing their quantitative comparisons.

3.3PLFeb 21, 2017

Synthesizing Imperative Programs from Examples Guided by Static Analysis

Sunbeom So, Hakjoo Oh

We present a novel algorithm that synthesizes imperative programs for introductory programming courses. Given a set of input-output examples and a partial program, our algorithm generates a complete program that is consistent with every example. Our key idea is to combine enumerative program synthesis and static analysis, which aggressively prunes out a large search space while guaranteeing to find, if any, a correct solution. We have implemented our algorithm in a tool, called SIMPL, and evaluated it on 30 problems used in introductory programming courses. The results show that SIMPL is able to solve the benchmark problems in 6.6 seconds on average.