SEFeb 11, 2021Code
CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software ReuseSeunghoon Woo, Sunghan Park, Seulbae Kim et al.
Open-source software (OSS) is widely reused as it provides convenience and efficiency in software development. Despite evident benefits, unmanaged OSS components can introduce threats, such as vulnerability propagation and license violation. Unfortunately, however, identifying reused OSS components is a challenge as the reused OSS is predominantly modified and nested. In this paper, we propose CENTRIS, a precise and scalable approach for identifying modified OSS reuse. By segmenting an OSS code base and detecting the reuse of a unique part of the OSS only, CENTRIS is capable of precisely identifying modified OSS reuse in the presence of nested OSS components. For scalability, CENTRIS eliminates redundant code comparisons and accelerates the search using hash functions. When we applied CENTRIS on 10,241 widely-employed GitHub projects, comprising 229,326 versions and 80 billion lines of code, we observed that modified OSS reuse is a norm in software development, occurring 20 times more frequently than exact reuse. Nonetheless, CENTRIS identified reused OSS components with 91% precision and 94% recall in less than a minute per application on average, whereas a recent clone detection technique, which does not take into account modified and nested OSS reuse, hardly reached 10% precision and 40% recall.
PLAug 29, 2019
VeriSmart: A Highly Precise Safety Verifier for Ethereum Smart ContractsSunbeom So, Myungho Lee, Jisu Park et al.
We present VeriSmart, a highly precise verifier for ensuring arithmetic safety of Ethereum smart contracts. Writing safe smart contracts without unintended behavior is critically important because smart contracts are immutable and even a single flaw can cause huge financial damage. In particular, ensuring that arithmetic operations are safe is one of the most important and common security concerns of Ethereum smart contracts nowadays. In response, several safety analyzers have been proposed over the past few years, but state-of-the-art is still unsatisfactory; no existing tools achieve high precision and recall at the same time, inherently limited to producing annoying false alarms or missing critical bugs. By contrast, VeriSmart aims for an uncompromising analyzer that performs exhaustive verification without compromising precision or scalability, thereby greatly reducing the burden of manually checking undiscovered or incorrectly-reported issues. To achieve this goal, we present a new domain-specific algorithm for verifying smart contracts, which is able to automatically discover and leverage transaction invariants that are essential for precisely analyzing smart contracts. Evaluation with real-world smart contracts shows that VeriSmart can detect all arithmetic bugs with a negligible number of false alarms, far outperforming existing analyzers.
SEJul 23, 2019
Enhancing Dynamic Symbolic Execution by Automatically Learning Search HeuristicsSooyoung Cha, Seongjoon Hong, Jingyoung Kim et al.
We present a technique to automatically generate search heuristics for dynamic symbolic execution. A key challenge in dynamic symbolic execution is how to effectively explore the program's execution paths to achieve high code coverage in a limited time budget. Dynamic symbolic execution employs a search heuristic to address this challenge, which favors exploring particular types of paths that are most likely to maximize the final coverage. However, manually designing a good search heuristic is nontrivial and typically ends up with suboptimal and unstable outcomes. The goal of this paper is to overcome this shortcoming of dynamic symbolic execution by automatically learning search heuristics. We define a class of search heuristics, namely a parametric search heuristic, and present an algorithm that efficiently finds an optimal heuristic for each subject program. Experimental results with industrial-strength symbolic execution tools (e.g., KLEE) show that our technique can successfully generate search heuristics that significantly outperform existing manually-crafted heuristics in terms of branch coverage and bug-finding.
SEMar 7, 2017
End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory NetworksMin-je Choi, Sehun Jeong, Hakjoo Oh et al.
Detecting buffer overruns from a source code is one of the most common and yet challenging tasks in program analysis. Current approaches have mainly relied on rigid rules and handcrafted features devised by a few experts, limiting themselves in terms of flexible applicability and robustness due to diverse bug patterns and characteristics existing in sophisticated real-world software programs. In this paper, we propose a novel, data-driven approach that is completely end-to-end without requiring any hand-crafted features, thus free from any program language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks that have shown the state-of-the-art performances mainly in question-answering tasks. Our experimental results using source codes demonstrate that our proposed model is capable of accurately detecting simple buffer overruns. We also present in-depth analyses on how a memory network can learn to understand the semantics in programming languages solely from raw source codes, such as tracing variables of interest, identifying numerical values, and performing their quantitative comparisons.
PLFeb 21, 2017
Synthesizing Imperative Programs from Examples Guided by Static AnalysisSunbeom So, Hakjoo Oh
We present a novel algorithm that synthesizes imperative programs for introductory programming courses. Given a set of input-output examples and a partial program, our algorithm generates a complete program that is consistent with every example. Our key idea is to combine enumerative program synthesis and static analysis, which aggressively prunes out a large search space while guaranteeing to find, if any, a correct solution. We have implemented our algorithm in a tool, called SIMPL, and evaluated it on 30 problems used in introductory programming courses. The results show that SIMPL is able to solve the benchmark problems in 6.6 seconds on average.