55.4 | AR | Apr 30 | Code
CuLifter: Lifting GPU Binaries to Typed IR
Jisheng Zhao, Huanzhi Pu, Shinnung Jeong et al.
GPU compilers merge all data types into a single unified register file, erasing the type information that binary-analysis tools rely on. We show that type recovery from this untyped register file is the central challenge of GPU binary lifting. We present CuLifter, a SASS-to-LLVM IR lifting framework that recovers register types via constraint propagation with conflict detection, reconstructs explicit control flow, and aggregates multi-instruction patterns. Across eight benchmark suites (24,437 GPU functions in 919 cubins) spanning open-source applications, vendor libraries, and optimized ML runtimes, CuLifter successfully lifts 99.98% of functions to valid LLVM IR. An ablation study confirms that type recovery is necessary for producing semantically correct IR: disabling it drops the x86 pass rate from 73.8% to 0%.
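The phrase "constraint propagation with conflict detection" can be illustrated with a minimal sketch. This is not CuLifter's algorithm: the opcode table, the `recover_types` function, and the rule that a copy forces both registers into one type class are all assumptions made for illustration. Each typed opcode seeds a type on its operands, copies merge registers into one class (union-find), and a class that receives two different types is reported as a conflict.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily simplified opcode table: opcode -> type of every operand.
# Real SASS semantics are much richer; this is illustrative only.
OPERAND_TYPE = {"FADD": "f32", "FMUL": "f32", "IADD": "i32", "SHL": "i32"}
COPY_OPS = {"MOV"}  # assumed: a copy makes source and destination share a type

Instr = Tuple[str, str, List[str]]  # (opcode, destination register, source registers)


def recover_types(instrs: List[Instr]) -> Tuple[Dict[str, str], Set[str]]:
    """Return (register -> type, registers whose type class is in conflict)."""
    parent: Dict[str, str] = {}      # union-find forest over registers
    class_type: Dict[str, str] = {}  # class root -> first type seen
    bad_roots: Set[str] = set()      # class roots with contradictory types

    def find(reg: str) -> str:
        parent.setdefault(reg, reg)
        while parent[reg] != reg:
            parent[reg] = parent[parent[reg]]  # path halving
            reg = parent[reg]
        return reg

    def constrain(reg: str, ty: str) -> None:
        root = find(reg)
        if class_type.setdefault(root, ty) != ty:
            bad_roots.add(root)  # conflict: two different types in one class

    def unify(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        parent[rb] = ra
        if rb in bad_roots:
            bad_roots.discard(rb)
            bad_roots.add(ra)
        if rb in class_type:
            constrain(ra, class_type.pop(rb))

    for opcode, dest, srcs in instrs:
        if opcode in COPY_OPS:
            for src in srcs:
                unify(dest, src)
        elif opcode in OPERAND_TYPE:
            for reg in [dest, *srcs]:
                constrain(reg, OPERAND_TYPE[opcode])
        else:
            for reg in [dest, *srcs]:
                find(reg)  # register is known but stays unconstrained

    types: Dict[str, str] = {}
    conflicts: Set[str] = set()
    for reg in list(parent):
        root = find(reg)
        if root in bad_roots:
            conflicts.add(reg)
        elif root in class_type:
            types[reg] = class_type[root]
    return types, conflicts


program = [
    ("FADD", "R0", ["R1", "R2"]),  # float add: all operands are f32
    ("MOV", "R3", ["R0"]),         # R3 shares R0's type class
    ("IADD", "R4", ["R3", "R5"]),  # integer use of R3 contradicts f32
]
types, conflicts = recover_types(program)
```

In this toy model a conflict simply marks the affected registers; a real lifter would have to resolve it, for example by splitting the register's live ranges or inserting a bitcast, and the abstract does not say which strategy CuLifter uses.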
CR | Apr 22, 2015 | Code
Finding Tizen security bugs through whole-system static analysis
Daniel Song, Jisheng Zhao, Michael Burke et al.
Tizen is a new Linux-based open source platform for consumer devices including smartphones, televisions, vehicles, and wearables. While Tizen provides kernel-level mandatory policy enforcement, it has a large collection of libraries, implemented in a mix of C and C++, that perform their own security checks. In this research, we describe the design and engineering of a static analysis engine which drives a full information flow analysis for apps and a control flow analysis for the full library stack. We implemented these static analyses as extensions to LLVM, requiring us to improve LLVM's native analysis features to get greater precision and scalability, including knotty issues like the coexistence of C++ inheritance with C function pointer use. With our tools, we found several unexpected behaviors in the Tizen system, including paths through the system libraries that did not have inline security checks. We show how our tools can help the Tizen app store verify important app properties and help the Tizen development process avoid the accidental introduction of subtle vulnerabilities.
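The finding about library paths that lack inline security checks can be pictured as a reachability question on a call graph. The sketch below is a simplification under stated assumptions, not the authors' LLVM-based analysis: the call graph is a plain dictionary, a function either performs a check or does not, and every function name is hypothetical.

```python
from typing import Dict, List, Set


def unchecked_paths(
    call_graph: Dict[str, List[str]],
    entry: str,
    sink: str,
    checks: Set[str],
) -> List[List[str]]:
    """Return acyclic call paths from entry to sink that pass through
    no function listed in `checks` (functions assumed to perform a check)."""
    found: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node in checks:
            return  # this path is guarded; stop exploring it
        if node == sink:
            found.append(path)
            return
        for callee in call_graph.get(node, []):
            if callee not in path:  # skip recursion cycles
                dfs(callee, path + [callee])

    dfs(entry, [entry])
    return found


# Hypothetical library: one guarded route and one unguarded route to the sink.
graph = {
    "api_open": ["checked_open", "legacy_open"],
    "checked_open": ["do_open"],
    "legacy_open": ["do_open"],
    "do_open": [],
}
paths = unchecked_paths(graph, "api_open", "do_open", {"checked_open"})
```

Here `paths` contains only the route through `legacy_open`, the kind of path such an analysis would flag for review. Building a precise call graph for mixed C and C++ code (virtual dispatch, function pointers) is the hard part the abstract refers to, and this sketch takes it as given.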
PL | Sep 13, 2020
Advanced Graph-Based Deep Learning for Probabilistic Type Inference
Fangke Ye, Jisheng Zhao, Vivek Sarkar
Dynamically typed languages such as JavaScript and Python have emerged as the most popular programming languages in use. Important benefits can accrue from including type annotations in dynamically typed programs. This approach to gradual typing is exemplified by the TypeScript programming system, which allows programmers to specify partially typed programs and then uses static analysis to infer the remaining types. However, in general, the effectiveness of static type inference is limited and depends on the complexity of the program's structure and the initial type annotations. As a result, there is a strong motivation for new approaches that can advance the state of the art in statically predicting types in dynamically typed programs, and that do so with acceptable performance for use in interactive programming environments. Previous work has demonstrated the promise of probabilistic type inference using deep learning. In this paper, we advance past work by introducing a range of graph neural network (GNN) models that operate on a novel type flow graph (TFG) representation. The TFG represents an input program's elements as graph nodes connected with syntax edges and data flow edges, and our GNN models are trained to predict the type labels in the TFG for a given input program. We study different design choices for our GNN models for the 100 most common types in our evaluation dataset, and show that our two most accurate GNN configurations achieve top-1 accuracies of 87.76% and 86.89%, respectively, outperforming the two most closely related deep learning type inference approaches from past work: DeepTyper, with a top-1 accuracy of 84.62%, and LambdaNet, with a top-1 accuracy of 79.45%. Further, the average inference throughputs of those two configurations are 353.8 and 1,303.9 files/second, compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for LambdaNet.
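The type flow graph described above (program elements as nodes, joined by syntax edges and data flow edges) and the top-1 accuracy metric can be sketched as follows. The `TypeFlowGraph` class, its method names, and the example edges are assumptions for illustration; the paper's actual node and edge definitions are not given in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TypeFlowGraph:
    """Toy TFG: node ids index into `elements`; edges are grouped by kind."""
    elements: List[str] = field(default_factory=list)
    edges: Dict[str, List[Tuple[int, int]]] = field(
        default_factory=lambda: {"syntax": [], "dataflow": []}
    )

    def add_node(self, element: str) -> int:
        self.elements.append(element)
        return len(self.elements) - 1

    def add_edge(self, kind: str, src: int, dst: int) -> None:
        self.edges[kind].append((src, dst))

    def successors(self, node: int, kind: str) -> List[int]:
        return [dst for src, dst in self.edges[kind] if src == node]


def top1_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of nodes whose single highest-ranked predicted type is correct."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical graph for the snippet:  let x = 1; let y = x;
g = TypeFlowGraph()
x, one, y = g.add_node("x"), g.add_node("1"), g.add_node("y")
g.add_edge("syntax", x, one)    # declaration of x has initializer 1
g.add_edge("dataflow", one, x)  # the literal's value flows into x
g.add_edge("dataflow", x, y)    # x's value flows into y

accuracy = top1_accuracy(["number", "string", "number"],
                         ["number", "number", "number"])
```

A GNN would pass messages along both edge kinds and emit a distribution over type labels per node; the metric then compares each node's most probable label with its annotation. In this toy case two of three predictions match, so `accuracy` is 2/3.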