SESep 11, 2023
Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)Pingchuan Ma, Zhenlan Ji, Peisen Yao et al.
Causal discovery is a powerful technique for identifying causal relationships among variables in data. It has been widely used in various applications in software engineering. Causal discovery extensively involves conditional independence (CI) tests. Hence, its output quality highly depends on the performance of CI tests, which can often be unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed. Despite the distinct nature between unreliable and excessive CI tests, this paper identifies a unified and principled approach to addressing both of them. Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms, which are a set of well-established integrity constraints on conditional independence. Hence, we can either detect erroneous CI statements if they violate Pearl's axioms or prune excessive CI statements if they are logically entailed by Pearl's axioms. Holistically, both problems boil down to reasoning about the consistency of CI statements under Pearl's axioms (referred to as CIR problem). We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency, and only resort to costly SMT-based reasoning when necessary. Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively. [abridged due to length limit]
60.2SEApr 16
HintPilot: LLM-based Compiler Hint Synthesis for Code OptimizationHanyun Jiang, Peisen Yao, Kaiyue Li et al.
Code optimization remains a core objective in software development, yet modern compilers struggle to navigate the enormous optimization spaces. While recent research has looked into employing large language models (LLMs) to optimize source code directly, these techniques can introduce semantic errors and miss fine-grained compiler-level optimization opportunities. We present HintPilot, which bridges LLM-based reasoning with traditional compiler infrastructures via synthesizing compiler hints, annotations that steer compiler behavior. HintPilot employs retrieval-augmented synthesis over compiler documentation and applies profiling-guided iterative refinement to synthesize semantics-preserving and effective hints. Upon PolyBench and HumanEval-CPP benchmarks, HintPilot achieves up to 6.88x geometric mean speedup over -Ofast while preserving program correctness.
PLSep 16, 2021
Efficient Path-Sensitive Data-Dependence AnalysisPeisen Yao, Jinguo Zhou, Xiao Xiao et al.
This paper presents a scalable path- and context-sensitive data-dependence analysis. The key is to address the aliasing-path-explosion problem via a sparse, demand-driven, and fused approach that piggybacks the computation of pointer information with the resolution of data dependence. Specifically, our approach decomposes the computational efforts of disjunctive reasoning into 1) a context- and semi-path-sensitive analysis that concisely summarizes data dependence as the symbolic and storeless value-flow graphs, and 2) a demand-driven phase that resolves transitive data dependence over the graphs. We have applied the approach to two clients, namely thin slicing and value flow analysis. Using a suite of 16 programs ranging from 13 KLoC to 8 MLoC, we compare our techniques against a diverse group of state-of-the-art analyses, illustrating significant precision and scalability advantages of our approach.
SEJul 8, 2021
Duplicate-sensitivity Guided Transformation Synthesis for DBMS Correctness Bug DetectionYushan Zhang, Peisen Yao, Rongxin Wu et al.
Database Management System (DBMS) plays a core role in modern software from mobile apps to online banking. It is critical that DBMS should provide correct data to all applications. When the DBMS returns incorrect data, a correctness bug is triggered. Current production-level DBMSs still suffer from insufficient testing due to the limited hand-written test cases. Recently several works proposed to automatically generate many test cases with query transformation, a process of generating an equivalent query pair and testing a DBMS by checking whether the system returns the same result set for both queries. However, all of them still heavily rely on manual work to provide a transformation which largely confines their exploration of the valid input query space. This paper introduces duplicate-sensitivity guided transformation synthesis which automatically finds new transformations by first synthesizing many candidates then filtering the nonequivalent ones. Our automated synthesis is achieved by mutating a query while keeping its duplicate sensitivity, which is a necessary condition for query equivalence. After candidate synthesis, we keep the mutant query which is equivalent to the given one by using a query equivalent checker. Furthermore, we have implemented our idea in a tool Eqsql and used it to test the production-level DBMSs. In two months, we detected in total 30 newly confirmed and unique bugs in MySQL, TiDB and CynosDB.