PL · SE · May 11

Quantitative Symbolic Patch Impact Analysis

arXiv:2605.13885 · 30.4 · Has Code
Predicted impact: top 46% in PL · last 90 days · Originality: Incremental advance
AI Analysis

For software engineers and security analysts, the paper provides a method to assess patch impact by measuring the extent of behavioral divergence, addressing a limitation of traditional equivalence checking, which only reports whether two programs are equivalent or not.

The paper introduces quantitative partial equivalence analysis to quantify behavioral differences between original and patched programs, demonstrating its effectiveness on 90 CVE patches and identifying mislabeled equivalent programs in EqBench.

Traditional equivalence checking classifies programs as equivalent or non-equivalent, which provides insufficient information for tasks like patch impact analysis, where the patched version of the program is expected to be non-equivalent to the original. When two program versions are non-equivalent, determining under what conditions they differ and what percentage of inputs are affected remains an open challenge. In this work, we introduce quantitative partial equivalence analysis, an approach for assessing software patches by quantifying behavioral differences between the original (vulnerable) code and the patched code. Using symbolic analysis, we identify the input conditions under which the patched and original programs exhibit identical or divergent behaviors. Our approach refines non-equivalence by measuring the extent of behavioral divergence across the input domain. For efficient quantitative analysis of numerical domains, we propose a range-based search heuristic that provides a sound lower bound on equivalence. We demonstrate our approach on 90 CVE patches from widely used open-source projects (Linux, QEMU, FFmpeg), as well as on a Juliet Test Suite-based dataset containing programs with CWEs. Our results show that quantitative partial equivalence analysis effectively characterizes and quantifies patch impact. Additionally, experiments on the EqBench benchmark reveal five C program pairs that are mislabeled as equivalent, and we identify the input conditions under which their behaviors diverge.
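To make the idea of quantifying partial equivalence concrete, here is a minimal Python sketch. It is not the paper's method: the paper uses symbolic analysis, while this sketch substitutes brute-force enumeration over a tiny 8-bit input domain. The functions `original` and `patched`, the bound of 200, and the helper names are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def original(idx: int) -> int:
    # Hypothetical vulnerable version: no upper-bound check on idx.
    return idx * 2


def patched(idx: int) -> int:
    # Hypothetical patched version: rejects idx values of 200 or more.
    if idx >= 200:
        return -1
    return idx * 2


def equivalence_ratio(f, g, domain):
    """Exact fraction of inputs in `domain` on which f and g agree.

    Brute-force stand-in for symbolic analysis (an assumption of this sketch).
    """
    inputs = list(domain)
    same = sum(1 for x in inputs if f(x) == g(x))
    return Fraction(same, len(inputs))


def lower_bound_from_ranges(f, g, ranges, domain_size):
    """Lower bound on the equivalence ratio from disjoint inclusive ranges.

    A range is counted only if f and g agree on every input in it; any
    range containing a divergent input contributes nothing. The result
    can therefore never exceed the exact ratio.
    """
    verified = 0
    for lo, hi in ranges:
        if all(f(x) == g(x) for x in range(lo, hi + 1)):
            verified += hi - lo + 1
    return Fraction(verified, domain_size)


exact = equivalence_ratio(original, patched, range(256))
bound = lower_bound_from_ranges(original, patched, [(0, 127), (128, 255)], 256)
print(exact, bound)  # 25/32 1/2
```

Here the two versions agree on inputs 0 through 199, so the exact equivalence ratio is 200/256 = 25/32. The coarse two-range split only verifies the first half of the domain, giving a lower bound of 1/2. Splitting the second range further would tighten the bound toward the exact value, which loosely mirrors the trade-off a range-based search has to manage.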

