Huaien Zhang

90.6CRMay 19

Hunting Vulnerability Variants in AI Infra: Measurement and Reference-Driven Detection

Tian Dong, Yanjun Chen, Shoufeng Zhang et al.

AI infra has become a shared execution layer for model training, deployment, and agent orchestration. Because many projects reimplement similar model-centric workflows, a vulnerability disclosed in one repository can recur as a variant in another repository with a related design. Yet the prevalence and detectability of these variants remain poorly understood. This paper presents a measurement study of vulnerability variants in AI infra. Analyzing 688 GitHub repositories and 251 publicly disclosed vulnerabilities, we find that AI infra projects frequently share overlapping functionality and recurrent vulnerable patterns, creating a concrete basis for cross-repository variants. Building on this finding, we study how to automatically identify such variants from known disclosures. We propose INFRASCOPE, a reference-driven multi-agent framework that extracts transferable vulnerability semantics from known cases and uses them to locate and validate variants in new repositories. Evaluating INFRASCOPE on 20 real-world AI infra repositories, we uncover over 20 vulnerabilities, including 11 acknowledged cases and 4 cases that have been assigned CVEs so far.

19.7SEApr 9Code

Investigating Code Reuse in Software Redesign: A Case Study

Xiaowen Zhang, Huaien Zhang, Shin Hwei Tan

Software redesign preserves functionality while improving quality attributes, but manual reuse of code and tests is costly and error-prone, especially in crossrepository redesigns. Focusing on static analyzers where cross-repo redesign needs often arise, we conduct a bidirectional study of the ongoing Soot/SootUp redesign case using an action research methodology that combines empirical investigation with validated open-source contributions. Our study reveals: (1) non-linear migration which necessitates bidirectional reuse, (2) deferred reuse via TODOs, (3) neglected test porting, and (4) residual bug propagation during migrations. We identify tracking corresponding code and tests as the key challenge, and address it by retrofitting clone detection to derive code mappings between original and redesigned projects. Guided by semantic reuse patterns derived in our study, we propose Semantic Alignment Heuristics and a scalable hierarchical detection strategy. Evaluations on two redesigned project pairs (Soot/SootUp and FindBugs/SpotBugs) show that our approach achieves an average reduction of 33-99% in likely irrelevant clones at a SAS threshold of 0.5 across all tool results, and improves precision up to 86% on our benchmark of 1,749 samples. Moreover, we contribute to the redesigned projects by submitting five issues and 10 pull requests, of which eight have been merged.

Huaien Zhang

2 Papers