40.4SEMar 27Code
A Large-scale Empirical Study on the Generalizability of Disclosed Java Library Vulnerability ExploitsZirui Chen, Qi Zhan, Jiayuan Zhou et al.
Open-source software supply chain security relies heavily on assessing affected versions of library vulnerabilities. While prior studies have leveraged exploits for verifying vulnerability affected versions, they point out a key limitation that exploits are version-specific and cannot be directly applied across library versions. Despite being widely acknowledged, this limitation has not been systematically validated at scale, leaving the actual applicability of exploits across versions unexplored. To fill this gap, we conduct the first large-scale empirical study on exploit applicability across library versions. We construct a comprehensive dataset consisting of 259 exploits spanning 128 Java libraries and 28,150 historical versions, covering 61 CWEs that account for 76.33% of vulnerabilities in Maven. Leveraging this dataset, we execute each exploit against the library version history and compare the execution outcomes with our manually annotated ground-truth affected versions. We further investigate the root causes of inconsistencies between exploit execution and ground truth, and explore strategies for exploit migration. Our results (RQ1) show that, even without migration, exploits achieve 83.0% recall and 99.3% precision in identifying affected versions in Java, outperforming most widely used vulnerability databases and assessment tools. Notably, this capability enables us to contribute 796 confirmed missing affected versions to the CPE dictionary. We investigate the remaining exploit failures (RQ2) and find that they mainly stem from compatibility issues introduced by library evolution and changing environmental constraints. Based on these observations, we manually migrate exploits for 1,885 versions and distill a taxonomy of 10 strategies from these successful adaptation cases (RQ3), thereby increasing the overall recall to 96.1%.
59.9SEMar 23
Verify Implementation Equivalence of Large ModelsQi Zhan, Xing Hu, Xin Xia et al.
Verifying whether two implementations of the same large model are equivalent across frameworks is difficult in practice. Even when they realize the same computation, their graphs may differ substantially in operator decomposition, tensor layout, and the use of fused or opaque kernels, making manual rewrite rules hard to build and maintain. We present Emerge, a framework for checking Implementation Equivalence over computation graphs of large-model implementations. Instead of writing rules manually, Emerge represents the two implementations in an e-graph, infers candidate relations from execution values, and synthesizes rewrite rules on demand when existing rules are insufficient. Each synthesized rule is validated using the strongest applicable method, including SMT- based checking for symbolically tractable cases and constraint-aware randomized testing for opaque kernels, and then propagated through e-graph rebuilding to establish larger equivalences. Our current implementation targets inference computation graphs captured from HuggingFace Transformers and vLLM. Our evaluation shows that Emerge establishes equivalence for correct implementation pairs at practical cost, while also providing useful by-products for debugging: it detects 10 of 13 known implementation bugs and uncovers 8 previously unknown implementation issues that were later confirmed by developers. In addition, Emerge synthesizes block-level rules that compare favorably with manually authored ones.
CLJan 25Code
Assessment of Generative Named Entity Recognition in the Era of Large Language ModelsQi Zhan, Yile Wang, Hui Huang
Named entity recognition (NER) is evolving from a sequence labeling task into a generative paradigm with the rise of large language models (LLMs). We conduct a systematic evaluation of open-source LLMs on both flat and nested NER tasks. We investigate several research questions including the performance gap between generative NER and traditional NER models, the impact of output formats, whether LLMs rely on memorization, and the preservation of general capabilities after fine-tuning. Through experiments across eight LLMs of varying scales and four standard NER datasets, we find that: (1) With parameter-efficient fine-tuning and structured formats like inline bracketed or XML, open-source LLMs achieve performance competitive with traditional encoder-based models and surpass closed-source LLMs like GPT-3; (2) The NER capability of LLMs stems from instruction-following and generative power, not mere memorization of entity-label pairs; and (3) Applying NER instruction tuning has minimal impact on general capabilities of LLMs, even improving performance on datasets like DROP due to enhanced entity understanding. These findings demonstrate that generative NER with LLMs is a promising, user-friendly alternative to traditional methods. We release the data and code at https://github.com/szu-tera/LLMs4NER.
LGDec 30, 2024
Verified Lifting of Deep learning OperatorsQi Zhan, Xing Hu, Xin Xia et al.
Deep learning operators are fundamental components of modern deep learning frameworks. With the growing demand for customized operators, it has become increasingly common for developers to create their own. However, designing and implementing operators is complex and error-prone, due to hardware-specific optimizations and the need for numerical stability. There is a pressing need for tools that can summarize the functionality of both existing and user-defined operators. To address this gap, this work introduces a novel framework for the verified lifting of deep learning operators, which synthesizes high-level mathematical formulas from low-level implementations. Our approach combines symbolic execution, syntax-guided synthesis, and SMT-based verification to produce readable and formally verified mathematical formulas. In synthesis, we employ a combination of top-down and bottom-up strategies to explore the vast search space efficiently; In verification, we design invariant synthesis patterns and leverage SMT solvers to validate the correctness of the derived summaries; In simplification, we use egraph-based techniques with custom rules to restore complex formulas to their natural, intuitive forms. Evaluated on a dataset of deep learning operators implemented in Triton from the real world, our method demonstrates the effectiveness of synthesis and verification compared to existing techniques. This framework bridges the gap between low-level implementations and high-level abstractions, improving understanding and reliability in deep learning operator development.
CVNov 7, 2019
Analysis of CNN-based remote-PPG to understand limitations and sensitivitiesQi Zhan, Wenjin Wang, Gerard de Haan
Deep learning based on Convolutional Neural Network (CNN) has shown promising results in various vision-based applications, recently also in camera-based vital signs monitoring. The CNN-based Photoplethysmography (PPG) extraction has, so far, been focused on performance rather than understanding. In this paper, we try to answer four questions with experiments aiming at improving our understanding of this methodology as it gains popularity. We conclude that the network exploits the blood absorption variation to extract the physiological signals, and that the choice and parameters (phase, spectral content, etc.) of the reference-signal may be more critical than anticipated. The availability of multiple convolutional kernels is necessary for CNN to arrive at a flexible channel combination through the spatial operation, but may not provide the same motion-robustness as a multi-site measurement using knowledge-based PPG extraction. Finally, we conclude that the PPG-related prior knowledge is still helpful for the CNN-based PPG extraction. Consequently, we recommend further investigation of hybrid CNN-based methods to include prior knowledge in their design.