RuiZhe Jiang

CR
h-index9
4papers
35citations
Novelty60%
AI Score41

4 Papers

CRDec 8, 2025Code
VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection

Yuzhou Nie, Hongwei Li, Chengquan Guo et al.

We propose VulnLLM-R, the~\emph{first specialized reasoning LLM} for vulnerability detection. Our key insight is that LLMs can reason about program states and analyze the potential vulnerabilities, rather than simple pattern matching. This can improve the model's generalizability and prevent learning shortcuts. However, SOTA reasoning LLMs are typically ultra-large, closed-source, or have limited performance in vulnerability detection. To address this, we propose a novel training recipe with specialized data selection, reasoning data generation, reasoning data filtering and correction, and testing-phase optimization. Using our proposed methodology, we train a reasoning model with seven billion parameters. Through extensive experiments on SOTA datasets across Python, C/C++, and Java, we show that VulnLLM-R has superior effectiveness and efficiency than SOTA static analysis tools and both open-source and commercial large reasoning models. We further conduct a detailed ablation study to validate the key designs in our training recipe. Finally, we construct an agent scaffold around our model and show that it outperforms CodeQL and AFL++ in real-world projects. Our agent further discovers a set of zero-day vulnerabilities in actively maintained repositories. This work represents a pioneering effort to enable real-world, project-level vulnerability detection using AI agents powered by specialized reasoning models. The code is available at~\href{https://github.com/ucsb-mlsec/VulnLLM-R}{github}.

CVJul 1, 2024
Investigating the Segment Anything Foundation Model for Mapping Smallholder Agriculture Field Boundaries Without Training Labels

Pratyush Tripathy, Kathy Baylis, Kyle Wu et al.

Accurate mapping of agricultural field boundaries is crucial for enhancing outcomes like precision agriculture, crop monitoring, and yield estimation. However, extracting these boundaries from satellite images is challenging, especially for smallholder farms and data-scarce environments. This study explores the Segment Anything Model (SAM) to delineate agricultural field boundaries in Bihar, India, using 2-meter resolution SkySat imagery without additional training. We evaluate SAM's performance across three model checkpoints, various input sizes, multi-date satellite images, and edge-enhanced imagery. Our results show that SAM correctly identifies about 58% of field boundaries, comparable to other approaches requiring extensive training data. Using different input image sizes improves accuracy, with the most significant improvement observed when using multi-date satellite images. This work establishes proof of concept for using SAM and maximizing its potential in agricultural field boundary mapping. Our work highlights SAM's potential in delineating agriculture field boundary in training-data scarce settings to enable a wide range of agriculture related analysis.

CROct 14, 2024
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Yuzhou Nie, Zhun Wang, Yu Yang et al.

Existing benchmarks for evaluating the security risks and capabilities (e.g., vulnerability detection) of code-generating large language models (LLMs) face several key limitations: (1) limited coverage of risk and capabilities; (2) reliance on static evaluation metrics such as LLM judgments or rule-based detection, which lack the precision of dynamic analysis; and (3) a trade-off between data quality and benchmark scale. To address these challenges, we introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations. Our approach provides a comprehensive suite of artifacts so the benchmark can support comprehensive risk assessment and security capability evaluation using dynamic metrics. By combining expert insights with automated generation, we strike a balance between manual effort, data quality, and benchmark scale. Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities. Compared with state-of-the-art benchmarks, SeCodePLT offers broader coverage, higher data fidelity, and substantially greater scale. We use SeCodePLT to evaluate leading code LLMs and agents, revealing their strengths and weaknesses in both generating secure code and identifying or fixing vulnerabilities.

LGDec 2, 2024
ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization

RuiZhe Jiang, Haotian Lei

Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets, a phenomenon known as overfitting. To address this, researchers have proposed numerous regularization techniques tailored to various tasks and model architectures. In this paper, we introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. Based on this viewpoint, we propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set. Due to minimal prior assumptions, this approach is applicable to almost any architecture and task. Our experiments show that it effectively reduces overfitting, with low sensitivity to hyperparameters and minimal computational cost. It demonstrates particularly strong memory suppression and promotes normal convergence, even when the model has already started to overfit. Even in the absence of significant overfitting, our method consistently improves accuracy and reduces validation loss.