Xiyuan Chen

CR
h-index6
7papers
33citations
Novelty59%
AI Score48

7 Papers

CVJun 18, 2023
Rapid Image Labeling via Neuro-Symbolic Learning

Yifeng Wang, Zhi Tu, Yiwen Xiang et al. · cmu

The success of Computer Vision (CV) relies heavily on manually annotated data. However, it is prohibitively expensive to annotate images in key domains such as healthcare, where data labeling requires significant domain expertise and cannot be easily delegated to crowd workers. To address this challenge, we propose a neuro-symbolic approach called Rapid, which infers image labeling rules from a small amount of labeled data provided by domain experts and automatically labels unannotated data using the rules. Specifically, Rapid combines pre-trained CV models and inductive logic learning to infer the logic-based labeling rules. Rapid achieves a labeling accuracy of 83.33% to 88.33% on four image labeling tasks with only 12 to 39 labeled samples. In particular, Rapid significantly outperforms finetuned CV models in two highly specialized tasks. These results demonstrate the effectiveness of Rapid in learning from small data and its capability to generalize among different tasks. Code and our dataset are publicly available at https://github.com/Neural-Symbolic-Image-Labeling/

NEMay 1, 2022
DDDM: a Brain-Inspired Framework for Robust Classification

Xiyuan Chen, Xingyu Li, Yi Zhou et al.

Despite their outstanding performance in a broad spectrum of real-world tasks, deep artificial neural networks are sensitive to input noises, particularly adversarial perturbations. On the contrary, human and animal brains are much less vulnerable. In contrast to the one-shot inference performed by most deep neural networks, the brain often solves decision-making with an evidence accumulation mechanism that may trade time for accuracy when facing noisy inputs. The mechanism is well described by the Drift-Diffusion Model (DDM). In the DDM, decision-making is modeled as a process in which noisy evidence is accumulated toward a threshold. Drawing inspiration from the DDM, we propose the Dropout-based Drift-Diffusion Model (DDDM) that combines test-phase dropout and the DDM for improving the robustness for arbitrary neural networks. The dropouts create temporally uncorrelated noises in the network that counter perturbations, while the evidence accumulation mechanism guarantees a reasonable decision accuracy. Neural networks enhanced with the DDDM tested in image, speech, and text classification tasks all significantly outperform their native counterparts, demonstrating the DDDM as a task-agnostic defense against adversarial attacks.

CRAug 18, 2025Code
VerilogLAVD: LLM-Aided Rule Generation for Vulnerability Detection in Verilog

Xiang Long, Yingjie Xia, Xiyuan Chen et al.

Timely detection of hardware vulnerabilities during the early design stage is critical for reducing remediation costs. Existing early detection techniques often require specialized security expertise, limiting their usability. Recent efforts have explored the use of large language models (LLMs) for Verilog vulnerability detection. However, LLMs struggle to capture the structure in Verilog code, resulting in inconsistent detection results. To this end, we propose VerilogLAVD, the first LLM-aided graph traversal rule generation approach for Verilog vulnerability detection. Our approach introduces the Verilog Property Graph (VeriPG), a unified representation of Verilog code. It combines syntactic features extracted from the abstract syntax tree (AST) with semantic information derived from control flow and data dependency graphs. We leverage LLMs to generate VeriPG-based detection rules from Common Weakness Enumeration (CWE) descriptions. These rules guide the rule executor that traversal VeriPG for potential vulnerabilities. To evaluate VerilogLAVD, we build a dataset collected from open-source repositories and synthesized data. In our empirical evaluation on 77 Verilog designs encompassing 12 CWE types, VerilogLAVD achieves an F1-score of 0.54. Compared to the LLM-only and LLM with external knowledge baselines, VerilogLAVD improves F1-score by 0.31 and 0.27, respectively.

ARAug 18, 2025Code
ViTAD: Timing Violation-Aware Debugging of RTL Code using Large Language Models

Wenhao Lv, Yingjie Xia, Xiyuan Chen et al.

In modern Very Large Scale Integrated (VLSI) circuit design flow, the Register-Transfer Level (RTL) stage presents a critical opportunity for timing optimization. Addressing timing violations at this early stage is essential, as modern systems demand higher speeds, where even minor timing violations can lead to functional failures or system crashes. However, traditional timing optimization heavily relies on manual expertise, requiring engineers to iteratively analyze timing reports and debug. To automate this process, this paper proposes ViTAD, a method that efficiently analyzes the root causes of timing violations and dynamically generates targeted repair strategies. Specifically, we first parse Verilog code and timing reports to construct a Signal Timing Dependency Graph (STDG). Based on the STDG, we perform violation path analysis and use large language models (LLMs) to infer the root causes of violations. Finally, by analyzing the causes of violations, we selectively retrieve relevant debugging knowledge from a domain-specific knowledge base to generate customized repair solutions. To evaluate the effectiveness of our method, we construct a timing violation dataset based on real-world open-source projects. This dataset contains 54 cases of violations. Experimental results show that our method achieves a 73.68% success rate in repairing timing violations, while the baseline using only LLM is 54.38%. Our method improves the success rate by 19.30%.

CRAug 18, 2025
SecFSM: Knowledge Graph-Guided Verilog Code Generation for Secure Finite State Machines in Systems-on-Chip

Ziteng Hu, Yingjie Xia, Xiyuan Chen et al.

Finite State Machines (FSMs) play a critical role in implementing control logic for Systems-on-Chip (SoC). Traditionally, FSMs are implemented by hardware engineers through Verilog coding, which is often tedious and time-consuming. Recently, with the remarkable progress of Large Language Models (LLMs) in code generation, LLMs have been increasingly explored for automating Verilog code generation. However, LLM-generated Verilog code often suffers from security vulnerabilities, which is particularly concerning for security-sensitive FSM implementations. To address this issue, we propose SecFSM, a novel method that leverages a security-oriented knowledge graph to guide LLMs in generating more secure Verilog code. Specifically, we first construct a FSM Security Knowledge Graph (FSKG) as an external aid to LLMs. Subsequently, we analyze users' requirements to identify vulnerabilities and get a list of vulnerabilities in the requirements. Then, we retrieve knowledge from FSKG based on the vulnerabilities list. Finally, we construct security prompts based on the security knowledge for Verilog code generation. To evaluate SecFSM, we build a dedicated dataset collected from academic datasets, artificial datasets, papers, and industrial cases. Extensive experiments demonstrate that SecFSM outperforms state-of-the-art baselines. In particular, on a benchmark of 25 security test cases evaluated by DeepSeek-R1, SecFSM achieves an outstanding pass rate of 21/25.

CLJul 17, 2025
Making Language Model a Hierarchical Classifier

Yihong Wang, Zhonglin Jiang, Ningyuan Xi et al.

Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by human's hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built with different layers decoding texts simultaneously. Due to limited time and computationally resources, we choose to adapt a pretrained language model into this form of hierarchical decoder. Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. By thorough experiments, we validate that these selective intermediate layers could be adapted to speak meaningful and reasonable contents, and this paradigm of hierarchical decoder can obtain state-of-the-art performances on multiple tasks such as hierarchical text classification, classification-guided generation, and hierarchical text generation. HdLM outperforms all baselines on WoS, DBpedia, ESconv, EmpatheticDialogues, and several cognitive tests. We also provide thorough theoretical analysis to validate the convergence and computational savings of our methodology. This study suggests the possibility of a generalized hierarchical reasoner, pretraining from scratch.

SIMay 10, 2020
CONE-Align: Consistent Network Alignment with Proximity-Preserving Node Embedding

Xiyuan Chen, Mark Heimann, Fatemeh Vahedian et al.

Network alignment, the process of finding correspondences between nodes in different graphs, has many scientific and industrial applications. Existing unsupervised network alignment methods find suboptimal alignments that break up node neighborhoods, i.e. do not preserve matched neighborhood consistency. To improve this, we propose CONE-Align, which models intra-network proximity with node embeddings and uses them to match nodes across networks after aligning the embedding subspaces. Experiments on diverse, challenging datasets show that CONE-Align is robust and obtains 19.25% greater accuracy on average than the best-performing state-of-the-art graph alignment algorithm in highly noisy settings.