AR · Apr 24
AutoINV: Automated Invariant Generation Framework for Formal Verification on High-Level Synthesis Designs
Xiaofeng Zhou, Linfeng Du, Guangyu Hu et al.
High-level synthesis (HLS) transforms an algorithmic description of hardware from a higher abstraction (e.g., C/C++) into a register-transfer level (RTL) design, offering reduced development time and greater flexibility in design space exploration. However, such machine-generated RTL designs may contain major functional bugs or security vulnerabilities due to limitations or errors in the HLS tools. One of the most reliable methods to identify these vulnerabilities is formal verification, particularly model checking. Nevertheless, the large size of the generated RTL often causes model checking to struggle to conclude within reasonable time or resource limits. In this study, we propose utilizing the high-level design features from the HLS flow to construct a set of helper assertions aimed at guiding the model checker and accelerating the verification process. To identify the most effective set of helpers to assist the model checker, we develop a proving mechanism that iteratively reuses proving information to select the potentially most useful set of helpers. We evaluate the proposed framework on a set of HLS design benchmarks. Experimental results demonstrate that, when compared to vanilla model checking, our approach achieves a speedup of up to 6.05x, and 2.23x on average.
AR · Mar 26
AutoPDR: Circuit-Aware Solver Configuration Prediction for Hardware Model Checking
Guangyu Hu, Chen Chen, Xiaofeng Zhou et al.
Property Directed Reachability (PDR) is a powerful algorithm for formal verification of hardware and software systems, but its performance is highly sensitive to parameter configurations. Manual parameter tuning is time-consuming and requires domain expertise, while traditional automated parameter tuning frameworks are not well-suited for time-sensitive verification tasks like PDR. This paper presents a circuit-aware solver configuration framework that employs graph learning for intelligent heuristic selection in PDR-based verification. Our approach combines graph representations with static circuit features to predict optimal PDR solving configurations for specific circuits. We incorporate expert prior knowledge through constraint-based parameter filtering, which eliminates invalid and inefficient configurations and reduces the search space by 78%. Our feature extraction pipeline captures structural, functional, and connectivity characteristics of circuit topology and component patterns. Experimental evaluation on a comprehensive benchmark suite demonstrates significant performance improvements compared to default configurations and commonly used settings. The system successfully identifies circuit-specific parameter patterns and automatically selects the most suitable solving strategies based on circuit characteristics, making it a practical tool for automated formal verification workflows.
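As a rough illustration of the constraint-based filtering idea in the abstract above, the sketch below enumerates a small configuration space and discards combinations that violate expert rules. The parameter names (`generalize`, `ctg_depth`, `lift`) and the rule in `is_valid` are hypothetical, chosen only for illustration; they are not taken from the paper or from any particular PDR engine.

```python
from itertools import product

# Hypothetical PDR-style parameters; names and values are illustrative assumptions.
PARAM_SPACE = {
    "generalize": ["basic", "ctg"],
    "ctg_depth": [0, 1, 2],
    "lift": [True, False],
}

def is_valid(cfg):
    """Assumed expert rule: CTG depth is positive exactly when CTG generalization is on."""
    if cfg["generalize"] == "ctg":
        return cfg["ctg_depth"] > 0
    return cfg["ctg_depth"] == 0

def enumerate_configs(space):
    """Build every combination of parameter values as a dict."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]

def filter_configs(space, rule):
    """Keep only the configurations accepted by the rule."""
    return [cfg for cfg in enumerate_configs(space) if rule(cfg)]

all_cfgs = enumerate_configs(PARAM_SPACE)
valid_cfgs = filter_configs(PARAM_SPACE, is_valid)
reduction = 1 - len(valid_cfgs) / len(all_cfgs)
print(f"{len(valid_cfgs)} of {len(all_cfgs)} configurations remain ({reduction:.0%} pruned)")
```

In this toy space the rule prunes half of the 12 combinations. The 78% figure in the abstract refers to the authors' own parameter space and constraints, which are not reproduced here.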
AR · Feb 26
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
Guangyu Hu, Xiaofeng Zhou, Wei Zhang et al.
Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2 without access to the originating register-transfer-level (RTL) designs, and biased toward extreme difficulty where instances are either trivial or intractable. These limitations hinder rigorous evaluation of new verification techniques and encourage overfitting of solver heuristics to a narrow set of problems. To address this, we introduce EvolveGen, a framework for generating hardware model checking benchmarks by combining reinforcement learning (RL) with high-level synthesis (HLS). Our approach operates at an algorithmic level of abstraction in which an RL agent learns to construct computation graphs. By compiling these graphs under different synthesis directives, we produce pairs of functionally equivalent but structurally distinct hardware designs, inducing challenging model checking instances. Solver runtime is used as the reward signal, enabling the agent to autonomously discover and generate small-but-hard instances that expose solver-specific weaknesses. Experiments show that EvolveGen efficiently creates a diverse benchmark set in standard formats (e.g., AIGER and BTOR2) and effectively reveals performance bottlenecks in state-of-the-art model checkers.
LO · Apr 23
A-IC3: Learning-Guided Adaptive Inductive Generalization for Hardware Model Checking
Xiaofeng Zhou, Guangyu Hu, Hongce Zhang et al.
The IC3 algorithm represents the state-of-the-art (SOTA) hardware model checking technique, owing to its robust performance and scalability. A significant body of research has focused on enhancing the solving efficiency of the IC3 algorithm, with particular attention to the inductive generalization process: a critical phase wherein the algorithm seeks to generalize a counterexample to inductiveness (CTI), which typically is a state leading to a bad state, into a broader set of states. This inductive generalization is a primary source of clauses in IC3 and thus plays a pivotal role in determining the overall effectiveness of the algorithm. Despite its importance, existing approaches often rely on fixed inductive generalization strategies, overlooking the dynamic and context-sensitive nature of the verification environment in which spurious counterexamples arise. This rigidity can limit the quality of generated clauses and, consequently, the performance of IC3. To address this limitation, we propose a lightweight machine-learning-based framework that dynamically selects appropriate inductive generalization strategies in response to the evolving verification context. Specifically, we employ a multi-armed bandit (MAB) algorithm to adaptively choose inductive generalization strategies based on real-time feedback from the verification process. The agent is updated by evaluating the quality of generalization outcomes, thereby refining its strategy selection over time. Empirical evaluation on a benchmark suite comprising 914 instances, primarily drawn from the latest HWMCC collection, demonstrates the efficacy of our approach. When implemented on the state-of-the-art model checker rIC3, our method solves 26 to 50 more cases than the baselines and improves the PAR-2 score by 194.72 to 389.29.
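The bandit-driven strategy selection described in the abstract above can be sketched roughly as follows. The abstract does not say which MAB algorithm is used, so this sketch assumes UCB1. The strategy names (`standard`, `ctg`, `dynamic`) and the fixed reward values are made up for illustration and stand in for whatever generalization-quality feedback the actual system computes.

```python
import math

class UCB1Selector:
    """UCB1 bandit over a fixed set of arms (here: generalization strategies)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}
        self.total = 0

    def select(self):
        # Try every arm once before applying the confidence bound.
        for a in self.arms:
            if self.counts[a] == 0:
                return a

        def score(a):
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[a])
            return self.means[a] + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def run_simulation(rounds=200):
    # Made-up fixed rewards standing in for "generalization quality" feedback.
    fake_quality = {"standard": 0.0, "ctg": 1.0, "dynamic": 0.0}
    selector = UCB1Selector(fake_quality)
    for _ in range(rounds):
        arm = selector.select()
        selector.update(arm, fake_quality[arm])
    return selector

selector = run_simulation()
print(selector.counts)
```

In a model checker, `select` would presumably be called before each generalization attempt and `update` afterwards with a reward derived from the resulting clause. The simulation only shows that the selector concentrates its choices on the strategy with the best observed feedback while still occasionally revisiting the others.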
CL · Jun 4, 2025
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
Xiaofeng Zhou, Heyan Huang, Lizi Liao
Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks, yet their high computational demands limit widespread adoption. While distilling large models into smaller ones offers a sustainable solution, current techniques--such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection--struggle to yield substantial and lasting performance gains. In this paper, we present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback (e.g., error analysis, corrective strategies) to guide student models. Further, we introduce Tree-structured Direct Preference Optimization (T-DPO) to efficiently leverage these debate logs, organizing interactions into a hierarchical format for effective training. Empirical evaluations across diverse NLP benchmarks demonstrate that our approach significantly improves smaller-model accuracy, robustness, and generalization, outperforming conventional baselines by a large margin.
DB · Apr 20, 2019
Mining Rules Incrementally over Large Knowledge Bases
Xiaofeng Zhou, Ali Sadeghian, Daisy Zhe Wang
Multiple web-scale knowledge bases, e.g., Freebase, YAGO, and NELL, have been constructed using semi-supervised or unsupervised information extraction techniques, and many of them, despite their large sizes, are continuously growing. Much research effort has been put into mining inference rules from knowledge bases. To address the task of rule mining over evolving web-scale knowledge bases, we propose a parallel incremental rule mining framework. Our approach efficiently mines rules based on the relational model and applies updates to large knowledge bases. We propose an alternative metric that reduces computational complexity without compromising quality, and we apply multiple optimization techniques that reduce runtime by more than two orders of magnitude. Experiments show that our approach scales efficiently to web-scale knowledge bases and saves over 90% of the time compared to the state-of-the-art batch rule mining system. We also apply our optimization techniques to the batch rule mining algorithm, reducing runtime by more than half compared to the state-of-the-art. To the best of our knowledge, our incremental rule mining system is the first that handles updates to web-scale knowledge bases.