CLApr 19Code
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue SummarizationXiaoyong Mei, Tingting Zuo, Da Chen et al.
Multi-role dialogue summarization requires modeling complex interactions among multiple speakers while preserving role-specific information and factual consistency. However, most existing methods optimize for automatic metrics such as ROUGE and BERTScore, which favor surface-level imitation of references rather than genuine gains in faithfulness or alignment with human preferences. We propose a novel framework that couples explicit cognitive-style reasoning with reward-based optimization for multi-role dialogue summarization. Our method first distills structured reasoning traces (e.g., step-by-step inferences and intermediate reflections) from a large teacher model and uses them as auxiliary supervision to initialize a reasoning-aware summarizer via staged supervised fine-tuning. It then applies GRPO with a dual-principle reward that blends metric-based signals with human-aligned criteria targeting key information coverage, implicit inference, factual faithfulness, and conciseness. Experiments on multilingual multi-role dialogue benchmarks show that our method matches strong baselines on ROUGE and BERTScore. Specifically, results on CSDS confirm the framework's stability in semantic consistency, while in-depth analysis on SAMSum demonstrates clear gains in factual faithfulness and model-based preference alignment. These findings underscore the value of reasoning-aware and preference-aware training for reliable dialogue summarization. Checkpoints and datasets are available at https://huggingface.co/collections/NebulaPixel/summorchestra-multirole-summary.
ARApr 24
AutoINV: Automated Invariant Generation Framework for Formal Verification on High-Level Synthesis DesignsXiaofeng Zhou, Linfeng Du, Guangyu Hu et al.
High-level synthesis (HLS) transforms an algorithmic description of hardware from a higher abstraction (e.g., C/C++) into a register-transfer level (RTL) design, offering reduced development time and greater flexibility in design space exploration. However, such machine-generated RTL designs may contain major functional bugs or security vulnerabilities due to limitations or errors in the HLS tools. One of the most reliable methods to identify these vulnerabilities is formal verification, particularly model checking. Nevertheless, the large size of the generated RTL often causes model checking to struggle to conclude within reasonable time or resource limits. In this study, we propose utilizing the high-level design features from the HLS flow to construct a set of helper assertions aimed at guiding the model checker and accelerating the verification process. To identify the most effective set of helpers to assist the model checker, we develop a proving mechanism that iteratively reuses proving information to select the potentially most useful set of helpers. We evaluate the proposed framework on a set of HLS design benchmarks. Experimental results demonstrate that, when compared to vanilla model checking, our approach achieves a speedup of up to 6.05x, and 2.23x on average.
ARApr 18
SegSEM: Enabling and Enhancing SAM2 for SEM Contour ExtractionDa Chen, Guangyu Hu, Kaihong Xu et al.
Extracting high-fidelity 2D contours from Scanning Electron Microscope (SEM) images is critical for calibrating Optical Proximity Correction (OPC) models. While foundation models like Segment Anything 2 (SAM2) are promising, adapting them to specialized domains with scarce annotated data is a major challenge. This paper presents a case study on adapting SAM2 for SEM contour extraction in a few-shot setting. We propose SegSEM, a framework built on two principles: a data-efficient fine-tuning strategy that adapts by selectively training only the model's encoders, and a robust hybrid architecture integrating a traditional algorithm as a confidence-aware fallback. Using a small dataset of 60 production images, our experiments demonstrate this methodology's viability. The primary contribution is a methodology for leveraging foundation models in data-constrained industrial applications.
LOMay 9
Inverter Redistribution through Self-Dual and Self-Anti-Dual Function TransformationJingren Wang, Guangyu Hu, Shiju Lin et al.
And-Inverter Graph (AIG)-based logic synthesis has been a cornerstone of digital design automation for several decades. While numerous optimization techniques have been developed for both technology-independent and technology-dependent synthesis stages, existing technology mapping approaches predominantly employ graph-covering strategies directly on AIG representations without adequately addressing complemented edge distribution. Neglecting inverters creates a significant disconnect: complemented edges are systematically overlooked in technology-independent cost functions, yet they abruptly become critical during technology-dependent mapping. In this work, we introduce a delay-driven pre-processing stage that operates prior to technology mapping, designed to strategically redistribute complemented edges and mitigate the inverter-induced costs on critical paths. Experimental validation demonstrates that our delay-targeted methodology not only preserves original delay characteristics but also enables performance improvements. Notably, arithmetic logic in the EPFL combinational benchmark exhibits particular sensitivity to this approach, with our method achieving an average delay reduction of 0.49% and a maximum improvement of 3.86% on the case sqrt.
ARMar 26
AutoPDR: Circuit-Aware Solver Configuration Prediction for Hardware Model CheckingGuangyu Hu, Chen Chen, Xiaofeng Zhou et al.
Property Directed Reachability (PDR) is a powerful algorithm for formal verification of hardware and software systems, but its performance is highly sensitive to parameter configurations. Manual parameter tuning is time-consuming and requires domain expertise, while traditional automated parameter tuning frameworks are not well-suited for time-sensitive verification tasks like PDR. This paper presents a circuit-aware solver configuration framework that employs graph learning for intelligent heuristic selection in PDR-based verification. Our approach combines graph representations with static circuit features to predict optimal PDR solving configurations for specific circuits. We incorporate expert prior knowledge through constraint-based parameter filtering to eliminate invalid and inefficient configurations and reduce 78% search space. Our feature extraction pipeline captures structural, functional, and connectivity characteristics of circuit topology and component patterns. Experimental evaluation on a comprehensive benchmark suite demonstrates significant performance improvements compared to default configurations and commonly-used settings. The system successfully identifies circuit-specific parameter patterns and automatically selects the most suitable solving strategies based on circuit characteristics, making it a practical tool for automated formal verification workflows.
ARFeb 26
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement LearningGuangyu Hu, Xiaofeng Zhou, Wei Zhang et al.
Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2 without access to the originating register-transfer-level (RTL) designs, and biased toward extreme difficulty where instances are either trivial or intractable. These limitations hinder rigorous evaluation of new verification techniques and encourage overfitting of solver heuristics to a narrow set of problems. To address this, we introduce EvolveGen, a framework for generating hardware model checking benchmarks by combining reinforcement learning (RL) with high-level synthesis (HLS). Our approach operates at an algorithmic level of abstraction in which an RL agent learns to construct computation graphs. By compiling these graphs under different synthesis directives, we produce pairs of functionally equivalent but structurally distinct hardware designs, inducing challenging model checking instances. Solver runtime is used as the reward signal, enabling the agent to autonomously discover and generate small-but-hard instances that expose solver-specific weaknesses. Experiments show that EvolveGen efficiently creates a diverse benchmark set in standard formats (e.g., AIGER and BTOR2) and effectively reveals performance bottlenecks in state-of-the-art model checkers.
ARNov 11, 2025
BDD2Seq: Enabling Scalable Reversible-Circuit Synthesis via Graph-to-Sequence LearningMingkai Miao, Jianheng Tang, Guangyu Hu et al.
Binary Decision Diagrams (BDDs) are instrumental in many electronic design automation (EDA) tasks thanks to their compact representation of Boolean functions. In BDD-based reversible-circuit synthesis, which is critical for quantum computing, the chosen variable ordering governs the number of BDD nodes and thus the key metrics of resource consumption, such as Quantum Cost. Because finding an optimal variable ordering for BDDs is an NP-complete problem, existing heuristics often degrade as circuit complexity grows. We introduce BDD2Seq, a graph-to-sequence framework that couples a Graph Neural Network encoder with a Pointer-Network decoder and Diverse Beam Search to predict high-quality orderings. By treating the circuit netlist as a graph, BDD2Seq learns structural dependencies that conventional heuristics overlooked, yielding smaller BDDs and faster synthesis. Extensive experiments on three public benchmarks show that BDD2Seq achieves around 1.4 times lower Quantum Cost and 3.7 times faster synthesis than modern heuristic algorithms. To the best of our knowledge, this is the first work to tackle the variable-ordering problem in BDD-based reversible-circuit synthesis with a graph-based generative model and diversity-promoting decoding.
LOApr 23
A-IC3: Learning-Guided Adaptive Inductive Generalization for Hardware Model CheckingXiaofeng Zhou, Guangyu Hu, Hongce Zhang et al.
The IC3 algorithm represents the state-of-the-art (SOTA) hardware model checking technique, owing to its robust performance and scalability. A significant body of research has focused on enhancing the solving efficiency of the IC3 algorithm, with particular attention to the inductive generalization process: a critical phase wherein the algorithm seeks to generalize a counterexample to inductiveness (CTI), which typically is a state leading to a bad state, into a broader set of states. This inductive generalization is a primary source of clauses in IC3 and thus plays a pivotal role in determining the overall effectiveness of the algorithm. Despite its importance, existing approaches often rely on fixed inductive generalization strategies, overlooking the dynamic and context-sensitive nature of the verification environment in which spurious counterexamples arise. This rigidity can limit the quality of generated clauses and, consequently, the performance of IC3. To address this limitation, we propose a lightweight machine-learning-based framework that dynamically selects appropriate inductive generalization strategies in response to the evolving verification context. Specifically, we employ a multi-armed bandit (MAB) algorithm to adaptively choose inductive generalization strategies based on real-time feedback from the verification process. The agent is updated by evaluating the quality of generalization outcomes, thereby refining its strategy selection over time. Empirical evaluation on a benchmark suite comprising 914 instances, primarily drawn from the latest HWMCC collection, demonstrates the efficacy of our approach. When implemented on the state-of-the-art model checker rIC3, our method solves 26 to 50 more cases than the baselines and improves the PAR-2 score by 194.72 to 389.29.
AIJan 18
IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model CheckingMingkai Miao, Guangyu Hu, Ziyi Yang et al.
IC3, also known as property-directed reachability (PDR), is a commonly-used algorithm for hardware safety model checking. It checks if a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property violation) with a counterexample trace, or SAFE with a checkable inductive invariant as the proof to safety. In practice, the performance of IC3 is dominated by a large web of interacting heuristics and implementation choices, making manual tuning costly, brittle, and hard to reproduce. This paper presents IC3-Evolve, an automated offline code-evolution framework that utilizes an LLM to propose small, slot-restricted and auditable patches to an IC3 implementation. Crucially, every candidate patch is admitted only through proof- /witness-gated validation: SAFE runs must emit a certificate that is independently checked, and UNSAFE runs must emit a replayable counterexample trace, preventing unsound edits from being deployed. Since the LLM is used only offline, the deployed artifact is a standalone evolved checker with zero ML/LLM inference overhead and no runtime model dependency. We evolve on the public hardware model checking competition (HWMCC) benchmark and evaluate the generalizability on unseen public and industrial model checking benchmarks, showing that IC3-Evolve can reliably discover practical heuristic improvements under strict correctness gates.