Lihao Zhang

h-index: 4 · Papers: 10 · Citations: 67

Novelty: 60% · AI Score: 53

Ranked #32,700 of 201,326 authors (top 16%) · #47 in DB (top 9%)

10 Papers

AR · May 9 · Code

VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair

Haomin Qi, Yuyang Du, Lihao Zhang et al.

Large language models (LLMs) have demonstrated immense potential in computer-aided design (CAD), particularly for automated debugging and verification within electronic design automation (EDA) tools. However, Design for Testability (DFT) remains a relatively underexplored area. This paper presents VeriRAG, the first LLM-assisted DFT-EDA framework. VeriRAG leverages a Retrieval-Augmented Generation (RAG) approach to enable the LLM to revise code to ensure DFT compliance. VeriRAG integrates (1) an autoencoder-based similarity measurement model for precise retrieval of reference RTL designs for the LLM, and (2) an iterative code revision pipeline that allows the LLM to ensure DFT compliance while maintaining synthesizability. To support VeriRAG, we introduce VeriDFT, a Verilog-based DFT dataset curated for DFT-aware RTL repairs. VeriRAG retrieves structurally similar RTL designs from VeriDFT, each paired with a rigorously validated correction, as references for code repair. With VeriRAG and VeriDFT, we achieve fully automated DFT correction, resulting in a 7.72-fold improvement in successful repair rate compared to the zero-shot baseline (Fig. 5 in Section V). Ablation studies further confirm the contribution of each component of the VeriRAG framework. We open-source our data, models, and scripts at https://github.com/HarminChee/VeriRAG.
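
The retrieval step can be illustrated with a minimal sketch, assuming each RTL design has already been mapped to a latent vector by an autoencoder's encoder (not shown). The function names `cosine_similarity` and `retrieve_top_k`, the design names, and the vectors are all hypothetical; VeriRAG's actual similarity model and scoring may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank reference designs by similarity to the query embedding and keep the top k."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical latent codes standing in for encoder outputs of reference RTL designs.
corpus = {
    "fifo_async_reset": [0.9, 0.1, 0.0],
    "counter_gated_clock": [0.1, 0.9, 0.2],
    "shift_reg_no_scan": [0.6, 0.3, 0.3],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of the design to be repaired

for name, score in retrieve_top_k(query, corpus, k=2):
    print(f"{name}: {score:.3f}")
```

In a RAG pipeline of this kind, the retrieved designs, each with its validated correction, would then be placed in the LLM prompt as references. Cosine similarity is one common choice for comparing latent codes, not necessarily the one VeriRAG uses.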

DB · Nov 4, 2022

The Tensor Data Platform: Towards an AI-centric Database System

Apurva Gandhi, Yuki Asada, Victor Fu et al. · microsoft-research

Database engines have historically absorbed many of the innovations in data processing, adding features to process graph data, XML, object-oriented data, and text, among many others. In this paper, we make the case that it is time to do the same for AI, but with a twist! While existing approaches have tried to achieve this by integrating databases with external ML tools, in this paper we claim that achieving a truly AI-centric database requires moving the DBMS engine, at its core, from a relational to a tensor abstraction. This allows us to: (1) support processing of multi-modal data such as images, videos, audio, and text, as well as relational data; (2) leverage the wellspring of innovation in HW and runtimes for tensor computation; and (3) exploit automatic differentiation to enable a novel class of "trainable" queries that can learn to perform a task. To support the above scenarios, we introduce TDP: a system that builds upon our prior work mapping relational queries to tensors. Thanks to a tighter integration with the tensor runtime, TDP provides broader coverage of emerging scenarios that require access to multi-modal data and automatic differentiation.

DB · Sep 10, 2022

Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem

Yuki Asada, Victor Fu, Apurva Gandhi et al. · microsoft-research, uw

We demonstrate Tensor Query Processor (TQP): a query processor that automatically compiles relational operators into tensor programs. By leveraging tensor runtimes such as PyTorch, TQP is able to: (1) integrate with ML tools (e.g., Pandas for data ingestion, TensorBoard for visualization); (2) target different hardware (e.g., CPU, GPU) and software (e.g., browser) backends; and (3) accelerate queries containing both relational and ML operators end to end. TQP is generic enough to support the TPC-H benchmark, and its performance is comparable to, and often better than, that of specialized CPU and GPU query processors.

IT · Mar 27

CL-SEC: Cross-Layer Semantic Error Correction Empowered by Language Models

Yirun Wang, Yuyang Du, Soung Chang Liew et al.

Achieving reliable communication has long been a fundamental challenge in networked systems. Semantic Error Correction (SEC) leverages the semantic understanding capabilities of language models (LMs) to perform application-layer error correction, complementing conventional channel decoding. While promising, existing SEC approaches rely solely on context captured by LMs at the application layer, ignoring the rich information available at the physical layer. To address this limitation, this paper introduces Cross-Layer SEC (CL-SEC), an LM-empowered error correction framework that integrates cross-layer information from both the physical and application layers to jointly correct corrupted words in text communication. Using a Bayesian combination in product form tailored to this framework, CL-SEC achieves significantly improved performance over methods that process information in isolated layers. CL-SEC shows substantial gains across multiple error-correction metrics, including bit-error rate, word-error rate, and semantic fidelity scores. Importantly, unlike most semantic communication systems that focus solely on recovering the semantic meaning of transmitted messages, CL-SEC aims to reconstruct the original transmitted message verbatim, leveraging the semantic understanding capabilities of LMs for precise reconstruction.
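
The product-form combination can be illustrated with a toy sketch: each candidate word is scored by its language-model log-probability plus the log-probability of its bits given the physical-layer LLRs. Assuming equiprobable bits, P(y_i | b) is proportional to P(b | y_i), so this sum matches the product-form posterior up to a constant. The candidate set, the 8-bit ASCII framing, the LLR sign convention, and the LM probabilities are all assumptions for illustration, not CL-SEC's actual formulation.

```python
import math
from typing import Dict, List


def word_to_bits(word: str) -> List[int]:
    """Assumed framing: 8-bit ASCII per character, most significant bit first."""
    return [(ord(ch) >> (7 - i)) & 1 for ch in word for i in range(8)]


def log_p_bit(bit: int, llr: float) -> float:
    """log P(b = bit | y) for LLR = log[P(b=0|y) / P(b=1|y)]."""
    signed = llr if bit == 0 else -llr
    # P(b=0|y) = 1 / (1 + e^{-L}) and P(b=1|y) = 1 / (1 + e^{L})
    return -math.log1p(math.exp(-signed))


def channel_score(word: str, llrs: List[float]) -> float:
    """Physical-layer evidence: sum of per-bit log-probabilities (bits treated as independent)."""
    return sum(log_p_bit(b, l) for b, l in zip(word_to_bits(word), llrs))


def cross_layer_score(word: str, llrs: List[float], lm_prior: Dict[str, float]) -> float:
    """Product form in the log domain: log P_LM(word | context) + channel evidence."""
    return math.log(lm_prior[word]) + channel_score(word, llrs)


# Hypothetical received word: confident LLRs for "cat" except one corrupted bit.
llrs = [4.0 if b == 0 else -4.0 for b in word_to_bits("cat")]
llrs[7] = 0.5  # LSB of the first character now weakly favors 0, i.e. 'b' instead of 'c'

candidates = ["cat", "bat"]
lm_prior = {"cat": 0.7, "bat": 0.3}  # hypothetical LM probabilities for this slot

channel_only = max(candidates, key=lambda w: channel_score(w, llrs))
cross_layer = max(candidates, key=lambda w: cross_layer_score(w, llrs, lm_prior))
print("channel only:", channel_only, "| cross-layer:", cross_layer)
```

Here the channel alone slightly prefers "bat" because of the corrupted bit, while the language-model prior tips the combined decision back to "cat", which mirrors the motivation for fusing both layers.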

NI · Apr 9

Real-Time Cross-Layer Semantic Error Correction Using Language Models and Software-Defined Radio

Yuchen Pan, Yuyang Du, Yirun Wang et al.

As Language Models (LMs) advance, Semantic Error Correction (SEC) has emerged as a promising approach for reliable network designs. Yet existing methods prioritize intent over accuracy, falling short of verbatim recovery. Our recent work, Cross-Layer SEC (CL-SEC), addressed this by fusing physical-layer Log-Likelihood Ratios (LLRs) with semantic context, but its real-time feasibility remained unvalidated. This paper demonstrates CL-SEC on a live Software-Defined Radio (SDR) testbed, resolving implementation barriers with: 1) an SDR middleware enabling real-time LLR extraction from FPGA hardware, and 2) a generalized inference interface supporting modern encoder-decoder LMs. Real-world experiments confirm that the cross-layer fusion significantly outperforms either source alone.

CV · Aug 23, 2025

RF-PGS: Fully-structured Spatial Wireless Channel Representation with Planar Gaussian Splatting

Lihao Zhang, Zongtan Li, Haijian Sun

In the 6G era, the demand for higher system throughput and the implementation of emerging 6G technologies require large-scale antenna arrays and accurate spatial channel state information (Spatial-CSI). Traditional channel modeling approaches, such as empirical models, ray tracing, and measurement-based methods, face challenges in spatial resolution, efficiency, and scalability. Radiance field-based methods have emerged as promising alternatives but still suffer from geometric inaccuracy and costly supervision. This paper proposes RF-PGS, a novel framework that reconstructs high-fidelity radio propagation paths from only sparse path loss spectra. By introducing Planar Gaussians as geometry primitives with certain RF-specific optimizations, RF-PGS achieves dense, surface-aligned scene reconstruction in the first geometry training stage. In the subsequent Radio Frequency (RF) training stage, the proposed fully-structured radio radiance, combined with a tailored multi-view loss, accurately models radio propagation behavior. Compared to prior radiance field methods, RF-PGS significantly improves reconstruction accuracy, reduces training costs, and enables efficient representation of wireless channels, offering a practical solution for scalable 6G Spatial-CSI modeling.

SP · May 6, 2025

Terahertz Spatial Wireless Channel Modeling with Radio Radiance Field

John Song, Lihao Zhang, Feng Ye et al.

Terahertz (THz) communication is a key enabler for 6G systems, offering ultra-wide bandwidth and unprecedented data rates. However, THz signal propagation differs significantly from lower-frequency bands due to severe free-space path loss, minimal diffraction and specular reflection, and prominent scattering, making conventional channel modeling and pilot-based estimation approaches inefficient. In this work, we investigate the feasibility of applying the radio radiance field (RRF) framework to the THz band. This method reconstructs a continuous RRF using visual-based geometry and sparse THz RF measurements, enabling efficient spatial channel state information (Spatial-CSI) modeling without dense sampling. We first build a detailed simulated THz scenario, then reconstruct the RRF and evaluate its performance in terms of both reconstruction quality and effectiveness in THz communication, showing that the reconstructed RRF captures key propagation paths with sparse training samples. Our findings demonstrate that RRF modeling remains effective in the THz regime and provides a promising direction for scalable, low-cost spatial channel reconstruction in future 6G networks.
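
To put a number on the free-space component alone, the sketch below evaluates the Friis free-space path loss at an illustrative 10 m for a 3.5 GHz carrier and a 300 GHz carrier. It assumes isotropic antennas and ignores molecular absorption and scattering; the distance and frequencies are illustrative choices, not values from the paper. The frequency dependence in this formula reflects the shrinking effective aperture of an isotropic antenna.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Friis free-space path loss between isotropic antennas, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


distance = 10.0  # illustrative link length in metres
for label, f in [("3.5 GHz", 3.5e9), ("300 GHz", 300e9)]:
    print(f"{label}: {fspl_db(distance, f):.1f} dB")

# The gap depends only on the frequency ratio: 20*log10(300 / 3.5), about 38.7 dB.
gap = fspl_db(distance, 300e9) - fspl_db(distance, 3.5e9)
print(f"extra loss at 300 GHz: {gap:.1f} dB")
```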

IT · Jan 16, 2024

Spatial Channel State Information Prediction with Generative AI: Towards Holographic Communication and Digital Radio Twin

Lihao Zhang, Haijian Sun, Yong Zeng et al.

As 5G technology becomes increasingly established, anticipation for 6G is growing; 6G promises faster and more reliable wireless connections via cutting-edge radio technologies. However, efficient management of the large-scale antenna arrays deployed by these radio technologies is crucial. Traditional management methods are mainly reactive, usually relying on user feedback to adapt to the dynamic wireless channel. A more promising approach lies in predicting spatial channel state information (spatial-CSI), an all-inclusive channel characterization consisting of all feasible line-of-sight (LoS) and non-line-of-sight (NLoS) paths between the transmitter (Tx) and receiver (Rx), together with the three-dimensional (3D) trajectory, attenuation, phase shift, delay, and polarization of each path. Advances in hardware and neural networks make it possible to predict such spatial-CSI from precise environmental information, and further open up the possibility of holographic communication, which implies complete control over every aspect of the emitted radio waves. Building on the integration of holographic communication and digital twins, we propose a new framework, the digital radio twin, which takes advantage of both the digital world and deterministic control over radio waves, supporting a wide range of high-level applications. As a preliminary step toward this vision, in this paper we explore the use of generative artificial intelligence (AI) to pinpoint the valid paths in a given environment, demonstrating promising results and highlighting the potential of this approach to drive forward the evolution of 6G wireless communication technologies.

CY · Jul 19, 2021

Independent Ethical Assessment of Text Classification Models: A Hate Speech Detection Case Study

Amitoj Singh, Jingshu Chen, Lihao Zhang et al.

An independent ethical assessment of an artificial intelligence system is an impartial examination of the system's development, deployment, and use in alignment with ethical values. System-level qualitative frameworks that describe high-level requirements and component-level quantitative metrics that measure individual ethical dimensions have been developed over the past few years. However, there exists a gap between the two, which hinders the execution of independent ethical assessments in practice. This study bridges this gap and designs a holistic independent ethical assessment process for a text classification model with a special focus on the task of hate speech detection. The assessment is further augmented with protected attributes mining and counterfactual-based analysis to enhance bias assessment. It covers assessments of technical performance, data bias, embedding bias, classification bias, and interpretability. The proposed process is demonstrated through an assessment of a deep hate speech detection model.
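
One common form of counterfactual bias check swaps an identity term for each other term in the same set and counts how often the predicted label changes. The sketch below uses neutral placeholder terms (`group_a`, `group_b`, `group_c`) and a deliberately biased toy classifier; both are hypothetical, and the study's own counterfactual analysis may be defined differently.

```python
import re
from typing import Callable, Sequence


def counterfactual_flip_rate(
    texts: Sequence[str],
    identity_terms: Sequence[str],
    classify: Callable[[str], int],
) -> float:
    """Share of identity-term substitutions that change the predicted label."""
    flips, total = 0, 0
    for text in texts:
        base_label = classify(text)
        for original in identity_terms:
            pattern = rf"\b{re.escape(original)}\b"
            if not re.search(pattern, text):
                continue  # this identity term does not occur in the text
            for substitute in identity_terms:
                if substitute == original:
                    continue
                counterfactual = re.sub(pattern, substitute, text)
                total += 1
                flips += int(classify(counterfactual) != base_label)
    return flips / total if total else 0.0


def toy_classify(text: str) -> int:
    """Hypothetical biased classifier: flags 'hate', but also (spuriously) 'group_b'."""
    tokens = text.lower().split()
    return int("hate" in tokens or "group_b" in tokens)


texts = ["I hate Mondays", "group_a people are nice", "group_c people went home"]
terms = ["group_a", "group_b", "group_c"]
print(f"counterfactual flip rate: {counterfactual_flip_rate(texts, terms, toy_classify):.2f}")
```

An unbiased classifier would score 0.0 on this check; any nonzero rate points to predictions that depend on which group is mentioned rather than on the content.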

NI · Jan 2, 2021

Speeding up Block Propagation in Blockchain Network: Uncoded and Coded Designs

Lihao Zhang, Taotao Wang, Soung Chang Liew

We design and validate new block propagation protocols for the peer-to-peer (P2P) network of the Bitcoin blockchain. Despite its strong protection of security and privacy, the current Bitcoin blockchain can only support a low number of transactions per second (TPS). In this work, we redesign Bitcoin's current networking protocol to increase TPS without changing vital components of its consensus-building protocol. In particular, we improve the compact-block relaying protocol to enable the propagation of blocks containing a massive number of transactions without inducing extra propagation latency. Our improvements consist of (i) replacing the existing store-and-forward compact-block relaying scheme with a cut-through compact-block relaying scheme; and (ii) exploiting rateless erasure codes for P2P networks to increase block-propagation efficiency. Since our protocols only rework Bitcoin's current networking protocol and do not modify its data structures or cryptographic components, they can be seamlessly incorporated into the existing Bitcoin blockchain. To validate our designs, we analyze our protocols and implement a Bitcoin network simulator on NS3 to run different block propagation protocols. The analysis and experimental results confirm that our new block propagation protocols could increase the TPS of the Bitcoin blockchain by 100x without compromising security and consensus-building.
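
A textbook pipelining model shows why cut-through relaying helps. Over h hops at link rate R, store-and-forward takes about h * B / R for a block of B bits, whereas forwarding c-bit chunks as they arrive takes about (B + (h - 1) * c) / R. The sketch below ignores propagation delay, per-hop validation, and compact-block reconstruction. The class name `RelayPath` and all parameter values are illustrative assumptions, not taken from the paper, and the rateless-coding component is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RelayPath:
    hops: int              # links between the block producer and a distant peer
    link_rate_bps: float   # per-link bandwidth in bit/s, assumed equal on every hop


def store_and_forward_latency(block_bits: float, path: RelayPath) -> float:
    """Each relay receives the entire block before forwarding it."""
    return path.hops * block_bits / path.link_rate_bps


def cut_through_latency(block_bits: float, chunk_bits: float, path: RelayPath) -> float:
    """Each relay forwards a chunk as soon as it arrives, so hops overlap in time:
    one full block transmission plus one chunk time for each additional hop."""
    return (block_bits + (path.hops - 1) * chunk_bits) / path.link_rate_bps


path = RelayPath(hops=6, link_rate_bps=100e6)  # illustrative numbers
block_bits = 8e6   # a 1 MB block
chunk_bits = 8e3   # 1 KB chunks
print(f"store-and-forward: {store_and_forward_latency(block_bits, path) * 1e3:.1f} ms")
print(f"cut-through:       {cut_through_latency(block_bits, chunk_bits, path) * 1e3:.1f} ms")
```

Setting the chunk size equal to the block size makes the two formulas coincide, which is a quick sanity check that cut-through only gains from pipelining smaller units.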