Zhen Hu

CL
h-index: 10
18 papers
1,331 citations
Novelty: 26%
AI Score: 45

18 Papers

LG · Aug 27, 2022 · Code
A Comprehensive Review of Digital Twin -- Part 2: Roles of Uncertainty Quantification and Optimization, a Battery Digital Twin, and Perspectives

Adam Thelen, Xiaoge Zhang, Olga Fink et al.

As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions. Discussions focus on current methods of uncertainty quantification and optimization and how they are applied in different dimensions of a digital twin. Additionally, this paper presents a case study where a battery digital twin is constructed and tested to illustrate some of the modeling and twinning methods reviewed in this two-part review. Code and preprocessed data for generating all the results and figures presented in the case study are available on GitHub.

CE · Aug 26, 2022
A Comprehensive Review of Digital Twin -- Part 1: Modeling and Twinning Enabling Technologies

Adam Thelen, Xiaoge Zhang, Olga Fink et al.

As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This first paper presents a thorough literature review of digital twin trends across many disciplines currently pursuing this area of research. Then, digital twin modeling and twinning enabling technologies are further analyzed by classifying them into two main categories, physical-to-virtual and virtual-to-physical, based on the direction in which data flows. Finally, this paper provides perspectives on the trajectory of digital twin technology over the next decade and introduces a few emerging areas of research that will likely be of great use in future digital twin research. In part two of this review, the roles of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and more perspectives on the future of digital twin are shared.

IT · Apr 14, 2016
MIMO UWB Radar System with Compressive Sensing

Xia Li, Zhen Hu, Robert C. Qiu

A multiple-input multiple-output (MIMO) ultra-wideband (UWB) cognitive radar based on compressive sensing (CS) is presented in this letter. In a traditional UWB radar, the receiver requires a high-sampling-rate analog-to-digital converter (ADC) to satisfy the Shannon sampling theorem, which increases hardware complexity. To bypass this ADC bottleneck, or to further increase the radar bandwidth using the latest wideband ADCs, we propose using CS for signal reconstruction at the UWB radar receiver, exploiting the sparsity of targets in the surveillance area. In addition, narrowband interference cancellation is integrated into the proposed MIMO UWB radar. A field demonstration confirms the feasibility and reliability of the proposed algorithm.
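
Compressive-sensing reconstruction of sparse targets can be illustrated with Orthogonal Matching Pursuit (OMP), a standard greedy CS recovery algorithm. This is a minimal, real-valued sketch assuming a known measurement matrix `A`, noise-free measurements `y = A x`, and a known number `k` of occupied range cells. The letter's actual reconstruction algorithm, measurement matrix, and complex-valued radar signal model are not given in the abstract, and all function names here are hypothetical.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))

def column(A, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in A]

def solve(M, b):
    """Solve the small square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x

def gaussian_matrix(m, n, seed=0):
    """Random sensing matrix with i.i.d. N(0, 1/m) entries, a common CS choice."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]

def omp(A, y, k):
    """Orthogonal Matching Pursuit: recover a k-sparse x from y = A x."""
    n = len(A[0])
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # Greedy step: add the column most correlated with the current residual.
        idx = max((i for i in range(n) if i not in support),
                  key=lambda i: abs(dot(column(A, i), residual)))
        support.append(idx)
        cols = [column(A, s) for s in support]
        # Least-squares fit of y on the selected columns via the normal equations.
        gram = [[dot(ci, cj) for cj in cols] for ci in cols]
        coef = solve(gram, [dot(c, y) for c in cols])
        fit = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fit)]
    x = [0.0] * n
    for s, c in zip(support, coef):
        x[s] = c
    return x
```

Read in radar terms, each entry of `x` would be the reflectivity of one range cell; recovering a sparse `x` from fewer measurements than cells is what relaxes the sampling-rate demand on the ADC.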

CL · Nov 19, 2024 · Code
JuniperLiu at CoMeDi Shared Task: Models as Annotators in Lexical Semantics Disagreements

Zhu Liu, Zhen Hu, Ying Liu

We present the results of our system for the CoMeDi Shared Task, which predicts majority votes (Subtask 1) and annotator disagreements (Subtask 2). Our approach combines model ensemble strategies with MLP-based and threshold-based methods trained on pretrained language models. Treating individual models as virtual annotators, we simulate the annotation process by designing aggregation measures that incorporate continuous relatedness scores and discrete classification labels to capture both majority and disagreement. Additionally, we employ anisotropy removal techniques to enhance performance. Experimental results demonstrate the effectiveness of our methods, particularly for Subtask 2. Notably, we find that the standard deviation of continuous relatedness scores across different model manipulations correlates better with human disagreement annotations than metrics computed on aggregated discrete labels. The code will be published at https://github.com/RyanLiut/CoMeDi_Solution.
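
Treating models as virtual annotators can be sketched as an aggregation over per-model outputs. In this minimal sketch, the 1 to 4 relatedness scale, the tie-breaking rule, and the function and field names are assumptions for illustration. Since the abstract reports that the spread of continuous scores tracked human disagreement better than statistics on discrete labels, both are computed side by side.

```python
from collections import Counter
from statistics import mean, pstdev

def aggregate(scores, labels):
    """Aggregate per-model outputs, treating each model as a virtual annotator.

    scores: continuous relatedness scores, one per model (assumed 1-4 scale)
    labels: discrete relatedness labels, one per model
    """
    counts = Counter(labels)
    top = max(counts.values())
    # Majority label; ties go to the smallest label (an arbitrary, assumed rule).
    majority = min(lab for lab, c in counts.items() if c == top)
    return {
        "majority": majority,                         # Subtask 1 style output
        "mean_score": mean(scores),
        "score_std": pstdev(scores),                  # spread of continuous scores
        "label_disagreement": 1 - top / len(labels),  # share of models off the majority
    }
```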

LG · Sep 16, 2025 · Code
Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use

Yabo Zhang, Yihan Zeng, Qingyun Li et al.

Large language models (LLMs) have demonstrated strong capabilities in language understanding and reasoning, yet they remain limited when tackling real-world tasks that require up-to-date knowledge, precise operations, or specialized tool use. To address this, we propose Tool-R1, a reinforcement learning framework that enables LLMs to perform general, compositional, and multi-step tool use by generating executable Python code. Tool-R1 supports integration of user-defined tools and standard libraries, with variable sharing across steps to construct coherent workflows. An outcome-based reward function, combining LLM-based answer judgment and code execution success, guides policy optimization. To improve training efficiency, we maintain a dynamic sample queue to cache and reuse high-quality trajectories, reducing the overhead of costly online sampling. Experiments on the GAIA benchmark show that Tool-R1 substantially improves both accuracy and robustness, achieving a gain of about 10% over strong baselines, with larger improvements on complex multi-step tasks. These results highlight the potential of Tool-R1 for enabling reliable and efficient tool-augmented reasoning in real-world applications. Our code will be available at https://github.com/YBYBZhang/Tool-R1.
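
The outcome-based reward and the dynamic sample queue can be sketched as follows. This is a hypothetical illustration, not Tool-R1's code: the reward weights (`w_judge`, `w_exec`), the quality cutoff `min_reward`, the eviction policy (drop the lowest-reward entry when full), and all names are assumptions.

```python
import heapq
import itertools

def outcome_reward(judge_correct: bool, code_ran: bool,
                   w_judge: float = 0.8, w_exec: float = 0.2) -> float:
    """Hypothetical reward: weighted mix of an LLM-judge verdict and execution success."""
    return w_judge * float(judge_correct) + w_exec * float(code_ran)

class TrajectoryQueue:
    """Bounded cache that keeps the highest-reward trajectories for reuse."""

    def __init__(self, capacity: int, min_reward: float = 0.5):
        self.capacity = capacity
        self.min_reward = min_reward
        self._heap = []                # min-heap of (reward, insertion id, trajectory)
        self._ids = itertools.count()  # unique ids break reward ties without comparing trajectories

    def push(self, trajectory, reward: float) -> None:
        if reward < self.min_reward:
            return  # only high-quality trajectories are cached
        item = (reward, next(self._ids), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif reward > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict the current lowest-reward entry

    def sample(self, k: int) -> list:
        """Return up to k cached trajectories, highest reward first."""
        return [traj for _, _, traj in heapq.nlargest(k, self._heap)]

    def __len__(self) -> int:
        return len(self._heap)
```

A bounded min-heap keeps both insertion and eviction at O(log n), which suits a cache that is refilled continuously during online sampling.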

CL · Nov 20, 2019 · Code
CAIL2019-SCM: A Dataset of Similar Case Matching in Legal Domain

Chaojun Xiao, Haoxi Zhong, Zhipeng Guo et al.

In this paper, we introduce CAIL2019-SCM, the Chinese AI and Law 2019 Similar Case Matching dataset. CAIL2019-SCM contains 8,964 triplets of cases published by the Supreme People's Court of China. CAIL2019-SCM focuses on detecting similar cases: participants are required to determine which two cases in each triplet are more similar. A total of 711 teams participated in this year's competition, and the best team reached a score of 71.88. We have also implemented several baselines to help researchers better understand this task. The dataset and more details can be found at https://github.com/china-ai-law-challenge/CAIL2019/tree/master/scm.
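
To make the triplet format concrete, here is a minimal lexical-overlap baseline that decides whether case B or case C is more similar to case A. It is an assumed illustration, not one of the baselines implemented in the paper; character bigrams are used because they need no word segmentation for Chinese text, and the function names are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text: str, n: int = 2) -> Counter:
    """Character n-gram counts; segmentation-free, which suits Chinese text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def more_similar(a: str, b: str, c: str) -> str:
    """Return 'B' if case b is at least as close to case a as case c is, else 'C'."""
    va, vb, vc = char_ngrams(a), char_ngrams(b), char_ngrams(c)
    return "B" if cosine(va, vb) >= cosine(va, vc) else "C"
```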

AI · Oct 14, 2025
From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model

Boyou Chen, Gerui Xu, Zifei Wang et al.

Vehicle crashes involve complex interactions between road users, split-second decisions, and challenging environmental conditions. Among these, two-vehicle crashes are the most prevalent, accounting for approximately 70% of roadway crashes and posing a significant challenge to traffic safety. Identifying Driver Hazardous Action (DHA) is essential for understanding crash causation, yet the reliability of DHA data in large-scale databases is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives, thereby improving the validity and interpretability of DHA classifications. Using five years of two-vehicle crash data from MTCF, we fine-tuned the Llama 3.2 1B model on detailed crash narratives and benchmarked its performance against conventional machine learning classifiers, including Random Forest, XGBoost, CatBoost, and a neural network. The fine-tuned LLM achieved an overall accuracy of 80%, surpassing all baseline models and demonstrating pronounced improvements in scenarios with imbalanced data. To increase interpretability, we developed a probabilistic reasoning approach, analyzing model output shifts across original test sets and three targeted counterfactual scenarios: variations in driver distraction and age. Our analysis revealed that introducing distraction for one driver substantially increased the likelihood of "General Unsafe Driving"; distraction for both drivers maximized the probability of "Both Drivers Took Hazardous Actions"; and assigning a teen driver markedly elevated the probability of "Speed and Stopping Violations." Our framework and analytical methods provide a robust and interpretable solution for large-scale automated DHA detection, offering new opportunities for traffic safety analysis and intervention.

LG · Oct 2, 2025
NVIDIA AI Aerial: AI-Native Wireless Communications

Kobi Cohen-Arazi, Michael Roe, Zhen Hu et al.

6G brings a paradigm shift towards AI-native wireless systems, necessitating the seamless integration of digital signal processing (DSP) and machine learning (ML) within the software stacks of cellular networks. This transformation brings the life cycle of modern networks closer to AI systems, where models and algorithms are iteratively trained, simulated, and deployed across adjacent environments. In this work, we propose a robust framework that compiles Python-based algorithms into GPU-runnable blobs. The result is a unified approach that ensures efficiency, flexibility, and the highest possible performance on NVIDIA GPUs. As an example of the capabilities of the framework, we demonstrate the efficacy of performing the channel estimation function in the PUSCH receiver through a convolutional neural network (CNN) trained in Python. This is done in a digital twin first, and subsequently in a real-time testbed. Our proposed methodology, realized in the NVIDIA AI Aerial platform, lays the foundation for scalable integration of AI/ML models into next-generation cellular systems, and is essential for realizing the vision of natively intelligent 6G networks.
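
The abstract does not describe the CNN's inputs. As a hedged illustration of the classical step that a learned channel estimator is typically compared against or refines, the sketch computes least-squares estimates at pilot subcarriers and linearly interpolates them across the band. Pilot positions, pilot symbols, and function names are hypothetical, and nothing here reflects the NVIDIA AI Aerial API.

```python
def ls_estimate(rx_pilots, tx_pilots):
    """Least-squares channel estimate per pilot subcarrier: H = Y / X."""
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]

def interpolate(pilot_idx, h_pilots, n_subcarriers):
    """Linearly interpolate sorted pilot estimates across all subcarriers.

    Outside the pilot span, the nearest edge estimate is held constant.
    """
    h = []
    for k in range(n_subcarriers):
        if k <= pilot_idx[0]:
            h.append(h_pilots[0])
        elif k >= pilot_idx[-1]:
            h.append(h_pilots[-1])
        else:
            # index of the last pilot at or below subcarrier k
            j = max(i for i, p in enumerate(pilot_idx) if p <= k)
            p0, p1 = pilot_idx[j], pilot_idx[j + 1]
            t = (k - p0) / (p1 - p0)
            h.append((1 - t) * h_pilots[j] + t * h_pilots[j + 1])
    return h
```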

CL · Jul 5, 2025
XISM: an eXploratory and Interactive Graph Tool to Visualize and Evaluate Semantic Map Models

Zhu Liu, Zhen Hu, Lei Dai et al.

Semantic map models represent meanings or functions as nodes in a graph constrained by the local connectivity hypothesis, with edges indicating their associations. Widely used in typological linguistics, these models compare interrelated meanings across languages. Traditionally built manually in a bottom-up manner, they are inefficient for large datasets and lack visualization and evaluation tools. This paper introduces XISM, an interactive tool based on our prior algorithm, which constructs semantic maps from user data via a top-down approach, displays candidate maps, and evaluates them using multiple metrics. Users can refine maps by editing edges, combining data-driven efficiency with expert knowledge. This human-in-the-loop design benefits both typologists and computational linguists. The system https://770103knev48.vicp.fun/ and a demonstration video https://youtu.be/S-wsVDF2HSI?si=1OrcF41tRznaifhZ are publicly available.
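
The connectivity constraint on semantic maps can be checked mechanically: under the connectivity hypothesis, the meanings expressed by any single form should occupy a connected region of the graph. The sketch below assumes a map given as an undirected edge list and forms given as sets of meaning labels, and scores a candidate map by the fraction of forms that satisfy the constraint. It is one simple illustrative metric with hypothetical names, not necessarily one of those XISM implements.

```python
from collections import deque

def is_connected(meanings, edges):
    """True if `meanings` induce a connected subgraph of the map given by `edges`."""
    nodes = set(meanings)
    if not nodes:
        return True
    adj = {node: set() for node in nodes}
    for u, v in edges:
        # keep only edges whose endpoints are both expressed by the form
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search within the induced subgraph
        for nb in adj[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == nodes

def connectivity_score(edges, forms):
    """Fraction of forms whose meanings occupy a connected region of the map."""
    ok = sum(is_connected(meanings, edges) for meanings in forms.values())
    return ok / len(forms)
```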

LG · May 7, 2023
Uncertainty Quantification in Machine Learning for Engineering Design and Health Prognostics: A Tutorial

Venkat Nemani, Luca Biggio, Xun Huan et al.

On top of machine learning models, uncertainty quantification (UQ) functions as an essential layer of safety assurance that could lead to more principled decision making by enabling sound risk assessment and management. The safety and reliability improvement of ML models empowered by UQ has the potential to significantly facilitate the broad adoption of ML solutions in high-stakes decision settings, such as healthcare, manufacturing, and aviation, to name a few. In this tutorial, we aim to provide a holistic lens on emerging UQ methods for ML models with a particular focus on neural networks and the applications of these UQ methods in tackling engineering design as well as prognostics and health management problems. Toward this goal, we start with a comprehensive classification of uncertainty types, sources, and causes pertaining to UQ of ML models. Next, we provide a tutorial-style description of several state-of-the-art UQ methods: Gaussian process regression, Bayesian neural network, neural network ensemble, and deterministic UQ methods focusing on spectral-normalized neural Gaussian process. Building on these mathematical formulations, we subsequently examine the soundness of these UQ methods quantitatively and qualitatively (by a toy regression example) to reveal their strengths and shortcomings from different dimensions. Then, we review quantitative metrics commonly used to assess the quality of predictive uncertainty in classification and regression problems. Afterward, we discuss the increasingly important role of UQ of ML models in solving challenging problems in engineering design and health prognostics. Two case studies with source code available on GitHub are used to demonstrate these UQ methods and compare their performance in early-stage life prediction of lithium-ion batteries and remaining useful life prediction of turbofan engines.
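
For the neural network ensemble method listed above, one common way (used by deep ensembles with Gaussian outputs) to combine members is to moment-match their uniform mixture: the predictive mean is the average of member means, and the total variance is the average of member variances (aleatoric part) plus the variance of member means (epistemic part). The sketch assumes each member already outputs a mean and a variance; it is not taken from the tutorial's code, and the function name is hypothetical.

```python
from statistics import fmean

def ensemble_predict(means, variances):
    """Combine M ensemble members that each predict a Gaussian N(mu_i, sigma_i^2).

    Returns (predictive mean, aleatoric variance, epistemic variance, total variance)
    for the moment-matched uniform mixture of the members' predictions.
    """
    mu = fmean(means)
    aleatoric = fmean(variances)                       # average predicted data noise
    epistemic = fmean([(m - mu) ** 2 for m in means])  # spread of member means
    return mu, aleatoric, epistemic, aleatoric + epistemic
```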

CL · Dec 19, 2019
CJRC: A Reliable Human-Annotated Benchmark DataSet for Chinese Judicial Reading Comprehension

Xingyi Duan, Baoxin Wang, Ziyue Wang et al.

We present a Chinese judicial reading comprehension (CJRC) dataset that contains approximately 10K documents and almost 50K questions with answers. The documents come from court judgments, and the questions are annotated by legal experts. The CJRC dataset can help researchers extract elements using reading comprehension techniques. Element extraction is an important task in the legal field. However, it is difficult to predefine element types exhaustively due to the diversity of document types and causes of action. By contrast, machine reading comprehension can quickly extract elements by answering various questions about a long document. We build two strong baseline models based on BERT and BiDAF. The experimental results show that there is still substantial room for improvement relative to human annotators.

AI · Oct 13, 2018
Overview of CAIL2018: Legal Judgment Prediction Competition

Haoxi Zhong, Chaojun Xiao, Zhipeng Guo et al.

In this paper, we give an overview of the Legal Judgment Prediction (LJP) competition at the Chinese AI and Law challenge (CAIL2018). This competition focuses on LJP, which aims to predict judgment results from the given facts. Specifically, in CAIL2018, we proposed three LJP subtasks for the contestants: predicting the relevant law articles, charges, and prison terms given the fact descriptions. CAIL2018 attracted a large number of participants (601 teams, 1,144 contestants from 269 organizations). In this paper, we provide a detailed overview of the task definition, related work, outstanding methods, and competition results of CAIL2018.

CL · Jul 4, 2018
CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction

Chaojun Xiao, Haoxi Zhong, Zhipeng Guo et al.

In this paper, we introduce the Chinese AI and Law challenge dataset (CAIL2018), the first large-scale Chinese legal dataset for judgment prediction. CAIL2018 contains more than 2.6 million criminal cases published by the Supreme People's Court of China, several times more than other datasets used in existing work on judgment prediction. Moreover, the annotations of judgment results are more detailed and richer: they consist of applicable law articles, charges, and prison terms, which are expected to be inferred from the fact descriptions of cases. For comparison, we implement several conventional text classification baselines for judgment prediction, and experimental results show that predicting the judgment results of legal cases, especially prison terms, remains a challenge for current models. To help researchers make improvements on legal judgment prediction, both CAIL2018 and the baselines will be released after the CAIL competition (http://cail.cipsc.org.cn/).

NE · Jun 2, 2016
Multi-pretrained Deep Neural Network

Zhen Hu, Zhuyin Xue, Tong Cui et al.

Pretraining is widely used in deep neural networks, and one of the most famous pretraining models is the Deep Belief Network (DBN). Different pretraining models use different optimization formulas during pretraining. In this paper, we pretrained deep neural networks with different pretraining models and investigated the difference between DBN and the Stacked Denoising Autoencoder (SDA) when used as the pretraining model. The experimental results show that DBN yields a better initial model; however, that model converges to a relatively worse model after fine-tuning. Yet if the model is pretrained by SDA a second time, it converges to a better model after fine-tuning.

LG · Mar 16, 2016
On the Complexity of One-class SVM for Multiple Instance Learning

Zhen Hu, Zhuyin Xue

In traditional multiple instance learning (MIL), both positive and negative bags are required to learn a prediction function. However, labeling each bag as positive or negative incurs a high human cost. Only positive bags contain what we focus on (positive instances), while negative bags consist of noise or background (negative instances), so we would rather not spend much effort labeling negative bags. Contrary to this expectation, nearly all existing MIL methods require enough negative bags in addition to positive ones. In this paper, we propose an algorithm called "Positive Multiple Instance" (PMI), which learns a classifier given only a set of positive bags, so annotating negative bags becomes unnecessary. PMI is built on the assumption that the unknown positive instances in positive bags are similar to each other and form one compact cluster in feature space, while the negative instances lie outside this cluster. The experimental results demonstrate that PMI achieves performance close to, or slightly worse than, that of traditional MIL algorithms on benchmark and real data sets. However, PMI requires significantly fewer training bags than traditional MIL algorithms.
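
The compact-cluster assumption behind PMI can be illustrated with a simple alternating heuristic that uses only positive bags: from each bag, pick the instance nearest the current center as its likely positive, recompute the center from those picks, and use the spread of the picks as a radius. This is a hypothetical sketch of the assumption, not the PMI algorithm or the one-class SVM formulation from the paper; `iters`, `slack`, and all function names are illustrative.

```python
import math

def mean_vec(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def fit_positive_cluster(bags, iters=10):
    """Estimate a compact positive cluster from positive bags only.

    Assumes every bag holds at least one positive instance, that positives
    cluster together, and that negatives lie outside that cluster.
    """
    center = mean_vec([x for bag in bags for x in bag])
    for _ in range(max(iters, 1)):
        # the instance nearest the current center is taken as the bag's positive "witness"
        witnesses = [min(bag, key=lambda x: math.dist(x, center)) for bag in bags]
        center = mean_vec(witnesses)
    radius = max(math.dist(w, center) for w in witnesses)
    return center, radius

def predict_bag(bag, center, radius, slack=1.0):
    """Label a bag positive if any instance falls inside the (optionally enlarged) cluster."""
    return any(math.dist(x, center) <= slack * radius for x in bag)
```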

NE · Jan 15, 2013
Audio Classical Composer Identification by Deep Neural Network

Zhen Hu, Kun Fu, Changshui Zhang

Audio Classical Composer Identification (ACC) is an important problem in Music Information Retrieval (MIR) that aims to identify the composer of audio classical music clips. The well-known annual competition Music Information Retrieval Evaluation eXchange (MIREX) also includes it as one of its four training and testing tasks. We built a hybrid model based on the Deep Belief Network (DBN) and the Stacked Denoising Autoencoder (SDA) to identify the composer from the audio signal. Because of copyright restrictions, the sponsors of MIREX cannot publish their data set, so we built a comparable data set to test our model. We achieved an accuracy of 76.26% on our data set, which is better than some pure models and shallow models. We believe our method is promising even though it was tested on a different data set, since our data set is comparable in size to that of MIREX. We also found that samples from different classes move farther apart as they are transformed by more layers of our model.

AI · Feb 19, 2012
Generalized FMD Detection for Spectrum Sensing Under Low Signal-to-Noise Ratio

Feng Lin, Robert C. Qiu, Zhen Hu et al.

Spectrum sensing is a fundamental problem in cognitive radio. We propose a detection algorithm based on a function of the covariance matrix for spectrum sensing in cognitive radio networks. The monotonically increasing property of a matrix function involving the trace operation serves as the cornerstone of this algorithm. The advantage of the proposed algorithm is that it works at extremely low signal-to-noise ratios, e.g., below -30 dB, with limited sample data. Theoretical analysis of threshold setting for the algorithm is discussed. A performance comparison between the proposed algorithm and other state-of-the-art methods is provided through simulations on captured digital television (DTV) signals.
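
The FMD statistic itself is not spelled out in the abstract. As a hedged sketch of the broader family it belongs to (detectors built on the sample covariance matrix and its trace), the block below implements the classical covariance absolute value (CAV) test instead. It is a swapped-in, simpler relative, not the authors' FMD detector; the threshold `gamma`, the smoothing length `L`, the simulated sinusoid, and all function names are hypothetical choices rather than the theoretically derived settings the paper discusses.

```python
import math
import random

def sample_covariance(x, L):
    """L x L sample covariance of a zero-mean real signal from sliding windows of length L."""
    windows = [x[i:i + L] for i in range(len(x) - L + 1)]
    n = len(windows)
    return [[sum(w[i] * w[j] for w in windows) / n for j in range(L)] for i in range(L)]

def cav_statistic(R):
    """Sum of |r_ij| over all entries divided by the sum of |r_ii|; close to 1 for white noise."""
    total = sum(abs(v) for row in R for v in row)
    diag = sum(abs(R[i][i]) for i in range(len(R)))
    return total / diag

def detect(x, L=8, gamma=1.2):
    """Declare the band occupied if the statistic exceeds gamma (a hypothetical threshold)."""
    return cav_statistic(sample_covariance(x, L)) > gamma

def simulate(n, amplitude, seed=0):
    """Illustrative samples: a sinusoid standing in for a primary user, plus unit white noise."""
    rng = random.Random(seed)
    return [amplitude * math.sin(0.1 * k) + rng.gauss(0.0, 1.0) for k in range(n)]
```

A signal introduces correlation between neighboring samples and inflates the off-diagonal covariance entries, while white noise leaves them near zero; that is why the ratio rises above 1 when a signal is present.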