Wenbiao Li

LG
h-index30
7papers
326citations
Novelty41%
AI Score47

7 Papers

CLOct 19, 2022
A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss

Wenbiao Li, Ziyang Wang, Yunfang Wu

For readability assessment, traditional methods mainly employ machine learning classifiers with hundreds of linguistic features. Although the deep learning model has become the prominent approach for almost all NLP tasks, it is less explored for readability assessment. In this paper, we propose a BERT-based model with feature projection and length-balanced loss (BERT-FP-LBL) for readability assessment. Specially, we present a new difficulty knowledge guided semi-supervised method to extract topic features to complement the traditional linguistic features. From the linguistic features, we employ projection filtering to extract orthogonal features to supplement BERT representations. Furthermore, we design a new length-balanced loss to handle the greatly varying length distribution of data. Our model achieves state-of-the-art performances on two English benchmark datasets and one dataset of Chinese textbooks, and also achieves the near-perfect accuracy of 99\% on one English dataset. Moreover, our proposed model obtains comparable results with human experts in consistency test.

76.2CYApr 15
Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu et al.

Watermarking is becoming the default mechanism for AI content authentication, with governance policies and frameworks referencing it as infrastructure for content provenance. Yet across text, image, and audio modalities, watermark signal strength, detectability, and robustness depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups. We examine how this content dependence creates modality-specific pathways to bias. Reviewing the major watermarking benchmarks across modalities, we find that, with one exception, none report performance across languages, cultural content types, or population groups. To address this, we propose three concrete evaluation dimensions for pluralistic watermark benchmarking: cross-lingual detection parity, culturally diverse content coverage, and demographic disaggregation of detection metrics. We connect these to the governance frameworks currently mandating watermarking deployment and show that watermarking is held to a lower fairness standard than the generative systems it is meant to govern. Our position is that evaluation must precede deployment, and that the same bias auditing requirements applied to AI models should extend to the verification layer.

CLJul 13, 2022
Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models

Wenbiao Li, Rui Sun, Yunfang Wu

Most of the Chinese pre-trained models adopt characters as basic units for downstream tasks. However, these models ignore the information carried by words and thus lead to the loss of some important semantics. In this paper, we propose a new method to exploit word structure and integrate lexical semantics into character representations of pre-trained models. Specifically, we project a word's embedding into its internal characters' embeddings according to the similarity weight. To strengthen the word boundary information, we mix the representations of the internal characters within a word. After that, we apply a word-to-character alignment attention mechanism to emphasize important characters by masking unimportant ones. Moreover, in order to reduce the error propagation caused by word segmentation, we present an ensemble approach to combine segmentation results given by different tokenizers. The experimental results show that our approach achieves superior performance over the basic pre-trained models BERT, BERT-wwm and ERNIE on different Chinese NLP tasks: sentiment classification, sentence pair matching, natural language inference and machine reading comprehension. We make further analysis to prove the effectiveness of each component of our model.

CYFeb 3, 2023
Exploring the Cognitive Dynamics of Artificial Intelligence in the Post-COVID-19 and Learning 3.0 Era: A Case Study of ChatGPT

Lingfei Luan, Xi Lin, Wenbiao Li

The emergence of artificial intelligence has incited a paradigm shift across the spectrum of human endeavors, with ChatGPT serving as a catalyst for the transformation of various established domains, including but not limited to education, journalism, security, and ethics. In the post-pandemic era, the widespread adoption of remote work has prompted the educational sector to reassess conventional pedagogical methods. This paper is to scrutinize the underlying psychological principles of ChatGPT, delve into the factors that captivate user attention, and implicate its ramifications on the future of learning. The ultimate objective of this study is to instigate a scholarly discourse on the interplay between technological advancements in education and the evolution of human learning patterns, raising the question of whether technology is driving human evolution or vice versa.

55.7LGMay 16
Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation

Osama Zafar, Alexander Nemecek, Yiqian Zhang et al.

Standard PII filters often miss contextual data leakage in RAG systems, such as non-regulated attribute clusters that collectively identify individuals. We introduce a Privacy Policy Enforcement (PPE) framework using dual one-class density estimators with fused text embeddings and a calibrated abstain region for out-of-distribution inputs. Using an axis-stratified, multi-LLM synthetic data pipeline across medicine, finance, and law, we found that traditional Gaussian Mixture baselines fail on borderline-safe stress tests by focusing on linguistic register rather than content. Our proposed T3+OCSVM detector, trained on safe and borderline-safe data, achieves a borderline AUROC of 0.93+ while reducing false positives by 44-55 percentage points and maintaining millisecond latency. Compared to supervised MLP classifiers or 14B-parameter LLM judges, our framework offers superior operational suitability, as the former suffers from high abstention rates and the latter from latency and calibration issues. This methodology provides a robust stress-testing standard for any synthetic-data-trained classifier.

LGSep 25, 2025
PQFed: A Privacy-Preserving Quality-Controlled Federated Learning Framework

Weiqi Yue, Wenbiao Li, Yuzhou Jiang et al.

Federated learning enables collaborative model training without sharing raw data, but data heterogeneity consistently challenges the performance of the global model. Traditional optimization methods often rely on collaborative global model training involving all clients, followed by local adaptation to improve individual performance. In this work, we focus on early-stage quality control and propose PQFed, a novel privacy-preserving personalized federated learning framework that designs customized training strategies for each client prior to the federated training process. PQFed extracts representative features from each client's raw data and applies clustering techniques to estimate inter-client dataset similarity. Based on these similarity estimates, the framework implements a client selection strategy that enables each client to collaborate with others who have compatible data distributions. We evaluate PQFed on two benchmark datasets, CIFAR-10 and MNIST, integrated with three existing federated learning algorithms. Experimental results show that PQFed consistently improves the target client's model performance, even with a limited number of participants. We further benchmark PQFed against a baseline cluster-based algorithm, IFCA, and observe that PQFed also achieves better performance in low-participation scenarios. These findings highlight PQFed's scalability and effectiveness in personalized federated learning settings.

LGJan 14, 2025
Privacy-Preserving Model and Preprocessing Verification for Machine Learning

Wenbiao Li, Anisa Halimi, Xiaoqian Jiang et al.

This paper presents a framework for privacy-preserving verification of machine learning models, focusing on models trained on sensitive data. Integrating Local Differential Privacy (LDP) with model explanations from LIME and SHAP, our framework enables robust verification without compromising individual privacy. It addresses two key tasks: binary classification, to verify if a target model was trained correctly by applying the appropriate preprocessing steps, and multi-class classification, to identify specific preprocessing errors. Evaluations on three real-world datasets-Diabetes, Adult, and Student Record-demonstrate that while the ML-based approach is particularly effective in binary tasks, the threshold-based method performs comparably in multi-class tasks. Results indicate that although verification accuracy varies across datasets and noise levels, the framework provides effective detection of preprocessing errors, strong privacy guarantees, and practical applicability for safeguarding sensitive data.