Xuesheng Zhang

h-index5
2papers

2 Papers

LGMar 20, 2024
Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection

Zhixin Lai, Xuesheng Zhang, Suiyao Chen

Large language models (LLMs) have reached human-like proficiency in generating diverse textual content, underscoring the necessity for effective fake text detection to avoid potential risks such as fake news in social media. Previous research has mostly tested single models on in-distribution datasets, limiting our understanding of how these models perform on different types of data for LLM-generated text detection task. We researched this by testing five specialized transformer-based models on both in-distribution and out-of-distribution datasets to better assess their performance and generalizability. Our results revealed that single transformer-based classifiers achieved decent performance on in-distribution dataset but limited generalization ability on out-of-distribution dataset. To improve it, we combined the individual classifiers models using adaptive ensemble algorithms, which improved the average accuracy significantly from 91.8% to 99.2% on an in-distribution test set and from 62.9% to 72.5% on an out-of-distribution test set. The results indicate the effectiveness, good generalization ability, and great potential of adaptive ensemble algorithms in LLM-generated text detection.

CVJan 30, 2019
Medical Image Super-Resolution Using a Generative Adversarial Network

Yongpei Zhu, Xuesheng Zhang, Kehong Yuan

During the growing popularity of electronic medical records, electronic medical record (EMR) data has exploded increasingly. It is very meaningful to retrieve high quality EMR in mass data. In this paper, an EMR value network with retrieval function is constructed by taking stroke disease as the research object. It mainly includes: 1) It establishes the electronic medical record database and corresponding stroke knowledge graph. 2) The strategy of similarity measurement is included three parts(patients' chief complaint, pathology results and medical images). Patients' chief complaints are text data, mainly describing patients' symptoms and expressed in words or phrases, and patients' chief complaints are input in independent tick of various symptoms. The data of the pathology results is a structured and digitized expression, so the input method is the same as the patient's chief complaint; Image similarity adopts content-based image retrieval(CBIR) technology. 3) The analytic hierarchy process (AHP) is used to establish the weights of the three types of data and then synthesize them into an indicator. The accuracy rate of similarity in top 5 was more than 85\% based on EMR database with more 200 stroke records using leave-one-out method. It will be the good tool for assistant diagnosis and doctor training, as good quality records are colleted into the databases, like Doctor Watson, in the future.