Ruiqing Ding

h-index10

6papers

249citations

Novelty52%

AI Score41

Ranked #63,717 of 194,257 authors (top 33%)#14,364 in LG (top 36%)

6 Papers

5.3LGFeb 11, 2023

Cross-center Early Sepsis Recognition by Medical Knowledge Guided Collaborative Learning for Data-scarce Hospitals

Ruiqing Ding, Fangjie Rong, Xiao Han et al.

There are significant regional inequities in health resources around the world. It has become one of the most focused topics to improve health services for data-scarce hospitals and promote health equity through knowledge sharing among medical institutions. Because electronic medical records (EMRs) contain sensitive personal information, privacy protection is unavoidable and essential for multi-hospital collaboration. In this paper, for a common disease in ICU patients, sepsis, we propose a novel cross-center collaborative learning framework guided by medical knowledge, SofaNet, to achieve early recognition of this disease. The Sepsis-3 guideline, published in 2016, defines that sepsis can be diagnosed by satisfying both suspicion of infection and Sequential Organ Failure Assessment (SOFA) greater than or equal to 2. Based on this knowledge, SofaNet adopts a multi-channel GRU structure to predict SOFA values of different systems, which can be seen as an auxiliary task to generate better health status representations for sepsis recognition. Moreover, we only achieve feature distribution alignment in the hidden space during cross-center collaborative learning, which ensures secure and compliant knowledge transfer without raw data exchange. Extensive experiments on two open clinical datasets, MIMIC-III and Challenge, demonstrate that SofaNet can benefit early sepsis recognition when hospitals only have limited EMRs.

21.3CLDec 10, 2022

A Unified Knowledge Graph Augmentation Service for Boosting Domain-specific NLP Tasks

Ruiqing Ding, Xiao Han, Leye Wang

By focusing the pre-training process on domain-specific corpora, some domain-specific pre-trained language models (PLMs) have achieved state-of-the-art results. However, it is under-investigated to design a unified paradigm to inject domain knowledge in the PLM fine-tuning stage. We propose KnowledgeDA, a unified domain language model development service to enhance the task-specific training procedure with domain knowledge graphs. Given domain-specific task texts input, KnowledgeDA can automatically generate a domain-specific language model following three steps: (i) localize domain knowledge entities in texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views of both knowledge graph and training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on domain-specific text classification and QA tasks verify the effectiveness and generalizability of KnowledgeDA.

7.3DBMay 28, 2025Code

ChatPD: An LLM-driven Paper-Dataset Networking System

Anjie Xu, Ruiqing Ding, Leye Wang

Scientific research heavily depends on suitable datasets for method validation, but existing academic platforms with dataset management like PapersWithCode suffer from inefficiencies in their manual workflow. To overcome this bottleneck, we present a system, called ChatPD, that utilizes Large Language Models (LLMs) to automate dataset information extraction from academic papers and construct a structured paper-dataset network. Our system consists of three key modules: \textit{paper collection}, \textit{dataset information extraction}, and \textit{dataset entity resolution} to construct paper-dataset networks. Specifically, we propose a \textit{Graph Completion and Inference} strategy to map dataset descriptions to their corresponding entities. Through extensive experiments, we demonstrate that ChatPD not only outperforms the existing platform PapersWithCode in dataset usage extraction but also achieves about 90\% precision and recall in entity resolution tasks. Moreover, we have deployed ChatPD to continuously extract which datasets are used in papers, and provide a dataset discovery service, such as task-specific dataset queries and similar dataset recommendations. We open source ChatPD and the current paper-dataset network on this [GitHub repository]{https://github.com/ChatPD-web/ChatPD}.

5.8AIAug 1, 2025

From EMR Data to Clinical Insight: An LLM-Driven Framework for Automated Pre-Consultation Questionnaire Generation

Ruiqing Ding, Qianfang Sun, Yongkang Leng et al.

Pre-consultation is a critical component of effective healthcare delivery. However, generating comprehensive pre-consultation questionnaires from complex, voluminous Electronic Medical Records (EMRs) is a challenging task. Direct Large Language Model (LLM) approaches face difficulties in this task, particularly regarding information completeness, logical order, and disease-level synthesis. To address this issue, we propose a novel multi-stage LLM-driven framework: Stage 1 extracts atomic assertions (key facts with timing) from EMRs; Stage 2 constructs personal causal networks and synthesizes disease knowledge by clustering representative networks from an EMR corpus; Stage 3 generates tailored personal and standardized disease-specific questionnaires based on these structured representations. This framework overcomes limitations of direct methods by building explicit clinical knowledge. Evaluated on a real-world EMR dataset and validated by clinical experts, our method demonstrates superior performance in information coverage, diagnostic relevance, understandability, and generation time, highlighting its practical potential to enhance patient information collection.

5.5LGJun 18, 2021Code

Semi-supervised Optimal Transport with Self-paced Ensemble for Cross-hospital Sepsis Early Detection

Ruiqing Ding, Yu Zhou, Jie Xu et al.

The utilization of computer technology to solve problems in medical scenarios has attracted considerable attention in recent years, which still has great potential and space for exploration. Among them, machine learning has been widely used in the prediction, diagnosis and even treatment of Sepsis. However, state-of-the-art methods require large amounts of labeled medical data for supervised learning. In real-world applications, the lack of labeled data will cause enormous obstacles if one hospital wants to deploy a new Sepsis detection system. Different from the supervised learning setting, we need to use known information (e.g., from another hospital with rich labeled data) to help build a model with acceptable performance, i.e., transfer learning. In this paper, we propose a semi-supervised optimal transport with self-paced ensemble framework for Sepsis early detection, called SPSSOT, to transfer knowledge from the other that has rich labeled data. In SPSSOT, we first extract the same clinical indicators from the source domain (e.g., hospital with rich labeled data) and the target domain (e.g., hospital with little labeled data), then we combine the semi-supervised domain adaptation based on optimal transport theory with self-paced under-sampling to avoid a negative transfer possibly caused by covariate shift and class imbalance. On the whole, SPSSOT is an end-to-end transfer learning method for Sepsis early detection which can automatically select suitable samples from two domains respectively according to the number of iterations and align feature space of two domains. Extensive experiments on two open clinical datasets demonstrate that comparing with other methods, our proposed SPSSOT, can significantly improve the AUC values with only 1% labeled data in the target domain in two transfer learning scenarios, MIMIC $rightarrow$ Challenge and Challenge $rightarrow$ MIMIC.

2.7LGOct 19, 2019

CreditPrint: Credit Investigation via Geographic Footprints by Deep Learning

Xiao Han, Ruiqing Ding, Leye Wang et al.

Credit investigation is critical for financial services. Whereas, traditional methods are often restricted as the employed data hardly provide sufficient, timely and reliable information. With the prevalence of smart mobile devices, peoples' geographic footprints can be automatically and constantly collected nowadays, which provides an unprecedented opportunity for credit investigations. Inspired by the observation that locations are somehow related to peoples' credit level, this research aims to enhance credit investigation with users' geographic footprints. To this end, a two-stage credit investigation framework is designed, namely CreditPrint. In the first stage, CreditPrint explores regions' credit characteristics and learns a credit-aware embedding for each region by considering both each region's individual characteristics and cross-region relationships with graph convolutional networks. In the second stage, a hierarchical attention-based credit assessment network is proposed to aggregate the credit indications from a user's multiple trajectories covering diverse regions. The results on real-life user mobility datasets show that CreditPrint can increase the credit investigation accuracy by up to 10% compared to baseline methods.