Jingwei Xiong

CL
h-index7
7papers
40citations
Novelty46%
AI Score51

7 Papers

80.7CLMay 3Code
Hey, That's My Data! Token-Only Dataset Inference in Large Language Models

Chen Xiong, Zihao Wang, Rui Zhu et al.

Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log probabilities or other internal signals, but many modern LLMs restrict such access, motivating token-only inference approaches. We propose CatShift, a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training. Experiments on both open-source and API-based LLMs show that CatShift remains effective without logit access, enabling practical protection of proprietary datasets.

CLFeb 27, 2025Code
KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model

Kai Zhang, Rui Zhu, Shutian Ma et al.

Drug discovery is a critical task in biomedical natural language processing (NLP), yet explainable drug discovery remains underexplored. Meanwhile, large language models (LLMs) have shown remarkable abilities in natural language understanding and generation. Leveraging LLMs for explainable drug discovery has the potential to improve downstream tasks and real-world applications. In this study, we utilize open-source drug knowledge graphs, clinical trial data, and PubMed publications to construct a comprehensive dataset for the explainable drug discovery task, named \textbf{expRxRec}. Furthermore, we introduce \textbf{KEDRec-LM}, an instruction-tuned LLM which distills knowledge from rich medical knowledge corpus for drug recommendation and rationale generation. To encourage further research in this area, we will publicly release\footnote{A copy is attached with this submission} both the dataset and KEDRec-LM.

63.4CLApr 16
"Excuse me, may I say something..." CoLabScience, A Proactive AI Assistant for Biomedical Discovery and LLM-Expert Collaborations

Yang Wu, Jinhong Yu, Jingwei Xiong et al.

The integration of Large Language Models (LLMs) into scientific workflows presents exciting opportunities to accelerate biomedical discovery. However, the reactive nature of LLMs, which respond only when prompted, limits their effectiveness in collaborative settings that demand foresight and autonomous engagement. In this study, we introduce CoLabScience, a proactive LLM assistant designed to enhance biomedical collaboration between AI systems and human experts through timely, context-aware interventions. At the core of our method is PULI (Positive-Unlabeled Learning-to-Intervene), a novel framework trained with a reinforcement learning objective to determine when and how to intervene in streaming scientific discussions, by leveraging the team's project proposal and long- and short-term conversational memory. To support this work, we introduce BSDD (Biomedical Streaming Dialogue Dataset), a new benchmark of simulated research discussion dialogues with intervention points derived from PubMed articles. Experimental results show that PULI significantly outperforms existing baselines in both intervention precision and collaborative task utility, highlighting the potential of proactive LLMs as intelligent scientific assistants.

46.3LGMay 7
Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

Yuchu Liu, Rui Zhu, Jingwei Xiong et al.

Can large language models predict physical and mechanical polymer properties simply by reading unstructured scientific prose? Polymer performance is rarely determined by chemical structure alone; identical nominal polymers can exhibit drastically different behaviors depending on their synthesis route, processing history, morphology, and testing conditions. Yet, state-of-the-art polymer property models typically rely on structure-only representations -- such as SMILES or molecular graphs -- which strip away this vital experimental context. In this work, we introduce \textbf{PolyLM}, a natural-language-only, process- and condition-aware framework that predicts materials performance directly from full-text literature. By circumventing structural inputs entirely, PolyLM preserves the nuanced, unstructured descriptions of synthesis and processing reported by domain scientists. To train this framework, we curated an unprecedented, literature-scale dataset encompassing 185,000 scientific papers and over 276,400 unique polymer samples across 22 physical, mechanical, and thermal properties. We fine-tuned a massive 9-billion-parameter language model (Qwen3.5-9B) using Low-Rank Adaptation (LoRA) and task-level uncertainty weighting. Evaluated on 68,283 held-out observations, the model achieves remarkably high predictive accuracy, establishing new state-of-the-art benchmarks for complex properties. Across the 22 diverse targets, the model achieves a median $R^2$ of 0.74, with predictions for key thermal, mechanical, and physicochemical properties frequently surpassing an $R^2$ of 0.80. These results unequivocally demonstrate that natural language is a powerful, highly scalable interface for realistic materials performance prediction.

CLFeb 11, 2025
Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning

Rujing Yao, Yang Wu, Chenghao Wang et al.

Large Language Models (LLMs) have achieved impressive results across numerous domains, yet they experience notable deficiencies in legal question-answering tasks. LLMs often generate generalized responses that lack the logical specificity required for expert legal advice and are prone to hallucination, providing answers that appear correct but are unreliable. Retrieval-Augmented Generation (RAG) techniques offer partial solutions to address this challenge, but existing approaches typically focus only on semantic similarity, neglecting the logical structure essential to legal reasoning. In this paper, we propose the Logical-Semantic Integration Model (LSIM), a novel supervised framework that bridges semantic and logical coherence. LSIM comprises three components: reinforcement learning predicts a structured fact-rule chain for each question, a trainable Deep Structured Semantic Model (DSSM) retrieves the most relevant candidate questions by integrating semantic and logical features, and in-context learning generates the final answer using the retrieved content. Our experiments on a real-world legal QA dataset-validated through both automated metrics and human evaluation-demonstrate that LSIM significantly enhances accuracy and reliability compared to existing methods.

LGJan 24, 2025
LLM4DistReconfig: A Fine-tuned Large Language Model for Power Distribution Network Reconfiguration

Panayiotis Christou, Md. Zahidul Islam, Yuzhang Lin et al.

Power distribution networks are evolving due to the integration of DERs and increased customer participation. To maintain optimal operation, minimize losses, and meet varying load demands, frequent network reconfiguration is necessary. Traditionally, the reconfiguration task relies on optimization software and expert operators, but as systems grow more complex, faster and more adaptive solutions are required without expert intervention. Data-driven reconfiguration is gaining traction for its accuracy, speed, and robustness against incomplete network data. LLMs, with their ability to capture complex patterns, offer a promising approach for efficient and responsive network reconfiguration in evolving complex power networks. In this work, we introduce LLM4DistReconfig, a deep learning-based approach utilizing a fine-tuned LLM to solve the distribution network reconfiguration problem. By carefully crafting prompts and designing a custom loss function, we train the LLM with inputs representing network parameters such as buses, available lines, open lines, node voltages, and system loss. The model then predicts optimal reconfigurations by outputting updated network configurations that minimize system loss while meeting operational constraints. Our approach significantly reduces inference time compared to classical algorithms, allowing for near real-time optimal reconfiguration after training. Experimental results show that our method generates optimal configurations minimizing system loss for five individual and a combined test dataset. It also produces minimal invalid edges, no cycles, or subgraphs across all datasets, fulfilling domain-specific needs. Additionally, the generated responses contain less than 5% improper outputs on seen networks and satisfactory results on unseen networks, demonstrating its effectiveness and reliability for the reconfiguration task.

AIJan 12, 2025
Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives

Xinyao Ma, Rui Zhu, Zihao Wang et al.

Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing scenarios, particularly in simulating domain-specific experts using tailored prompts. This ability enables LLMs to adopt the persona of individuals with specific backgrounds, offering a cost-effective and efficient alternative to traditional, resource-intensive user studies. By mimicking human behavior, LLMs can anticipate responses based on concrete demographic or professional profiles. In this paper, we evaluate the effectiveness of LLMs in simulating individuals with diverse backgrounds and analyze the consistency of these simulated behaviors compared to real-world outcomes. In particular, we explore the potential of LLMs to interpret and respond to discharge summaries provided to patients leaving the Intensive Care Unit (ICU). We evaluate and compare with human responses the comprehensibility of discharge summaries among individuals with varying educational backgrounds, using this analysis to assess the strengths and limitations of LLM-driven simulations. Notably, when LLMs are primed with educational background information, they deliver accurate and actionable medical guidance 88% of the time. However, when other information is provided, performance significantly drops, falling below random chance levels. This preliminary study shows the potential benefits and pitfalls of automatically generating patient-specific health information from diverse populations. While LLMs show promise in simulating health personas, our results highlight critical gaps that must be addressed before they can be reliably used in clinical settings. Our findings suggest that a straightforward query-response model could outperform a more tailored approach in delivering health information. This is a crucial first step in understanding how LLMs can be optimized for personalized health communication while maintaining accuracy.