CLApr 15, 2023Code
Medical Question Summarization with Entity-driven Contrastive LearningWenpeng Lu, Sibo Wei, Xueping Peng et al.
By summarizing longer consumer health questions into shorter and essential ones, medical question-answering systems can more accurately understand consumer intentions and retrieve suitable answers. However, medical question summarization is very challenging due to obvious distinctions in health trouble descriptions from patients and doctors. Although deep learning has been applied to successfully address the medical question summarization (MQS) task, two challenges remain: how to correctly capture question focus to model its semantic intention, and how to obtain reliable datasets to fairly evaluate performance. To address these challenges, this paper proposes a novel medical question summarization framework based on entity-driven contrastive learning (ECL). ECL employs medical entities present in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples. This approach compels models to focus on essential information and consequently generate more accurate question summaries. Furthermore, we have discovered that some MQS datasets, such as the iCliniq dataset with a 33% duplicate rate, have significant data leakage issues. To ensure an impartial evaluation of the related methods, this paper carefully examines leaked samples to reorganize more reasonable datasets. Extensive experiments demonstrate that our ECL method outperforms the existing methods and achieves new state-of-the-art performance, i.e., 52.85, 43.16, 41.31, 43.52 in terms of ROUGE-1 metric on MeQSum, CHQ-Summ, iCliniq, HealthCareMagic dataset, respectively. The code and datasets are available at https://github.com/yrbobo/MQS-ECL.
CLNov 17, 2024Code
BianCang: A Traditional Chinese Medicine Large Language ModelSibo Wei, Xueping Peng, Yi-Fei Wang et al.
The surge of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation due to substantial differences between TCM and modern medical theory, and the scarcity of specialized, high-quality corpora. To this end, in this paper we propose BianCang, a TCM-specific LLM, using a two-stage training process that first injects domain-specific knowledge and then aligns it through targeted stimulation to enhance diagnostic and differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 31 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available on https://github.com/QLU-NLP/BianCang.
CLDec 31, 2025
BIOME-Bench: A Benchmark for Biomolecular Interaction Inference and Multi-Omics Pathway Mechanism Elucidation from Scientific LiteratureSibo Wei, Peng Chen, Lifeng Dong et al.
Multi-omics studies often rely on pathway enrichment to interpret heterogeneous molecular changes, but pathway enrichment (PE)-based workflows inherit structural limitations of pathway resources, including curation lag, functional redundancy, and limited sensitivity to molecular states and interventions. Although recent work has explored using large language models (LLMs) to improve PE-based interpretation, the lack of a standardized benchmark for end-to-end multi-omics pathway mechanism elucidation has largely confined evaluation to small, manually curated datasets or ad hoc case studies, hindering reproducible progress. To address this issue, we introduce BIOME-Bench, constructed via a rigorous four-stage workflow, to evaluate two core capabilities of LLMs in multi-omics analysis: Biomolecular Interaction Inference and end-to-end Multi-Omics Pathway Mechanism Elucidation. We develop evaluation protocols for both tasks and conduct comprehensive experiments across multiple strong contemporary models. Experimental results demonstrate that existing models still exhibit substantial deficiencies in multi-omics analysis, struggling to reliably distinguish fine-grained biomolecular relation types and to generate faithful, robust pathway-level mechanistic explanations.
CLOct 14, 2025Code
The Harder The Better: Maintaining Supervised Fine-tuning Generalization with Less but Harder DataZhaoyang Shang, Sibo Wei, Jianbin Guo et al.
Large Language Models (LLMs) excel in general tasks, but adapting them to specialized domains relies on high-quality supervised fine-tuning (SFT) data. Although existing methods can identify subsets of high-quality data and reduce training cost to some extent, their selection process still suffers from over-reliance on LLMs' internal knowledge, weak interpretability, and limited generalization. To address these limitations, we propose THTB (The Harder The Better), a cognitive science-inspired framework for instruction data selection and annotation guidance. THTB prioritizes higher-level cognitive instructions by combining quality filtering with intrinsic and extrinsic hardness scoring, offering interpretable and quantifiable criteria for efficient SFT, both in data selection and annotation guidance. Experiments show that THTB enables models trained on only 5% of the data to outperform full-dataset training, while achieving superior generalization compared with LLM-only selection. In addition, THTB provides effective annotation guidance in vertical domains, enabling a model trained on just 2% of the data to surpass models trained on much larger datasets, demonstrating strong potential for domain adaptation. Our code, datasets, and models are available on https://github.com/DYJG-research/THTB.
CLMar 23, 2025
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM InferenceYouhui Zuo, Sibo Wei, Chen Zhang et al.
With the advancements in long-context inference capabilities of large language models (LLMs), the KV cache has become one of the foundational components. However, its substantial GPU memory consumption makes KV cache compression a key technique for enabling efficient LLM inference in industrial scenarios. While recent studies have focused on optimizing the memory occupied by the KV cache, they overlook two critical factors: preserving semantic coherence and considering task-specific characteristic during compression. To address these limitations, we propose a novel task-adaptive KV cache window selection method, WindowKV. WindowKV dynamically selects local semantic windows consisting of consecutive tokens, according to task-specific characteristics, ensuring the retained KV cache captures continuous, essential context. Additionally, we introduce an intra-group layer KV cache indices sharing strategy to reduce computational overhead, achieving a balance between performance and efficiency. We rigorously evaluate WindowKV on the LongBench benchmark, and the results demonstrate that it maintains a performance comparable to full KV cache retention while using only 12% of the original KV cache, significantly reducing memory requirements. Furthermore, our method also achieves state-of-the-art results in the Needle-in-a-Haystack evaluation, highlighting its effectiveness and robustness.