CL · Feb 27, 2025 · Code
KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model
Kai Zhang, Rui Zhu, Shutian Ma et al.
Drug discovery is a critical task in biomedical natural language processing (NLP), yet explainable drug discovery remains underexplored. Meanwhile, large language models (LLMs) have shown remarkable abilities in natural language understanding and generation. Leveraging LLMs for explainable drug discovery has the potential to improve downstream tasks and real-world applications. In this study, we use open-source drug knowledge graphs, clinical trial data, and PubMed publications to construct a comprehensive dataset for the explainable drug discovery task, named expRxRec. Furthermore, we introduce KEDRec-LM, an instruction-tuned LLM that distills knowledge from a rich medical knowledge corpus for drug recommendation and rationale generation. To encourage further research in this area, we will publicly release both the dataset and KEDRec-LM (a copy is attached with this submission).
IR · Jan 30, 2025
Citation Recommendation based on Argumentative Zoning of User Queries
Shutian Ma, Chengzhi Zhang, Heng Zhang et al.
Citation recommendation aims to locate important papers for scholars to cite. When writing citing sentences, authors usually hold different citing intents, which are referred to as citation functions in citation analysis. Since argumentative zoning identifies the argumentative and rhetorical structure of scientific literature, we use this information to improve the citation recommendation task. In this paper, a multi-task learning model is built for citation recommendation and argumentative zoning classification. We also generate an annotated corpus from PubMed Central data based on a new argumentative zoning schema. The experimental results show that, by considering the argumentative information in the citing sentence, the citation recommendation model achieves better performance.
IR · May 3, 2021
Improving Community Detection Performance in Heterogeneous Music Network by Learning Edge-type Usefulness Distribution
Zheng Gao, Chun Guo, Shutian Ma et al.
With music becoming an essential part of daily life, there is an urgent need for recommendation systems that help people find better songs with less effort. As the interactions between users and songs naturally form a complex network, community detection approaches can be applied to reveal users' potential interests in songs by grouping relevant users and songs into the same community. However, the types of interaction can be heterogeneous, which challenges conventional community detection methods originally designed for homogeneous networks. Although there are existing works on heterogeneous community detection, they are mostly task-driven approaches and are not suited to music recommendation specifically. In this paper, we propose a genetic-based approach to learn an edge-type usefulness distribution (ETUD) for all edge types in heterogeneous music networks. ETUD can be regarded as a linear function that projects all edges into the same latent space and makes them comparable. A heterogeneous network can therefore be converted to a homogeneous one, to which conventional methods can be applied. We validate the proposed model on a heterogeneous music network constructed from an online music streaming service. Results show that, for conventional methods, ETUD helps detect communities that significantly improve music recommendation accuracy while simultaneously reducing user search cost.
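The conversion step described above can be illustrated with a minimal sketch. It assumes ETUD is available as one scalar weight per edge type and that parallel edges between the same pair of nodes are combined by summing their weights; the edge-type names, the weight values, and the `to_homogeneous` helper are hypothetical, and the genetic learning of the weights is not shown.

```python
from collections import defaultdict


def to_homogeneous(edges, etud):
    """Collapse typed edges into one weighted, untyped edge per node pair.

    edges: iterable of (node_u, node_v, edge_type) triples.
    etud:  dict mapping edge_type -> usefulness weight (assumed already learned).
    """
    weights = defaultdict(float)
    for u, v, edge_type in edges:
        key = tuple(sorted((u, v)))      # treat edges as undirected
        weights[key] += etud[edge_type]  # assumption: weights of parallel edges add up
    return dict(weights)


# Hypothetical interactions and a hypothetical learned distribution.
edges = [
    ("user_a", "song_1", "listen"),
    ("user_a", "song_1", "like"),
    ("user_b", "song_1", "listen"),
    ("user_b", "song_2", "download"),
]
etud = {"listen": 0.2, "like": 0.5, "download": 0.3}

graph = to_homogeneous(edges, etud)
for pair, weight in sorted(graph.items()):
    print(pair, round(weight, 3))
```

The resulting weighted edge list can be passed to any community detection method that accepts a weighted homogeneous graph.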
IR · Jan 19, 2021
Chronological Citation Recommendation with Time Preference
Shutian Ma, Heng Zhang, Chengzhi Zhang et al.
Citation recommendation is an important task that assists scholars in finding candidate literature to cite. Traditional studies focus on static models for recommending citations, which do not explicitly distinguish differences between papers caused by temporal variation. Although some researchers have investigated chronological citation recommendation by adding time-related functions or modeling textual topics dynamically, these solutions can hardly cope with function generalization or with cold-start problems, where there is no information for user profiling or there are isolated papers that have never been cited. With the rise and fall of scientific paradigms, scientific topics change and evolve over time. People have a time preference when citing papers, since most theoretical foundations appear in classic works published long ago, while new techniques are proposed in more recent papers. To explore chronological citation recommendation, this paper predicts the time preference from user queries, defined as a probability distribution over citing papers published in different time slices. We then use this time preference to re-rank the initial citation list obtained by content-based filtering. Experimental results demonstrate that task performance can be further enhanced by time preference and that it can be flexibly added to other citation recommendation frameworks.
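The re-ranking step can be sketched as follows. The abstract does not state how the time preference is combined with the content-based score, so this sketch assumes a simple linear interpolation controlled by a hypothetical parameter `alpha`, with decades as time slices; the candidate papers, scores, and preference values are made up for illustration.

```python
def decade_slice(year):
    """Map a publication year to its time slice (assumption: one slice per decade)."""
    return (year // 10) * 10


def rerank(candidates, time_pref, slice_of, alpha=0.5):
    """Re-rank (paper_id, content_score, year) tuples using a time preference.

    time_pref: dict mapping time slice -> predicted probability of citing it.
    alpha:     hypothetical interpolation weight between the two signals.
    """
    def score(candidate):
        _, content_score, year = candidate
        preference = time_pref.get(slice_of(year), 0.0)
        return alpha * content_score + (1 - alpha) * preference

    return sorted(candidates, key=score, reverse=True)


# Hypothetical initial list from content-based filtering.
candidates = [("p1", 0.90, 1995), ("p2", 0.85, 2018), ("p3", 0.60, 2016)]
# Hypothetical predicted time preference for one query.
time_pref = {1990: 0.1, 2000: 0.2, 2010: 0.7}

ranked = rerank(candidates, time_pref, decade_slice)
print([paper_id for paper_id, _, _ in ranked])
```

Because the time preference enters only at the scoring stage, the same function could sit behind any retriever that produces an initial scored list.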
DL · Jun 3, 2020
Mapping the co-evolution of artificial intelligence, robotics, and the internet of things over 20 years (1998-2017)
Katy Börner, Olga Scrivner, Leonard E. Cross et al.
Understanding the emergence, co-evolution, and convergence of science and technology (S&T) areas offers competitive intelligence for researchers, managers, policy makers, and others. The resulting data-driven decision support helps set proper research and development (R&D) priorities; develop future S&T investment strategies; monitor key authors, organizations, or countries; perform effective research program assessment; and implement cutting-edge education/training efforts. This paper presents new funding, publication, and scholarly network metrics and visualizations that were validated via expert surveys. The metrics and visualizations exemplify the emergence and convergence of three areas of strategic interest: artificial intelligence (AI), robotics, and internet of things (IoT) over the last 20 years (1998-2017). For 32,716 publications and 4,497 NSF awards, we identify their conceptual space (using the UCSD map of science), geospatial network, and co-evolution landscape. The findings demonstrate how the transition of knowledge (through cross-discipline publications and citations) and the emergence of new concepts (through term bursting) create a tangible potential for interdisciplinary research and new disciplines.