Rongbo Zhu

2papers

2 Papers

LGOct 11, 2023
ADMEOOD: Out-of-Distribution Benchmark for Drug Property Prediction

Shuoying Wei, Xinlong Wen, Lida Zhu et al.

Obtaining accurate and valid information for drug molecules is a crucial and challenging task. However, chemical knowledge and information have been accumulated over the past 100 years from various regions, laboratories, and experimental purposes. Little has been explored in terms of the out-of-distribution (OOD) problem with noise and inconsistency, which may lead to weak robustness and unsatisfied performance. This study proposes a novel benchmark ADMEOOD, a systematic OOD dataset curator and benchmark specifically designed for drug property prediction. ADMEOOD obtained 27 ADME (Absorption, Distribution, Metabolism, Excretion) drug properties from Chembl and relevant literature. Additionally, it includes two kinds of OOD data shifts: Noise Shift and Concept Conflict Drift (CCD). Noise Shift responds to the noise level by categorizing the environment into different confidence levels. On the other hand, CCD describes the data which has inconsistent label among the original data. Finally, it tested on a variety of domain generalization models, and the experimental results demonstrate the effectiveness of the proposed partition method in ADMEOOD: ADMEOOD demonstrates a significant difference performance between in-distribution and out-of-distribution data. Moreover, ERM (Empirical Risk Minimization) and other models exhibit distinct trends in performance across different domains and measurement types.

IRMay 26, 2013
Query Representation with Global Consistency on User Click Graph

Daqiang Zhang, Rongbo Zhu, Shuqiqiu Men et al.

Extensive research has been conducted on query log analysis. A query log is generally represented as a bipartite graph on a query set and a URL set. Most of the traditional methods used the raw click frequency to weigh the link between a query and a URL on the click graph. In order to address the disadvantages of raw click frequency, researchers proposed the entropy-biased model, which incorporates raw click frequency with inverse query frequency of the URL as the weighting scheme for query representation. In this paper, we observe that the inverse query frequency can be considered a global property of the URL on the click graph, which is more informative than raw click frequency, which can be considered a local property of the URL. Based on this insight, we develop the global consistency model for query representation, which utilizes the click frequency and the inverse query frequency of a URL in a consistent manner. Furthermore, we propose a new scheme called inverse URL frequency as an effective way to capture the global property of a URL. Experiments have been conducted on the AOL search engine log data. The result shows that our global consistency model achieved better performance than the current models.