Siwen Yan

8 papers

74 citations

Novelty 48%

AI Score 25

Ranked #171,927 of 205,806 authors (top 84%); #37,324 in LG (top 88%)

8 Papers

SI Oct 14, 2022

ToupleGDD: A Fine-Designed Solution of Influence Maximization by Deep Reinforcement Learning

Tiantian Chen, Siwen Yan, Jianxiong Guo et al.

The Influence Maximization (IM) problem, which aims to select a small subset of nodes with maximum influence on a network, has been extensively studied. Since it is #P-hard to compute the influence spread given a seed set, the state-of-the-art methods, including heuristic and approximation algorithms, face great difficulties with theoretical guarantees, time efficiency, and generalization. This makes them unable to adapt to large-scale networks and more complex applications. On the other hand, with the latest achievements of Deep Reinforcement Learning (DRL) in artificial intelligence and other fields, many works have focused on exploiting DRL to solve combinatorial optimization problems. Inspired by this, we propose a novel end-to-end DRL framework, ToupleGDD, to address the IM problem, which incorporates three coupled graph neural networks for network embedding and double deep Q-networks for parameter learning. Previous efforts to solve the IM problem with DRL trained their models on subgraphs of the whole network and then tested on the whole graph, which makes the performance of their models unstable across different networks. In contrast, our model is trained on several small randomly generated graphs with a small budget and tested on completely different networks under various large budgets; it obtains results very close to IMM and better results than OPIM-C on several datasets, and shows strong generalization ability. Finally, we conduct a large number of experiments on synthetic and realistic datasets, and the experimental results demonstrate the effectiveness and superiority of our model.
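
The abstract mentions double deep Q-networks for parameter learning. As a rough illustration of that general technique only (not the ToupleGDD implementation), the sketch below computes standard double-DQN regression targets with NumPy. The function name, argument names, and toy numbers are assumptions made for this example.

```python
import numpy as np

def double_dqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """Compute double-DQN targets for a batch of transitions.

    q_online_next and q_target_next have shape (batch, n_actions) and hold
    next-state Q-values from the online and target networks (hypothetical inputs).
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    # The online network selects the greedy next action ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... and the target network evaluates that selected action.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (dones == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values

# Toy batch of two transitions; the second one is terminal.
targets = double_dqn_targets(
    rewards=[1.0, 0.5],
    q_online_next=[[0.2, 0.9], [0.7, 0.1]],
    q_target_next=[[1.0, 2.0], [3.0, 4.0]],
    dones=[0, 1],
    gamma=0.5,
)
print(targets)
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes double DQN from plain DQN, which uses the target network for both.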

LG Jun 16, 2022

Explainable Models via Compression of Tree Ensembles

Siwen Yan, Sriraam Natarajan, Saket Joshi et al.

Ensemble models (bagging and gradient-boosting) of relational decision trees have proved to be one of the most effective learning methods in the area of probabilistic logic models (PLMs). While effective, they lose one of the most important aspects of PLMs -- interpretability. In this paper we consider the problem of compressing a large set of learned trees into a single explainable model. To this end, we propose CoTE -- Compression of Tree Ensembles -- that produces a single small decision list as a compressed representation. CoTE first converts the trees to decision lists and then performs the combination and compression with the aid of the original training set. An experimental evaluation demonstrates the effectiveness of CoTE on several benchmark relational data sets.
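
The first step described above, converting a tree to a decision list, can be sketched by emitting one rule per root-to-leaf path. This is a minimal illustration under stated assumptions, not the CoTE algorithm: the nested-dict tree encoding, the function name, and the toy predicates are hypothetical, and the combination and compression steps are not shown.

```python
def tree_to_decision_list(node, path=()):
    """Flatten a binary tree into (conditions, prediction) rules, one per leaf."""
    if not isinstance(node, dict):
        # Leaf: the node itself is the prediction value.
        return [(list(path), node)]
    test = node["test"]
    # Left branch: the test holds; right branch: the test fails.
    rules = tree_to_decision_list(node["yes"], path + (test,))
    rules += tree_to_decision_list(node["no"], path + ("not " + test,))
    return rules

# Hypothetical toy tree with relational-style tests written as plain strings.
toy_tree = {
    "test": "advisedBy(A, B)",
    "yes": {"test": "publication(P, A)", "yes": 0.9, "no": 0.6},
    "no": 0.1,
}

rules = tree_to_decision_list(toy_tree)
for conditions, prediction in rules:
    print(" AND ".join(conditions), "=>", prediction)
```

Because each rule spells out its negated tests explicitly, the rules here do not depend on their order; an ordered decision list could drop those negations and rely on earlier rules having failed.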

LG Sep 10, 2023

Knowledge-based Refinement of Scientific Publication Knowledge Graphs

Siwen Yan, Phillip Odom, Sriraam Natarajan

We consider the problem of identifying authorship by posing it as knowledge graph construction and refinement. To this end, we model this problem as learning a probabilistic logic model in the presence of human guidance (knowledge-based learning). Specifically, we learn relational regression trees using functional gradient boosting, which outputs explainable rules. To incorporate human knowledge, advice in the form of first-order clauses is injected to refine the trees. We demonstrate the usefulness of human knowledge both quantitatively and qualitatively in seven authorship domains.

AI Sep 18, 2023

Promoting Research Collaboration with Open Data Driven Team Recommendation in Response to Call for Proposals

Siva Likitha Valluru, Biplav Srivastava, Sai Teja Paladi et al.

Building teams and promoting collaboration are two very common business activities. An example is the TeamingForFunding problem, where research institutions and researchers are interested in identifying collaborative opportunities when applying to funding agencies in response to the latter's calls for proposals. We describe a novel system to recommend teams using a variety of AI methods, such that (1) each team achieves the highest possible skill coverage demanded by the opportunity, and (2) the workload of distributing the opportunities is balanced amongst the candidate members. We address these goals by extracting skills latent in open data of proposal calls (demand) and researcher profiles (supply), normalizing them using taxonomies, and creating efficient algorithms that match demand to supply. We create teams to maximize goodness along a novel metric balancing short- and long-term objectives. We validate the success of our algorithms (1) quantitatively, by evaluating the recommended teams using a goodness score, finding that more informed methods lead to recommendations of fewer teams but higher goodness, and (2) qualitatively, by conducting a large-scale user study at a college-wide level, demonstrating that users overall found the tool very useful and relevant. Lastly, we evaluate our system in two diverse settings, the US and India (of researchers and proposal calls), to establish the generality of our approach, and deploy it at a major US university for routine use.

LG Mar 19, 2021

Predicting Drug-Drug Interactions from Heterogeneous Data: An Embedding Approach

Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli et al.

Predicting and discovering drug-drug interactions (DDIs) using machine learning has been studied extensively. However, most of the approaches have focused on text data or textual representations of the drug structures. We present the first work that uses multiple data sources, such as drug structure images, drug structure string representations, and relational representations of drug relationships, as the input. To this end, we exploit recent advances in deep networks to integrate these varied sources of input in predicting DDIs. Our empirical evaluation against several state-of-the-art methods that each use a single data type for drugs clearly demonstrates the efficacy of combining heterogeneous data in predicting DDIs.

LG Feb 13, 2021

A Statistical Relational Approach to Learning Distance-based GCNs

Devendra Singh Dhami, Siwen Yan, Sriraam Natarajan

We consider the problem of learning distance-based Graph Convolutional Networks (GCNs) for relational data. Specifically, we first embed the original graph into the Euclidean space $\mathbb{R}^m$ using a relational density estimation technique thereby constructing a secondary Euclidean graph. The graph vertices correspond to the target triples and edges denote the Euclidean distances between the target triples. We emphasize the importance of learning the secondary Euclidean graph and the advantages of employing a distance matrix over the typically used adjacency matrix. Our comprehensive empirical evaluation demonstrates the superiority of our approach over $12$ different GCN models, relational embedding techniques and rule learning techniques.
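
To make the distance-matrix idea concrete, the sketch below builds a pairwise Euclidean distance matrix from a set of embedded points, which could then stand in for an adjacency matrix. It is an illustration under assumptions, not the paper's method: the embeddings here are hand-picked toy vectors rather than the output of relational density estimation, and the function name is hypothetical.

```python
import numpy as np

def euclidean_distance_matrix(embeddings):
    """Return the (n, n) matrix of pairwise Euclidean distances between rows."""
    x = np.asarray(embeddings, dtype=float)
    # Broadcasting yields all pairwise difference vectors, shape (n, n, m).
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy embeddings in R^2, one row per vertex of the secondary graph (assumed data).
toy_embeddings = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
distances = euclidean_distance_matrix(toy_embeddings)
print(distances)
```

Unlike a 0/1 adjacency matrix, every entry carries a real-valued edge weight, so the matrix is dense and symmetric with a zero diagonal.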

LG Jan 2, 2020

Non-Parametric Learning of Gaifman Models

Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli et al.

We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning.

LG Nov 14, 2019

Beyond Textual Data: Predicting Drug-Drug Interactions from Molecular Structure Images using Siamese Neural Networks

Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli et al.

Predicting and discovering drug-drug interactions (DDIs) is an important problem and has been studied extensively from both medical and machine learning points of view. Almost all of the machine learning approaches have focused on text data or textual representations of the structural data of drugs. We present the first work that uses drug structure images as the input and utilizes a Siamese convolutional network architecture to predict DDIs.