LG Dec 24, 2022
Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives
Wenjin Xie, Siyuan Liu, Xiaomeng Wang et al.
Name ambiguity, such as multiple authors sharing the same name, is common in academic digital libraries. This creates challenges for academic data management and analysis, making name disambiguation necessary. Name disambiguation divides the publications carrying the same name into groups, each belonging to a unique author. The large amount of attribute information in publications mires traditional methods in feature selection. These methods typically select attributes manually and weight them equally, which usually hurts accuracy. The proposed method is based mainly on representation learning for heterogeneous networks and on clustering, and it uses self-attention to address this problem. The representation of a publication is a synthesis of structural and semantic representations. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, and meta-path level attention is introduced to automatically learn the weight of each feature. The semantic representation is generated using NLP tools. Our proposal achieves higher name disambiguation accuracy than the baselines, and the ablation experiments demonstrate the improvement contributed by feature selection and the meta-path level attention. The experimental results show the superiority of our method in capturing the most attributes from publications and reducing the impact of redundant information.
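As an illustration of the meta-path-based sampling step mentioned in this abstract, here is a minimal sketch of a meta-path-guided random walk on a small heterogeneous graph. The graph, the node types, the `paper-author-paper` meta-path, and the function name `metapath_walk` are all assumptions made for this example, not details taken from the paper. Walks of this kind are commonly fed to a skip-gram model to learn node embeddings.

```python
import random

def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Random walk that only follows edges matching a repeating meta-path.

    neighbors: dict mapping node -> list of adjacent nodes (hypothetical toy graph)
    node_type: dict mapping node -> type label
    metapath:  list of type labels whose first and last entries match,
               e.g. ["paper", "author", "paper"], so the pattern can repeat
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    cycle = metapath[1:]  # types to visit after the start node, repeated
    walk = [start]
    step = 0
    while len(walk) < walk_length:
        wanted = cycle[step % len(cycle)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk

# Toy heterogeneous graph: papers p1..p3 linked to authors a1, a2.
neighbors = {
    "p1": ["a1"], "p2": ["a1", "a2"], "p3": ["a2"],
    "a1": ["p1", "p2"], "a2": ["p2", "p3"],
}
node_type = {"p1": "paper", "p2": "paper", "p3": "paper",
             "a1": "author", "a2": "author"}

walk = metapath_walk(neighbors, node_type, "p1",
                     ["paper", "author", "paper"], 5, random.Random(0))
print(walk)
```

Each meta-path yields its own set of walks, which is what allows a per-meta-path weight to be learned afterwards.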
88.6 DL Apr 2
Not Just Large: Tall Teams Dominate East Asia's Scientific Production
Siyuan Liu, Wenjin Xie, Wenyu Chen et al.
Purpose: This study compares the hierarchical structure of scientific teams across countries and investigates factors associated with the observed cross-national differences. Design/methodology/approach: Drawing on 150,817 publications with author contribution statements, we focus on the 15 countries with the largest volume of scientific publications, examine cross-country variations in the proportion of tall teams, and analyze how this proportion correlates with other factors. Findings: Scientific output from East Asia is dominated by tall teams, which persist after controlling for team size, indicating that this pattern cannot be fully accounted for by the prevalence of larger teams in these countries. Cultural factors, measured by Power Distance, as well as the observed funding patterns of major basic science agencies, are associated with the dominance of tall teams in East Asia. Research limitations: This study is limited by its reliance on publications with author contribution statements, which may introduce selection bias; its focus on cultural and funding factors, while leaving other institutional contexts unexamined; and its use of a leadership concentration measure that does not capture other dimensions of hierarchy. Practical implications: Understanding cross-national differences in research team structures and their associated cultural and institutional factors can inform science policy and team management. Originality/value: This study provides a systematic cross-national comparison of team hierarchy and offers a mechanistic understanding of the dominance of tall teams in East Asia, highlighting associations with cultural and funding factors.
SI Sep 18, 2023
Towards a performance characteristic curve for model evaluation: an application in information diffusion prediction
Wenjin Xie, Xiaomeng Wang, Radosław Michalski et al.
Information diffusion prediction on social networks aims to predict the future recipients of a message, with practical applications in marketing and social media. While different prediction models all claim to perform well, general frameworks for performance evaluation remain limited. Here, we aim to identify a performance characteristic curve for a model, which captures its performance on tasks of different complexity. We propose a metric based on information entropy to quantify the randomness in diffusion data. We then identify a scaling pattern between the randomness and the prediction accuracy of the model. By properly adjusting the variables, data points from different sequence lengths, system sizes, and levels of randomness can all collapse onto a single curve. The curve captures a model's inherent capability of making correct predictions against increased uncertainty, which we regard as the performance characteristic curve of the model. The validity of the curve is tested on three prediction models in the same family, reaching conclusions in line with existing studies. In addition, we apply the curve to assess the performance of eight state-of-the-art models, providing a clear and comprehensive evaluation even for models that are challenging to differentiate with conventional metrics. Our work reveals a pattern underlying the data randomness and prediction accuracy. The performance characteristic curve provides a new way to evaluate models' performance systematically, and sheds light on future studies on other frameworks for model evaluation.
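To give a sense of how information entropy can quantify randomness in a sequence of diffusion events, here is a minimal sketch that computes the Shannon entropy of observed recipients. This is the textbook definition, not the specific metric proposed in the paper, and the function name `shannon_entropy` and the sample data are assumptions for illustration.

```python
import math
from collections import Counter

def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of `events`."""
    counts = Counter(events)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical recipient sequences: one fully predictable, one uniformly spread.
predictable = ["u1", "u1", "u1", "u1"]
uniform = ["u1", "u2", "u3", "u4"]

print(shannon_entropy(predictable))  # lowest possible value: no uncertainty
print(shannon_entropy(uniform))      # highest value for four distinct recipients
```

Higher entropy means the next recipient is harder to guess, so a fair comparison of models would relate accuracy to this kind of uncertainty rather than report accuracy alone.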
IR Oct 14, 2025
MIARec: Mutual-influence-aware Heterogeneous Network Embedding for Scientific Paper Recommendation
Wenjin Xie, Tao Jia
With the rapid expansion of scientific literature, scholars increasingly demand precise and high-quality paper recommendations. Among various recommendation methodologies, graph-based approaches have garnered attention by effectively exploiting the structural characteristics inherent in scholarly networks. However, these methods often overlook the asymmetric academic influence that is prevalent in scholarly networks when learning graph representations. To address this limitation, this study proposes the Mutual-Influence-Aware Recommendation (MIARec) model, which employs a gravity-based approach to measure the mutual academic influence between scholars and incorporates this influence into the feature aggregation process during message propagation in graph representation learning. Additionally, the model utilizes a multi-channel aggregation method to capture both individual embeddings of distinct single relational sub-networks and their interdependent embeddings, thereby enabling a more comprehensive understanding of the heterogeneous scholarly network. Extensive experiments conducted on real-world datasets demonstrate that the MIARec model outperforms baseline models across three primary evaluation metrics, indicating its effectiveness in scientific paper recommendation tasks.
AI Jul 27, 2025
Improving Subgraph Matching by Combining Algorithms and Graph Neural Networks
Shuyang Guo, Wenjin Xie, Ping Lu et al.
Homomorphism is a key mapping technique between graphs that preserves their structure. Given a graph and a pattern, the subgraph homomorphism problem involves finding a mapping from the pattern to the graph, ensuring that adjacent vertices in the pattern are mapped to adjacent vertices in the graph. Unlike subgraph isomorphism, which requires a one-to-one mapping, homomorphism allows multiple vertices in the pattern to map to the same vertex in the graph, making it more complex. We propose HFrame, the first graph neural network-based framework for subgraph homomorphism, which integrates traditional algorithms with machine learning techniques. We demonstrate that HFrame outperforms standard graph neural networks by being able to distinguish more graph pairs where the pattern is not homomorphic to the graph. Additionally, we provide a generalization error bound for HFrame. Through experiments on both real-world and synthetic graphs, we show that HFrame is up to 101.91 times faster than exact matching algorithms and achieves an average accuracy of 0.962.
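The difference between homomorphism and isomorphism described in this abstract can be seen in a small brute-force check. The sketch below is not HFrame; it is an exhaustive search over all vertex mappings, assuming undirected graphs without self-loops, and the function name `find_homomorphism` and the toy graphs are hypothetical.

```python
from itertools import product

def find_homomorphism(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Return a dict mapping pattern vertices to graph vertices such that every
    pattern edge lands on a graph edge, or None if no such mapping exists.

    Brute force over all |graph|^|pattern| mappings; graphs are treated as
    undirected (an assumption of this sketch).
    """
    adjacent = set()
    for u, v in graph_edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
    for image in product(graph_nodes, repeat=len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all((mapping[a], mapping[b]) in adjacent for a, b in pattern_edges):
            return mapping
    return None

# Target graph: a single edge x-y.
graph_nodes, graph_edges = ["x", "y"], [("x", "y")]

# A 3-vertex path maps onto the edge by sending both endpoints to the same vertex.
path = find_homomorphism(["a", "b", "c"], [("a", "b"), ("b", "c")],
                         graph_nodes, graph_edges)
print(path)

# A triangle cannot be mapped onto a single edge.
triangle = find_homomorphism(["a", "b", "c"],
                             [("a", "b"), ("b", "c"), ("a", "c")],
                             graph_nodes, graph_edges)
print(triangle)
```

The path example has no injective mapping into a two-vertex graph, so it is homomorphic to the edge without being isomorphic to any subgraph of it.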
LG May 18, 2021
Independent Asymmetric Embedding for Information Diffusion Prediction on Social Networks
Wenjin Xie, Xiaomeng Wang, Tao Jia
Predicting information diffusion on social networks has great practical significance in marketing and public opinion control. The task is to predict the individuals who will potentially repost a message on the social network. One type of method uses demographics, complex networks and other prior knowledge to build an interpretable model that simulates and predicts the propagation process, while the other type is completely data-driven and maps the nodes to a latent space for propagation prediction. Existing latent space designs and embedding methods do not consider the interference among users. In this paper, we propose an independent asymmetric embedding method that embeds each individual into one latent influence space and multiple latent susceptibility spaces. Based on the similarity between information diffusion and heat diffusion, our model uses the heat diffusion kernel to establish the embedding rules. Furthermore, our method captures the co-occurrence regularities of user combinations in cascades to improve computational efficiency. The results of extensive experiments conducted on real-world datasets verify both the predictive accuracy and the cost-effectiveness of our approach.