Yixin Su

IR
h-index28
9papers
247citations
Novelty53%
AI Score45

9 Papers

IRJul 18, 2023
AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models

Rui Zhang, Yixin Su, Bayu Distiawan Trisedya et al.

The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to our best knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automatic alignment method named AutoAlign, which does not require any manually crafted seed alignments. Specifically, for predicate embeddings, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be done without manually crafted seed alignments. AutoAlign is not only fully automatic, but also highly effective. Experiments using real-world KGs show that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods.

IRJun 28, 2022
Detecting Arbitrary Order Beneficial Feature Interactions for Recommender Systems

Yixin Su, Yunxiang Zhao, Sarah Erfani et al.

Detecting beneficial feature interactions is essential in recommender systems, and existing approaches achieve this by examining all the possible feature interactions. However, the cost of examining all the possible higher-order feature interactions is prohibitive (exponentially growing with the order increasing). Hence existing approaches only detect limited order (e.g., combinations of up to four features) beneficial feature interactions, which may miss beneficial feature interactions with orders higher than the limitation. In this paper, we propose a hypergraph neural network based model named HIRS. HIRS is the first work that directly generates beneficial feature interactions of arbitrary orders and makes recommendation predictions accordingly. The number of generated feature interactions can be specified to be much smaller than the number of all the possible interactions and hence, our model admits a much lower running time. To achieve an effective algorithm, we exploit three properties of beneficial feature interactions, and propose deep-infomax-based methods to guide the interaction generation. Our experimental results show that HIRS outperforms state-of-the-art algorithms by up to 5% in terms of recommendation accuracy.

IRApr 7Code
QKVQA: Question-Focused Filtering for Knowledge-based VQA

Wei Ye, Yixin Su, Yueguo Chen et al.

Visual Question Answering (VQA) is the task of answering questions based on image content. Building upon this, Knowledge-Based VQA (KB-VQA) requires models to answer questions that depend on external knowledge beyond the visual content of an image. In such settings, effective knowledge filtering is essential for achieving high question answering accuracy. Typical filtering methods suffer from two issues: they fail to focus on parts relevant to the question during candidate section encoding, and they use similarity metrics to locate a section from a single article, resulting in information limitation. To address these issues, this paper proposes a question-focused, cross-article filtering method. Specifically, we design a trainable Question-Focused Filter (QFF) and a Chunk-based Dynamic Cross-Article Selection module (CDA). This approach maintains inference time comparable to the optimal method with the shorter context length, efficiently obtaining high-quality filtered knowledge. The accuracy outperforms current state-of-the-art methods by 3.2 and 2.2 percentage points on Encyclopedic-VQA and InfoSeek, respectively. The code is publicly available at: https://github.com/leaffeall/QKVQA.

LGOct 30, 2025
Accumulative SGD Influence Estimation for Data Attribution

Yunxiao Shi, Shuo Yang, Yixin Su et al.

Modern data-centric AI needs precise per-sample influence. Standard SGD-IE approximates leave-one-out effects by summing per-epoch surrogates and ignores cross-epoch compounding, which misranks critical examples. We propose ACC-SGD-IE, a trajectory-aware estimator that propagates the leave-one-out perturbation across training and updates an accumulative influence state at each step. In smooth strongly convex settings it achieves geometric error contraction and, in smooth non-convex regimes, it tightens error bounds; larger mini-batches further reduce constants. Empirically, on Adult, 20 Newsgroups, and MNIST under clean and corrupted data and both convex and non-convex training, ACC-SGD-IE yields more accurate influence estimates, especially over long epochs. For downstream data cleansing it more reliably flags noisy samples, producing models trained on ACC-SGD-IE cleaned data that outperform those cleaned with SGD-IE.

LGMar 24, 2024
A Survey on Self-Supervised Graph Foundation Models: Knowledge-Based Perspective

Ziwen Zhao, Yixin Su, Yuhua Li et al.

Graph self-supervised learning (SSL) is now a go-to method for pre-training graph foundation models (GFMs). There is a wide variety of knowledge patterns embedded in the graph data, such as node properties and clusters, which are crucial to learning generalized representations for GFMs. However, existing surveys of GFMs have several shortcomings: they lack comprehensiveness regarding the most recent progress, have unclear categorization of self-supervised methods, and take a limited architecture-based perspective that is restricted to only certain types of graph models. As the ultimate goal of GFMs is to learn generalized graph knowledge, we provide a comprehensive survey of self-supervised GFMs from a novel knowledge-based perspective. We propose a knowledge-based taxonomy, which categorizes self-supervised graph models by the specific graph knowledge utilized. Our taxonomy consists of microscopic (nodes, links, etc.), mesoscopic (context, clusters, etc.), and macroscopic knowledge (global structure, manifolds, etc.). It covers a total of 9 knowledge categories and more than 25 pretext tasks for pre-training GFMs, as well as various downstream task generalization strategies. Such a knowledge-based taxonomy allows us to re-examine graph models based on new architectures more clearly, such as graph language models, as well as provide more in-depth insights for constructing GFMs.

IRMar 5, 2025
Intrinsic and Extrinsic Factor Disentanglement for Recommendation in Various Context Scenarios

Yixin Su, Wei Jiang, Fangquan Lin et al.

In recommender systems, the patterns of user behaviors (e.g., purchase, click) may vary greatly in different contexts (e.g., time and location). This is because user behavior is jointly determined by two types of factors: intrinsic factors, which reflect consistent user preference, and extrinsic factors, which reflect external incentives that may vary in different contexts. Differentiating between intrinsic and extrinsic factors helps learn user behaviors better. However, existing studies have only considered differentiating them from a single, pre-defined context (e.g., time or location), ignoring the fact that a user's extrinsic factors may be influenced by the interplay of various contexts at the same time. In this paper, we propose the Intrinsic-Extrinsic Disentangled Recommendation (IEDR) model, a generic framework that differentiates intrinsic from extrinsic factors considering various contexts simultaneously, enabling more accurate differentiation of factors and hence the improvement of recommendation accuracy. IEDR contains a context-invariant contrastive learning component to capture intrinsic factors, and a disentanglement component to extract extrinsic factors under the interplay of various contexts. The two components work together to achieve effective factor learning. Extensive experiments on real-world datasets demonstrate IEDR's effectiveness in learning disentangled factors and significantly improving recommendation accuracy by up to 4% in NDCG.

IRMay 10, 2021
Neural Graph Matching based Collaborative Filtering

Yixin Su, Rui Zhang, Sarah Erfani et al.

User and item attributes are essential side-information; their interactions (i.e., their co-occurrence in the sample data) can significantly enhance prediction accuracy in various recommender systems. We identify two different types of attribute interactions, inner interactions and cross interactions: inner interactions are those between only user attributes or those between only item attributes; cross interactions are those between user attributes and item attributes. Existing models do not distinguish these two types of attribute interactions, which may not be the most effective way to exploit the information carried by the interactions. To address this drawback, we propose a neural Graph Matching based Collaborative Filtering model (GMCF), which effectively captures the two types of attribute interactions through modeling and aggregating attribute interactions in a graph matching structure for recommendation. In our model, the two essential recommendation procedures, characteristic learning and preference matching, are explicitly conducted through graph learning (based on inner interactions) and node matching (based on cross interactions), respectively. Experimental results show that our model outperforms state-of-the-art models. Further studies verify the effectiveness of GMCF in improving the accuracy of recommendation.

LGAug 2, 2020
Detecting Beneficial Feature Interactions for Recommender Systems

Yixin Su, Rui Zhang, Sarah Erfani et al.

Feature interactions are essential for achieving high accuracy in recommender systems. Many studies take into account the interaction between every pair of features. However, this is suboptimal because some feature interactions may not be that relevant to the recommendation result, and taking them into account may introduce noise and decrease recommendation accuracy. To make the best out of feature interactions, we propose a graph neural network approach to effectively model them, together with a novel technique to automatically detect those feature interactions that are beneficial in terms of recommendation accuracy. The automatic feature interaction detection is achieved via edge prediction with an L0 activation regularization. Our proposed model is proved to be effective through the information bottleneck principle and statistical interaction theory. Experimental results show that our model (i) outperforms existing baselines in terms of accuracy, and (ii) automatically identifies beneficial feature interactions.

IRAug 3, 2019
MMF: Attribute Interpretable Collaborative Filtering

Yixin Su, Sarah Monazam Erfani, Rui Zhang

Collaborative filtering is one of the most popular techniques in designing recommendation systems, and its most representative model, matrix factorization, has been wildly used by researchers and the industry. However, this model suffers from the lack of interpretability and the item cold-start problem, which limit its reliability and practicability. In this paper, we propose an interpretable recommendation model called Multi-Matrix Factorization (MMF), which addresses these two limitations and achieves the state-of-the-art prediction accuracy by exploiting common attributes that are present in different items. In the model, predicted item ratings are regarded as weighted aggregations of attribute ratings generated by the inner product of the user latent vectors and the attribute latent vectors. MMF provides more fine grained analyses than matrix factorization in the following ways: attribute ratings with weights allow the understanding of how much each attribute contributes to the recommendation and hence provide interpretability; the common attributes can act as a link between existing and new items, which solves the item cold-start problem when no rating exists on an item. We evaluate the interpretability of MMF comprehensively, and conduct extensive experiments on real datasets to show that MMF outperforms state-of-the-art baselines in terms of accuracy.