Sang-Wook Kim

IR
h-index8
19papers
545citations
Novelty47%
AI Score51

19 Papers

SIDec 8, 2022Code
A Survey of Graph Neural Networks for Social Recommender Systems

Kartik Sharma, Yeon-Chang Lee, Sivagami Nambi et al. · gatech, stanford

Social recommender systems (SocialRS) simultaneously leverage the user-to-item interactions as well as the user-to-user social relations for the task of generating item recommendations to users. Additionally exploiting social relations is clearly effective in understanding users' tastes due to the effects of homophily and social influence. For this reason, SocialRS has increasingly attracted attention. In particular, with the advance of graph neural networks (GNN), many GNN-based SocialRS methods have been developed recently. Therefore, we conduct a comprehensive and systematic review of the literature on GNN-based SocialRS. In this survey, we first identify 84 papers on GNN-based SocialRS after annotating 2151 papers by following the PRISMA framework (preferred reporting items for systematic reviews and meta-analyses). Then, we comprehensively review them in terms of their inputs and architectures to propose a novel taxonomy: (1) input taxonomy includes 5 groups of input type notations and 7 groups of input representation notations; (2) architecture taxonomy includes 8 groups of GNN encoder notations, 2 groups of decoder notations, and 12 groups of loss function notations. We classify the GNN-based SocialRS methods into several categories as per the taxonomy and describe their details. Furthermore, we summarize benchmark datasets and metrics widely used to evaluate the GNN-based SocialRS methods. Finally, we conclude this survey by presenting some future research directions. GitHub repository with the curated list of papers are available at https://github.com/claws-lab/awesome-GNN-social-recsys.

LGAug 23, 2024Code
Disentangling, Amplifying, and Debiasing: Learning Disentangled Representations for Fair Graph Neural Networks

Yeon-Chang Lee, Hojung Shin, Sang-Wook Kim · gatech

Graph Neural Networks (GNNs) have become essential tools for graph representation learning in various domains, such as social media and healthcare. However, they often suffer from fairness issues due to inherent biases in node attributes and graph structure, leading to unfair predictions. To address these challenges, we propose a novel GNN framework, DAB-GNN, that Disentangles, Amplifies, and deBiases attribute, structure, and potential biases in the GNN mechanism. DAB-GNN employs a disentanglement and amplification module that isolates and amplifies each type of bias through specialized disentanglers, followed by a debiasing module that minimizes the distance between subgroup distributions. Extensive experiments on five datasets demonstrate that DAB-GNN significantly outperforms ten state-of-the-art competitors in terms of achieving an optimal balance between accuracy and fairness. The codebase of DAB-GNN is available at https://github.com/Bigdasgit/DAB-GNN

LGAug 28, 2024Code
CAPER: Enhancing Career Trajectory Prediction using Temporal Knowledge Graph and Ternary Relationship

Yeon-Chang Lee, JaeHyun Lee, Michiharu Yamashita et al. · gatech

The problem of career trajectory prediction (CTP) aims to predict one's future employer or job position. While several CTP methods have been developed for this problem, we posit that none of these methods (1) jointly considers the mutual ternary dependency between three key units (i.e., user, position, and company) of a career and (2) captures the characteristic shifts of key units in career over time, leading to an inaccurate understanding of the job movement patterns in the labor market. To address the above challenges, we propose a novel solution, named as CAPER, that solves the challenges via sophisticated temporal knowledge graph (TKG) modeling. It enables the utilization of a graph-structured knowledge base with rich expressiveness, effectively preserving the changes in job movement patterns. Furthermore, we devise an extrapolated career reasoning task on TKG for a realistic evaluation. The experiments on a real-world career trajectory dataset demonstrate that CAPER consistently and significantly outperforms four baselines, two recent TKG reasoning methods, and five state-of-the-art CTP methods in predicting one's future companies and positions--i.e., on average, yielding 6.80% and 34.58% more accurate predictions, respectively. The codebase of CAPER is available at https://github.com/Bigdasgit/CAPER.

CRSep 3, 2022
Phishing URL Detection: A Network-based Approach Robust to Evasion

Taeri Kim, Noseong Park, Jiwon Hong et al.

Many cyberattacks start with disseminating phishing URLs. When clicking these phishing URLs, the victim's private information is leaked to the attacker. There have been proposed several machine learning methods to detect phishing URLs. However, it still remains under-explored to detect phishing URLs with evasion, i.e., phishing URLs that pretend to be benign by manipulating patterns. In many cases, the attacker i) reuses prepared phishing web pages because making a completely brand-new set costs non-trivial expenses, ii) prefers hosting companies that do not require private information and are cheaper than others, iii) prefers shared hosting for cost efficiency, and iv) sometimes uses benign domains, IP addresses, and URL string patterns to evade existing detection methods. Inspired by those behavioral characteristics, we present a network-based inference method to accurately detect phishing URLs camouflaged with legitimate patterns, i.e., robust to evasion. In the network approach, a phishing URL will be still identified as phishy even after evasion unless a majority of its neighbors in the network are evaded at the same time. Our method consistently shows better detection performance throughout various experimental tests than state-of-the-art methods, e.g., F-1 of 0.89 for our method vs. 0.84 for the best feature-based method.

LGSep 11, 2023Code
Enhancing Hyperedge Prediction with Context-Aware Self-Supervised Learning

Yunyong Ko, Hanghang Tong, Sang-Wook Kim

Hypergraphs can naturally model group-wise relations (e.g., a group of users who co-purchase an item) as hyperedges. Hyperedge prediction is to predict future or unobserved hyperedges, which is a fundamental task in many real-world applications (e.g., group recommendation). Despite the recent breakthrough of hyperedge prediction methods, the following challenges have been rarely studied: (C1) How to aggregate the nodes in each hyperedge candidate for accurate hyperedge prediction? and (C2) How to mitigate the inherent data sparsity problem in hyperedge prediction? To tackle both challenges together, in this paper, we propose a novel hyperedge prediction framework (CASH) that employs (1) context-aware node aggregation to precisely capture complex relations among nodes in each hyperedge for (C1) and (2) self-supervised contrastive learning in the context of hyperedge prediction to enhance hypergraph representations for (C2). Furthermore, as for (C2), we propose a hyperedge-aware augmentation method to fully exploit the latent semantics behind the original hypergraph and consider both node-level and group-level contrasts (i.e., dual contrasts) for better node and hyperedge representations. Extensive experiments on six real-world hypergraphs reveal that CASH consistently outperforms all competing methods in terms of the accuracy in hyperedge prediction and each of the proposed strategies is effective in improving the model accuracy of CASH. For the detailed information of CASH, we provide the code and datasets at: https://github.com/yy-ko/cash.

CLFeb 23, 2023
KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction

Yunyong Ko, Seongeun Ryu, Soeun Han et al.

The political stance prediction for news articles has been widely studied to mitigate the echo chamber effect -- people fall into their thoughts and reinforce their pre-existing beliefs. The previous works for the political stance problem focus on (1) identifying political factors that could reflect the political stance of a news article and (2) capturing those factors effectively. Despite their empirical successes, they are not sufficiently justified in terms of how effective their identified factors are in the political stance prediction. Motivated by this, in this work, we conduct a user study to investigate important factors in political stance prediction, and observe that the context and tone of a news article (implicit) and external knowledge for real-world entities appearing in the article (explicit) are important in determining its political stance. Based on this observation, we propose a novel knowledge-aware approach to political stance prediction (KHAN), employing (1) hierarchical attention networks (HAN) to learn the relationships among words and sentences in three different levels and (2) knowledge encoding (KE) to incorporate external knowledge for real-world entities into the process of political stance prediction. Also, to take into account the subtle and important difference between opposite political stances, we build two independent political knowledge graphs (KG) (i.e., KG-lib and KG-con) by ourselves and learn to fuse the different political knowledge. Through extensive evaluations on three real-world datasets, we demonstrate the superiority of DASH in terms of (1) accuracy, (2) efficiency, and (3) effectiveness.

IRDec 15, 2023Code
MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation

Yungi Kim, Taeri Kim, Won-Yong Shin et al.

In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs) where the multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out following two limitations of existing GCN-based multimedia recommender systems: (L1) although multimodal features of interacted items by a user can reveal her preferences on items, existing methods utilize GCN designed to focus only on capturing collaborative signals, resulting in insufficient reflection of the multimodal features in the final user/item embeddings; (L2) although a user decides whether to prefer the target item by considering its multimodal features, existing methods represent her as only a single embedding regardless of the target item's multimodal features and then utilize her embedding to predict her preference for the target item. To address the above issues, we propose a novel multimedia recommender system, named MONET, composed of following two core ideas: modality-embracing GCN (MeGCN) and target-aware attention. Through extensive experiments using four real-world datasets, we demonstrate i) the significant superiority of MONET over seven state-of-the-art competitors (up to 30.32% higher accuracy in terms of recall@20, compared to the best competitor) and ii) the effectiveness of the two core ideas in MONET. All MONET codes are available at https://github.com/Kimyungi/MONET.

IRDec 19, 2023Code
VITA: 'Carefully Chosen and Weighted Less' Is Better in Medication Recommendation

Taeri Kim, Jiho Heo, Hongil Kim et al.

We address the medication recommendation problem, which aims to recommend effective medications for a patient's current visit by utilizing information (e.g., diagnoses and procedures) given at the patient's current and past visits. While there exist a number of recommender systems designed for this problem, we point out that they are challenged in accurately capturing the relation (spec., the degree of relevance) between the current and each of the past visits for the patient when obtaining her current health status, which is the basis for recommending medications. To address this limitation, we propose a novel medication recommendation framework, named VITA, based on the following two novel ideas: (1) relevant-Visit selectIon; (2) Target-aware Attention. Through extensive experiments using real-world datasets, we demonstrate the superiority of VITA (spec., up to 5.56% higher accuracy, in terms of Jaccard, than the best competitor) and the effectiveness of its two core ideas. The code is available at https://github.com/jhheo0123/VITA.

IROct 13, 2023
CROWN: A Novel Approach to Comprehending Users' Preferences for Accurate Personalized News Recommendation

Yunyong Ko, Seongeun Ryu, Sang-Wook Kim

Personalized news recommendation aims to assist users in finding news articles that align with their interests, which plays a pivotal role in mitigating users' information overload problem. Although many recent works have been studied for better personalized news recommendation, the following challenges should be explored more: (C1) Comprehending manifold intents coupled within a news article, (C2) Differentiating varying post-read preferences of news articles, and (C3) Addressing the cold-start user problem. To tackle the aforementioned challenges together, in this paper, we propose a novel personalized news recommendation framework (CROWN) that employs (1) category-guided intent disentanglement for (C1), (2) consistency-based news representation for (C2), and (3) GNN-enhanced hybrid user representation for (C3). Furthermore, we incorporate a category prediction into the training process of CROWN as an auxiliary task, which provides supplementary supervisory signals to enhance intent disentanglement. Extensive experiments on two real-world datasets reveal that (1) CROWN provides consistent performance improvements over ten state-of-the-art news recommendation methods and (2) the proposed strategies significantly improve the accuracy of CROWN.

SIFeb 25, 2024Code
Towards Fair Graph Anomaly Detection: Problem, Benchmark Datasets, and Evaluation

Neng Kai Nigel Neo, Yeon-Chang Lee, Yiqiao Jin et al.

The Fair Graph Anomaly Detection (FairGAD) problem aims to accurately detect anomalous nodes in an input graph while avoiding biased predictions against individuals from sensitive subgroups. However, the current literature does not comprehensively discuss this problem, nor does it provide realistic datasets that encompass actual graph structures, anomaly labels, and sensitive attributes. To bridge this gap, we introduce a formal definition of the FairGAD problem and present two novel datasets constructed from the social media platforms Reddit and Twitter. These datasets comprise 1.2 million and 400,000 edges associated with 9,000 and 47,000 nodes, respectively, and leverage political leanings as sensitive attributes and misinformation spreaders as anomaly labels. We demonstrate that our FairGAD datasets significantly differ from the synthetic datasets used by the research community. Using our datasets, we investigate the performance-fairness trade-off in nine existing GAD and non-graph AD methods on five state-of-the-art fairness methods. Our code and datasets are available at https://github.com/nigelnnk/FairGAD

LGJan 4Code
Accelerating Storage-Based Training for Graph Neural Networks

Myung-Hwan Jang, Jeong-Min Park, Yunyong Ko et al.

Graph neural networks (GNNs) have achieved breakthroughs in various real-world downstream tasks due to their powerful expressiveness. As the scale of real-world graphs has been continuously growing, \textit{a storage-based approach to GNN training} has been studied, which leverages external storage (e.g., NVMe SSDs) to handle such web-scale graphs on a single machine. Although such storage-based GNN training methods have shown promising potential in large-scale GNN training, we observed that they suffer from a severe bottleneck in data preparation since they overlook a critical challenge: \textit{how to handle a large number of small storage I/Os}. To address the challenge, in this paper, we propose a novel storage-based GNN training framework, named \textsf{AGNES}, that employs a method of \textit{block-wise storage I/O processing} to fully utilize the I/O bandwidth of high-performance storage devices. Moreover, to further enhance the efficiency of each storage I/O, \textsf{AGNES} employs a simple yet effective strategy, \textit{hyperbatch-based processing} based on the characteristics of real-world graphs. Comprehensive experiments on five real-world graphs reveal that \textsf{AGNES} consistently outperforms four state-of-the-art methods, by up to 4.1$\times$ faster than the best competitor. Our code is available at https://github.com/Bigdasgit/agnes-kdd26.

SISep 2, 2023Code
Trustworthiness-Driven Graph Convolutional Networks for Signed Network Embedding

Min-Jeong Kim, Yeon-Chang Lee, David Y. Kang et al.

The problem of representing nodes in a signed network as low-dimensional vectors, known as signed network embedding (SNE), has garnered considerable attention in recent years. While several SNE methods based on graph convolutional networks (GCN) have been proposed for this problem, we point out that they significantly rely on the assumption that the decades-old balance theory always holds in the real-world. To address this limitation, we propose a novel GCN-based SNE approach, named as TrustSGCN, which corrects for incorrect embedding propagation in GCN by utilizing the trustworthiness on edge signs for high-order relationships inferred by the balance theory. The proposed approach consists of three modules: (M1) generation of each node's extended ego-network; (M2) measurement of trustworthiness on edge signs; and (M3) trustworthiness-aware propagation of embeddings. Furthermore, TrustSGCN learns the node embeddings by leveraging two well-known societal theories, i.e., balance and status. The experiments on four real-world signed network datasets demonstrate that TrustSGCN consistently outperforms five state-of-the-art GCN-based SNE methods. The code is available at https://github.com/kmj0792/TrustSGCN.

IRNov 14, 2021Code
Linear, or Non-Linear, That is the Question!

Taeyong Kong, Taeri Kim, Jinsung Jeon et al.

There were fierce debates on whether the non-linear embedding propagation of GCNs is appropriate to GCN-based recommender systems. It was recently found that the linear embedding propagation shows better accuracy than the non-linear embedding propagation. Since this phenomenon was discovered especially in recommender systems, it is required that we carefully analyze the linearity and non-linearity issue. In this work, therefore, we revisit the issues of i) which of the linear or non-linear propagation is better and ii) which factors of users/items decide the linearity/non-linearity of the embedding propagation. We propose a novel Hybrid Method of Linear and non-linEar collaborative filTering method (HMLET, pronounced as Hamlet). In our design, there exist both linear and non-linear propagation steps, when processing each user or item node, and our gating module chooses one of them, which results in a hybrid model of the linear and non-linear GCN-based collaborative filtering (CF). The proposed model yields the best accuracy in three public benchmark datasets. Moreover, we classify users/items into the following three classes depending on our gating modules' selections: Full-Non-Linearity (FNL), Partial-Non-Linearity (PNL), and Full-Linearity (FL). We found that there exist strong correlations between nodes' centrality and their class membership, i.e., important user/item nodes exhibit more preferences towards the non-linearity during the propagation steps. To our knowledge, we are the first who design a hybrid method and report the correlation between the graph centrality and the linearity/non-linearity of nodes. All HMLET codes and datasets are available at: https://github.com/qbxlvnf11/HMLET.

HCFeb 28, 2024
HearHere: Mitigating Echo Chambers in News Consumption through an AI-based Web System

Youngseung Jeon, Jaehoon Kim, Sohyun Park et al.

Considerable efforts are currently underway to mitigate the negative impacts of echo chambers, such as increased susceptibility to fake news and resistance towards accepting scientific evidence. Prior research has presented the development of computer systems that support the consumption of news information from diverse political perspectives to mitigate the echo chamber effect. However, existing studies still lack the ability to effectively support the key processes of news information consumption and quantitatively identify a political stance towards the information. In this paper, we present HearHere, an AI-based web system designed to help users accommodate information and opinions from diverse perspectives. HearHere facilitates the key processes of news information consumption through two visualizations. Visualization 1 provides political news with quantitative political stance information, derived from our graph-based political classification model, and users can experience diverse perspectives (Hear). Visualization 2 allows users to express their opinions on specific political issues in a comment form and observe the position of their own opinions relative to pro-liberal and pro-conservative comments presented on a map interface (Here). Through a user study with 94 participants, we demonstrate the feasibility of HearHere in supporting the consumption of information from various perspectives. Our findings highlight the importance of providing political stance information and quantifying users' political status as a means to mitigate political polarization. In addition, we propose design implications for system development, including the consideration of demographics such as political interest and providing users with initiatives.

SIFeb 9, 2025
HyGEN: Regularizing Negative Hyperedge Generation for Accurate Hyperedge Prediction

Song Kyung Yu, Da Eun Lee, Yunyong Ko et al.

Hyperedge prediction is a fundamental task to predict future high-order relations based on the observed network structure. Existing hyperedge prediction methods, however, suffer from the data sparsity problem. To alleviate this problem, negative sampling methods can be used, which leverage non-existing hyperedges as contrastive information for model training. However, the following important challenges have been rarely studied: (C1) lack of guidance for generating negatives and (C2) possibility of producing false negatives. To address them, we propose a novel hyperedge prediction method, HyGEN, that employs (1) a negative hyperedge generator that employs positive hyperedges as a guidance to generate more realistic ones and (2) a regularization term that prevents the generated hyperedges from being false negatives. Extensive experiments on six real-world hypergraphs reveal that HyGEN consistently outperforms four state-of-the-art hyperedge prediction methods.

SIAug 24, 2025
Learning Short-Term and Long-Term Patterns of High-Order Dynamics in Real-World Networks

Yunyong Ko, Da Eun Lee, Song Kyung Yu et al.

Real-world networks have high-order relationships among objects and they evolve over time. To capture such dynamics, many works have been studied in a range of fields. Via an in-depth preliminary analysis, we observe two important characteristics of high-order dynamics in real-world networks: high-order relations tend to (O1) have a structural and temporal influence on other relations in a short term and (O2) periodically re-appear in a long term. In this paper, we propose LINCOLN, a method for Learning hIgh-order dyNamiCs Of reaL-world Networks, that employs (1) bi-interactional hyperedge encoding for short-term patterns, (2) periodic time injection and (3) intermediate node representation for long-term patterns. Via extensive experiments, we show that LINCOLN outperforms nine state-of-the-art methods in the dynamic hyperedge prediction task.

IRAug 18, 2025
Is This News Still Interesting to You?: Lifetime-aware Interest Matching for News Recommendation

Seongeun Ryu, Yunyong Ko, Sang-Wook Kim

Personalized news recommendation aims to deliver news articles aligned with users' interests, serving as a key solution to alleviate the problem of information overload on online news platforms. While prior work has improved interest matching through refined representations of news and users, the following time-related challenges remain underexplored: (C1) leveraging the age of clicked news to infer users' interest persistence, and (C2) modeling the varying lifetime of news across topics and users. To jointly address these challenges, we propose a novel Lifetime-aware Interest Matching framework for nEws recommendation, named LIME, which incorporates three key strategies: (1) User-Topic lifetime-aware age representation to capture the relative age of news with respect to a user-topic pair, (2) Candidate-aware lifetime attention for generating temporally aligned user representation, and (3) Freshness-guided interest refinement for prioritizing valid candidate news at prediction time. Extensive experiments on two real-world datasets demonstrate that LIME consistently outperforms a wide range of state-of-the-art news recommendation methods, and its model agnostic strategies significantly improve recommendation accuracy.

LGJun 26, 2019
Creating A Neural Pedagogical Agent by Jointly Learning to Review and Assess

Youngnam Lee, Youngduck Choi, Junghyun Cho et al.

Machine learning plays an increasing role in intelligent tutoring systems as both the amount of data available and specialization among students grow. Nowadays, these systems are frequently deployed on mobile applications. Users on such mobile education platforms are dynamic, frequently being added, accessing the application with varying levels of focus, and changing while using the service. The education material itself, on the other hand, is often static and is an exhaustible resource whose use in tasks such as problem recommendation must be optimized. The ability to update user models with respect to educational material in real-time is thus essential; however, existing approaches require time-consuming re-training of user features whenever new data is added. In this paper, we introduce a neural pedagogical agent for real-time user modeling in the task of predicting user response correctness, a central task for mobile education applications. Our model, inspired by work in natural language processing on sequence modeling and machine translation, updates user features in real-time via bidirectional recurrent neural networks with an attention mechanism over embedded question-response pairs. We experiment on the mobile education application SantaTOEIC, which has 559k users, 66M response data points as well as a set of 10k study problems each expert-annotated with topic tags and gathered since 2016. Our model outperforms existing approaches over several metrics in predicting user response correctness, notably out-performing other methods on new users without large question-response histories. Additionally, our attention mechanism and annotated tag set allow us to create an interpretable education platform, with a smart review system that addresses the aforementioned issue of varied user attention and problem exhaustion.

IRMay 1, 2019
Clustering-Based Collaborative Filtering Using an Incentivized/Penalized User Model

Cong Tran, Jang-Young Kim, Won-Yong Shin et al.

Giving or recommending appropriate content based on the quality of experience is the most important and challenging issue in recommender systems. As collaborative filtering (CF) is one of the most prominent and popular techniques used for recommender systems, we propose a new clustering-based CF (CBCF) method using an incentivized/penalized user (IPU) model only with ratings given by users, which is thus easy to implement. We aim to design such a simple clustering-based approach with no further prior information while improving the recommendation accuracy. To be precise, the purpose of CBCF with the IPU model is to improve recommendation performance such as precision, recall, and $F_1$ score by carefully exploiting different preferences among users. Specifically, we formulate a constrained optimization problem, in which we aim to maximize the recall (or equivalently $F_1$ score) for a given precision. To this end, users are divided into several clusters based on the actual rating data and Pearson correlation coefficient. Afterwards, we give each item an incentive/penalty according to the preference tendency by users within the same cluster. Our experimental results show a significant performance improvement over the baseline CF scheme without clustering in terms of recall or $F_1$ score for a given precision.