Mengting Pan

8.0CRMay 16

Universal Graph Backdoor Defense: A Feature-based Homophily Perspective

Mengting Pan, Fan Li, Chen Chen et al.

Graph neural networks (GNNs) have achieved remarkable success in relational learning. However, their vulnerability to graph backdoor attacks (GBAs) poses a significant barrier to broader adoption in high-stakes applications. Despite recent advances in graph backdoor defense (GBD), existing methods primarily focus on subgraph-based GBAs, relying on the assumption that poisoned target nodes are explicitly connected to subgraph triggers. Our empirical results reveal that such structure-centric approaches fail to defend against emerging feature-based GBAs that preserve graph topology. Therefore, in this paper, we study a novel problem of universal graph backdoor defense. First, we investigate the shared effects of both attack types from a feature-based homophily perspective, which characterizes local feature consistency between nodes and their neighborhoods. Thorough theoretical and empirical analyses demonstrate that, regardless of trigger mechanisms, backdoors induced by GBAs exhibit lower feature-based homophily than clean nodes, indicating a discrepancy in local feature similarity. Motivated by this insight, we propose to leverage node-level local feature consistency, modeled by a neighbor-aware reconstruction loss, to distinguish backdoors from clean nodes. Then, a robust training strategy is developed to eliminate trigger effects while reducing noise induced by detection uncertainty. Extensive experiments demonstrate that our framework significantly degrades the attack success rate and maintains competitive clean accuracy under both subgraph-based and feature-based attacks.

LGAug 5, 2025

HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation

Mengting Pan, Fan Li, Xiaoyang Wang et al.

Contrastive learning (CL) has become a dominant paradigm for self-supervised hypergraph learning, enabling effective training without costly labels. However, node entities in real-world hypergraphs are often associated with rich textual information, which is overlooked in prior works. Directly applying existing CL-based methods to such text-attributed hypergraphs (TAHGs) leads to three key limitations: (1) The common use of graph-agnostic text encoders overlooks the correlations between textual content and hypergraph topology, resulting in suboptimal representations. (2) Their reliance on random data augmentations introduces noise and weakens the contrastive objective. (3) The primary focus on node- and hyperedge-level contrastive signals limits the ability to capture long-range dependencies, which is essential for expressive representation learning. Although HyperBERT pioneers CL on TAHGs, its co-training paradigm suffers from poor scalability. To fill the research gap, we introduce HiTeC, a two-stage hierarchical contrastive learning framework with semantic-aware augmentation for scalable and effective self-supervised learning on TAHGs. In the first stage, we pre-train the text encoder with a structure-aware contrastive objective to overcome the graph-agnostic nature of conventional methods. In the second stage, we introduce two semantic-aware augmentation strategies, including prompt-enhanced text augmentation and semantic-aware hyperedge drop, to facilitate informative view generation. Furthermore, we propose a multi-scale contrastive loss that extends existing objectives with an $s$-walk-based subgraph-level contrast to better capture long-range dependencies. By decoupling text encoder pretraining from hypergraph contrastive learning, this two-stage design enhances scalability without compromising representation quality. Extensive experiments confirm the effectiveness of HiTeC.

Mengting Pan

2 Papers