Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models
The work addresses the need for deeper insight into out-of-distribution (OOD) samples in high-stakes applications such as fraud detection and medical diagnosis, though it is incremental, as it builds on existing OOD detection methods.
The paper tackles open-set classification for graph data by extending out-of-distribution (OOD) detection to OOD classification without true labels, proposing a coarse-to-fine framework that combines large language models (LLMs) and graph neural networks (GNNs). It improves OOD detection by 10% over state-of-the-art methods on graph and text domains and achieves up to 70% OOD classification accuracy on graph datasets.
Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, even though real-world applications, especially high-stakes settings such as fraud detection and medical diagnosis, demand deeper insight into OOD samples, including their probable labels. This raises a critical question: can OOD detection be extended to OOD classification without true label information? To address this question, we propose a Coarse-to-Fine open-set Classification (CFC) framework that leverages large language models (LLMs) for graph datasets. CFC consists of three key components: a coarse classifier that uses LLM prompts for OOD detection and outlier label generation; a GNN-based fine classifier, trained with the OOD samples identified by the coarse classifier, for enhanced OOD detection and ID classification; and refined OOD classification achieved through LLM prompts and post-processed OOD labels. Unlike methods that rely on synthetic or auxiliary OOD samples, CFC employs semantic OOD instances, which are genuinely out-of-distribution by virtue of their inherent meaning, improving interpretability and practical utility. Experimental results show that CFC improves OOD detection by 10% over state-of-the-art methods on graph and text domains and achieves up to 70% accuracy in OOD classification on graph datasets.
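The data flow between the three CFC components can be illustrated with a toy sketch. This is not the authors' implementation; it only mirrors the coarse-to-fine control flow under loudly stated assumptions. The LLM is replaced by a hypothetical keyword stub (`keyword_llm`). The GNN fine classifier is swapped for parameter-free mean aggregation over each node's neighbourhood followed by a nearest-centroid classifier with one extra OOD class. "Post-processing" of OOD labels is assumed to be simple whitespace and case normalization. All function names (`coarse_stage`, `propagate`, `cfc_sketch`, etc.) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

OOD = "__ood__"  # sentinel for the extra "outlier" class used by the fine classifier
Vec = List[float]


def normalize(label: str) -> str:
    """Assumed post-processing rule: collapse whitespace and lowercase an LLM label."""
    return " ".join(label.split()).lower()


def coarse_stage(texts: Dict[int, str], id_labels: List[str],
                 llm: Callable[[str, List[str]], str]) -> Dict[int, str]:
    """Stage 1: one (hypothetical) LLM call per node, returning normalized labels.
    A label outside the ID label set marks the node as a coarse OOD candidate."""
    return {n: normalize(llm(t, id_labels)) for n, t in texts.items()}


def propagate(feats: Dict[int, Vec], adj: Dict[int, List[int]]) -> Dict[int, Vec]:
    """Mean over a node and its neighbours: a parameter-free stand-in for a GNN layer."""
    out = {}
    for n, x in feats.items():
        group = [x] + [feats[m] for m in adj.get(n, [])]
        out[n] = [sum(col) / len(group) for col in zip(*group)]
    return out


def fit_centroids(h: Dict[int, Vec], labels: Dict[int, str]) -> Dict[str, Vec]:
    """Nearest-centroid 'training': average embedding per class (ID classes + OOD)."""
    sums: Dict[str, Vec] = {}
    counts: Counter = Counter()
    for n, y in labels.items():
        acc = sums.setdefault(y, [0.0] * len(h[n]))
        for i, v in enumerate(h[n]):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(h: Dict[int, Vec], centroids: Dict[str, Vec]) -> Dict[int, str]:
    """Assign each node to the class whose centroid is closest (squared Euclidean)."""
    def sqdist(a: Vec, b: Vec) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return {n: min(centroids, key=lambda y: sqdist(x, centroids[y])) for n, x in h.items()}


def cfc_sketch(texts, feats, adj, id_labels, train_labels, llm) -> Dict[int, str]:
    id_set = {normalize(y) for y in id_labels}
    coarse = coarse_stage(texts, id_labels, llm)
    # Stage 2: fine classifier fit on known ID labels plus coarse OOD candidates.
    fit_labels = dict(train_labels)
    for n, y in coarse.items():
        if y not in id_set and n not in fit_labels:
            fit_labels[n] = OOD
    h = propagate(feats, adj)
    fine = predict(h, fit_centroids(h, fit_labels))
    # Stage 3: nodes the fine classifier calls OOD receive the post-processed LLM label.
    final = {}
    for n, y in fine.items():
        if y != OOD:
            final[n] = y
        else:
            final[n] = coarse[n] if coarse[n] not in id_set else "unknown"
    return final


def keyword_llm(text: str, id_labels: List[str]) -> str:
    """Hypothetical keyword stub standing in for a real LLM prompt."""
    for kw, label in [("quark", "Physics"), ("cell", "Biology"), ("poem", " Poetry ")]:
        if kw in text:
            return label
    return "unknown"


texts = {0: "quark decay", 1: "cell membrane", 2: "a short poem",
         3: "quark mass", 4: "cell division", 5: "poem about rain"}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0],
         3: [0.9, 0.1], 4: [0.1, 0.9], 5: [0.9, 0.9]}
adj = {0: [3], 3: [0], 1: [4], 4: [1], 2: [5], 5: [2]}
result = cfc_sketch(texts, feats, adj, ["physics", "biology"],
                    {0: "physics", 1: "biology"}, keyword_llm)
print(result)
# {0: 'physics', 1: 'biology', 2: 'poetry', 3: 'physics', 4: 'biology', 5: 'poetry'}
```

In this toy graph, nodes 2 and 5 are flagged as OOD by the coarse stage, used as the extra class when fitting the fine classifier, and finally labelled with the normalized LLM label "poetry" rather than a single undifferentiated "OOD". A real system would replace the stub with actual LLM prompts and the centroid step with a trained GNN.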