AI · Nov 11, 2025

Combining LLM Semantic Reasoning with GNN Structural Modeling for Multi-View Multi-Label Feature Selection

Zhiqi Chen, Yuzhou Liu, Jiarui Liu, Wanfu Gao

arXiv:2511.08008v2 · 5.81 citations

Originality: Incremental advance

AI Analysis

The work addresses feature selection for high-dimensional, multimodal data in domains such as social media and bioinformatics. It is an incremental improvement that integrates semantic analysis into existing feature selection frameworks.

The paper tackles multi-view multi-label feature selection by combining LLM semantic reasoning with GNN structural modeling, so that semantic and statistical information are used jointly. The authors report that the method outperforms state-of-the-art baselines on multiple benchmark datasets and remains effective on small-scale datasets.

Multi-view multi-label feature selection aims to identify informative features from heterogeneous views, where each sample is associated with multiple interdependent labels. This problem is particularly important in machine learning involving high-dimensional, multimodal data, such as social media, bioinformatics, or recommendation systems. Existing Multi-View Multi-Label Feature Selection (MVMLFS) methods mainly analyze the statistical information in the data and seldom consider semantic information. In this paper, we aim to use these two types of information jointly and propose a method that combines Large Language Model (LLM) semantic reasoning with Graph Neural Network (GNN) structural modeling for MVMLFS. Specifically, the method consists of three main components. (1) An LLM is first used as an evaluation agent to assess the latent semantic relevance among feature, view, and label descriptions. (2) A semantic-aware heterogeneous graph with two levels is designed to represent relations among features, views, and labels: one level is a semantic graph representing semantic relations, and the other is a statistical graph. (3) A lightweight Graph Attention Network (GAT) is applied to learn node embeddings in the heterogeneous graph, which serve as feature saliency scores for ranking and selection. Experimental results on multiple benchmark datasets demonstrate the superiority of our method over state-of-the-art baselines, and the method remains effective when applied to small-scale datasets, showcasing its robustness, flexibility, and generalization ability.
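
To make the three components concrete, here is a minimal sketch of how semantic and statistical relevance could be merged into feature saliency scores. It is not the authors' implementation: the semantic matrix is assumed to be supplied by an LLM (hard-coded in any usage), absolute Pearson correlation is an assumed choice for the statistical level, views are omitted, and an untrained softmax attention over label neighbours stands in for the trained GAT. The names `statistical_relevance`, `attention_saliency`, `select_features`, and the mixing weight `alpha` are hypothetical.

```python
import numpy as np


def statistical_relevance(X, Y):
    """Absolute Pearson correlation between every feature and every label.

    X: (n_samples, n_features), Y: (n_samples, n_labels).
    Returns an (n_features, n_labels) matrix (assumed statistical graph weights).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    denom = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Yc, axis=0))
    return np.abs(Xc.T @ Yc) / np.maximum(denom, 1e-12)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_saliency(semantic, statistical, alpha=0.5):
    """Attention-weighted aggregation of feature-label edge weights.

    semantic:    (n_features, n_labels) scores assumed to come from an LLM.
    statistical: (n_features, n_labels) scores computed from the data.
    alpha:       hypothetical weight mixing the two graph levels.
    This is an untrained stand-in for a GAT layer, not a learned model.
    """
    edge = alpha * semantic + (1.0 - alpha) * statistical
    attn = softmax(edge, axis=1)        # each feature attends over its labels
    return (attn * edge).sum(axis=1)    # one saliency score per feature


def select_features(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(-scores)[:k]
```

In a real pipeline the attention weights would be learned on the heterogeneous graph rather than derived directly from the edge weights; the sketch only shows how the two levels of information can feed a single ranking.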
