LG ML · Nov 21, 2025

Semi-Supervised Federated Multi-Label Feature Selection with Fuzzy Information Measures

arXiv:2511.17796v1

Originality: Incremental advance

AI Analysis

This work addresses feature selection for distributed multi-label learning when client data is unlabeled. The advance is incremental: it adapts existing techniques to a federated setting.

The paper tackles multi-label feature selection in federated environments where clients hold only unlabeled data. It proposes SSFMLFS, which uses fuzzy information theory and PageRank to rank features, and reports that it outperforms other methods on five real-world datasets in non-IID settings.

Multi-label feature selection (FS) reduces the dimensionality of multi-label data by removing irrelevant, noisy, and redundant features, thereby boosting the performance of multi-label learning models. However, existing methods typically require centralized data, which makes them unsuitable for distributed and federated environments where each device/client holds its own local dataset. Additionally, federated methods often assume that clients have labeled data, which is unrealistic when clients lack the expertise or resources to label task-specific data. To address these challenges, we propose a Semi-Supervised Federated Multi-Label Feature Selection method, called SSFMLFS, in which clients hold only unlabeled data while the server has limited labeled data. SSFMLFS adapts fuzzy information theory to a federated setting: clients compute fuzzy similarity matrices and transmit them to the server, which then calculates feature redundancy and feature-label relevancy degrees. A feature graph is constructed by modeling features as vertices, with relevancy degrees assigned as vertex weights and redundancy degrees as edge weights. PageRank is then applied to rank the features by importance. Extensive experiments on five real-world datasets from various domains, including biology, images, music, and text, demonstrate that SSFMLFS outperforms other federated and centralized supervised and semi-supervised approaches on three evaluation metrics in a non-IID data distribution setting.
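The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a common fuzzy similarity relation, S[i, j] = 1 - |x_i - x_j| on min-max scaled values, for the client step. For the server step it assumes a personalized PageRank in which relevancy acts as the teleport distribution and edges are weighted by 1 - redundancy. The function names `fuzzy_similarity` and `pagerank_features` are hypothetical, and the computation of relevancy and redundancy from the fuzzy matrices is omitted because the abstract does not specify it.

```python
import numpy as np


def fuzzy_similarity(x):
    """Client step (assumed form): fuzzy similarity between samples for one feature.

    Uses S[i, j] = 1 - |z_i - z_j| on min-max scaled values z. This is one
    common fuzzy relation; the paper's exact definition may differ.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    z = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return 1.0 - np.abs(z[:, None] - z[None, :])


def pagerank_features(relevance, redundancy, damping=0.85, tol=1e-10, max_iter=1000):
    """Server step (assumed form): personalized PageRank over the feature graph.

    relevance:  (d,) vertex weights, normalized into the teleport distribution.
    redundancy: (d, d) symmetric edge weights in [0, 1].
    Assumption: an edge carries weight 1 - redundancy, so rank flows toward
    features that are not redundant with their neighbours.
    """
    relevance = np.asarray(relevance, dtype=float)
    weights = 1.0 - np.asarray(redundancy, dtype=float)
    np.fill_diagonal(weights, 0.0)  # no self-loops

    teleport = relevance / relevance.sum()
    col_sums = weights.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; columns with no outgoing weight
    # fall back to the teleport distribution.
    transition = np.where(col_sums > 0, weights / safe_sums, teleport[:, None])

    rank = teleport.copy()
    for _ in range(max_iter):
        new_rank = damping * transition @ rank + (1.0 - damping) * teleport
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank


# Toy inputs (hypothetical numbers, not taken from the paper).
toy_relevance = [0.9, 0.5, 0.1]
toy_redundancy = np.full((3, 3), 0.5)
scores = pagerank_features(toy_relevance, toy_redundancy)
ordering = np.argsort(-scores)  # feature indices, most important first
```

With uniform redundancy, as in the toy inputs, the resulting order follows the relevancy weights alone. Unequal redundancy values shift rank between features according to the assumed edge weighting.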
