LG ML Jun 15, 2018

Supervised Fuzzy Partitioning

Pooya Ashtari, Fateme Nateghi Haredasht, Hamid Beigy

arXiv:1806.06124v5 1.58 citations

Originality: Incremental advance

AI Analysis

SFP provides a flexible, nonlinear model for classification and regression that can use any convex loss function, addressing a gap in applying clustering techniques to supervised learning.

The paper extends centroid-based clustering methods to supervised tasks by introducing Supervised Fuzzy Partitioning (SFP), which incorporates label information and entropy regularization. On various datasets, its predictive performance is competitive with state-of-the-art algorithms such as SVM and random forest.

Centroid-based methods, including k-means and fuzzy c-means, are effective and easy-to-implement approaches to clustering in many applications. However, these algorithms cannot be directly applied to supervised tasks. This paper therefore presents a generative model that extends the centroid-based clustering approach to classification and regression tasks. Given an arbitrary loss function, the proposed approach, termed Supervised Fuzzy Partitioning (SFP), incorporates label information into its objective function through a surrogate term penalizing the empirical risk. Entropy-based regularization is also employed to fuzzify the partition and to weight features, enabling the method to capture more complex patterns, identify significant features, and perform better on high-dimensional data. An iterative algorithm based on a block coordinate descent scheme is formulated to efficiently find a local optimum. Extensive classification experiments on synthetic, real-world, and high-dimensional datasets demonstrate that the predictive performance of SFP is competitive with state-of-the-art algorithms such as SVM and random forest. SFP has a major advantage over such methods: it not only leads to a flexible, nonlinear model but can also exploit any convex loss function in the training phase without compromising computational efficiency.
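
The abstract does not give the objective function, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a squared-error loss, one scalar prediction per cluster, and an objective of the form sum_ik u_ik (||x_i - v_k||^2 + alpha * loss_ik) + gamma * sum_ik u_ik log u_ik, and it omits the feature weighting mentioned in the abstract. The function name `sfp_sketch` and the parameters `alpha` and `gamma` are hypothetical. Under these assumptions, each block (memberships, centroids, cluster predictions) has a closed-form minimizer, which is what makes block coordinate descent convenient here.

```python
import numpy as np

def sfp_sketch(X, y, n_clusters=3, alpha=1.0, gamma=1.0, n_iter=50, seed=0):
    """Illustrative supervised fuzzy partitioning (assumed objective, squared loss)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize centroids from random data points, predictions from the label mean.
    V = X[rng.choice(n, size=n_clusters, replace=False)].astype(float)
    theta = np.full(n_clusters, y.mean(), dtype=float)
    history = []

    def cost_matrix(V, theta):
        dist = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        loss = (y[:, None] - theta[None, :]) ** 2                  # (n, k)
        return dist + alpha * loss

    for _ in range(n_iter):
        # Block 1: memberships. With the entropy term, the minimizer is a softmax.
        logits = -cost_matrix(V, theta) / gamma
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)

        # Block 2: centroids and per-cluster predictions (weighted means).
        mass = np.maximum(U.sum(axis=0), 1e-12)
        V = (U.T @ X) / mass[:, None]
        theta = (U.T @ y) / mass

        # Track the assumed objective; it should not increase between iterations.
        entropy = np.sum(U * np.log(np.clip(U, 1e-300, None)))
        history.append(np.sum(U * cost_matrix(V, theta)) + gamma * entropy)

    return U, V, theta, history
```

In this sketch, a larger `gamma` gives softer (fuzzier) memberships, while a larger `alpha` makes the partition follow the labels more closely than the geometry of the inputs.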
