AIDec 6, 2022
Fuzzy Rough Sets Based on Fuzzy QuantificationAdnan Theerens, Chris Cornelis
One of the weaknesses of classical (fuzzy) rough sets is their sensitivity to noise, which is particularly undesirable for machine learning applications. One approach to solve this issue is by making use of fuzzy quantifiers, as done by the vaguely quantified fuzzy rough set (VQFRS) model. While this idea is intuitive, the VQFRS model suffers from both theoretical flaws as well as from suboptimal performance in applications. In this paper, we improve on VQFRS by introducing fuzzy quantifier-based fuzzy rough sets (FQFRS), an intuitive generalization of fuzzy rough sets that makes use of general unary and binary quantification models. We show how several existing models fit in this generalization as well as how it inspires novel ones. Several binary quantification models are proposed to be used with FQFRS. We conduct a theoretical study of their properties, and investigate their potential by applying them to classification problems. In particular, we highlight Yager's Weighted Implication-based (YWI) binary quantification model, which induces a fuzzy rough set model that is both a significant improvement on VQFRS, as well as a worthy competitor to the popular ordered weighted averaging based fuzzy rough set (OWAFRS) model.
LGJun 28, 2022
No imputation without representationOliver Urs Lenz, Daniel Peralta, Chris Cornelis
By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost through imputation. The missing-indicator approach can be used in combination with imputation to instead represent this information as a part of the dataset. There are several theoretical considerations why missing-indicators may or may not be beneficial, but there has not been any large-scale practical experiment on real-life datasets to test this question for machine learning predictions. We perform this experiment for three imputation strategies and a range of different classification algorithms, on the basis of twenty real-life datasets. In a follow-up experiment, we determine attribute-specific missingness thresholds for each classifier above which missing-indicators are more likely than not to increase classification performance. And in a second follow-up experiment, we evaluate numerical imputation of one-hot encoded categorical attributes. We reach the following conclusions. Firstly, missing-indicators generally increase classification performance. Secondly, with missing-indicators, nearest neighbour and iterative imputation do not lead to better performance than simple mean/mode imputation. Thirdly, for decision trees, pruning is necessary to prevent overfitting. Fourthly, the thresholds above which missing-indicators are more likely than not to improve performance are lower for categorical attributes than for numerical attributes. Lastly, mean imputation of numerical attributes preserves some of the information from missing values. Consequently, when not using missing-indicators it can be advantageous to apply mean imputation to one-hot encoded categorical attributes instead of mode imputation.
LGSep 25, 2023
Classifying token frequencies using angular Minkowski $p$-distanceOliver Urs Lenz, Chris Cornelis
Angular Minkowski $p$-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski $p$-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski $p$-distance may potentially be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate clasification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter $p$, the dimensionality $m$ of the dataset, the number of neighbours $k$, the choice of weights and the choice of classifier. We conclude that it is possible to obtain substantially higher classification performance with angular Minkowski $p$-distance with suitable values for $p$ than with classical cosine dissimilarity.
LGOct 4, 2022
Polar Encoding: A Simple Baseline Approach for Classification with Missing ValuesOliver Urs Lenz, Daniel Peralta, Chris Cornelis
We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and -- depending on the classifier -- about as well or better than mean/mode imputation with missing-indicators.
AIJun 2, 2022
Fuzzy granular approximation classifierMarko Palangetić, Chris Cornelis, Salvatore Greco et al.
In this article, a new Fuzzy Granular Approximation Classifier (FGAC) is introduced. The classifier is based on the previously introduced concept of the granular approximation and its multi-class classification case. The classifier is instance-based and its biggest advantage is its local transparency i.e., the ability to explain every individual prediction it makes. We first develop the FGAC for the binary classification case and the multi-class classification case and we discuss its variation that includes the Ordered Weighted Average (OWA) operators. Those variations of the FGAC are then empirically compared with other locally transparent ML methods. At the end, we discuss the transparency of the FGAC and its advantage over other locally transparent methods. We conclude that while the FGAC has similar predictive performance to other locally transparent ML models, its transparency can be superior in certain cases.
LGNov 25, 2022
Evaluation of the impact of the indiscernibility relation on the fuzzy-rough nearest neighbours algorithmHenri Bollaert, Chris Cornelis
Fuzzy rough sets are well-suited for working with vague, imprecise or uncertain information and have been succesfully applied in real-world classification problems. One of the prominent representatives of this theory is fuzzy-rough nearest neighbours (FRNN), a classification algorithm based on the classical k-nearest neighbours algorithm. The crux of FRNN is the indiscernibility relation, which measures how similar two elements in the data set of interest are. In this paper, we investigate the impact of this indiscernibility relation on the performance of FRNN classification. In addition to relations based on distance functions and kernels, we also explore the effect of distance metric learning on FRNN for the first time. Furthermore, we also introduce an asymmetric, class-specific relation based on the Mahalanobis distance which uses the correlation within each class, and which shows a significant improvement over the regular Mahalanobis distance, but is still beaten by the Manhattan distance. Overall, the Neighbourhood Components Analysis algorithm is found to be the best performer, trading speed for accuracy.
LGNov 28, 2023
A unified weighting framework for evaluating nearest neighbour classificationOliver Urs Lenz, Henri Bollaert, Chris Cornelis
We present the first comprehensive and large-scale evaluation of classical (NN), fuzzy (FNN) and fuzzy rough (FRNN) nearest neighbour classification. We standardise existing proposals for nearest neighbour weighting with kernel functions, applied to the distance values and/or ranks of the nearest neighbours of a test instance. In particular, we show that the theoretically optimal Samworth weights converge to a kernel. Kernel functions are closely related to fuzzy negation operators, and we propose a new kernel based on Yager negation. We also consider various distance and scaling measures, which we show can be related to each other. Through a systematic series of experiments on 85 real-life classification datasets, we find that NN, FNN and FRNN all perform best with Boscovich distance, and that NN and FRNN perform best with a combination of Samworth rank- and distance-weights and scaling by the mean absolute deviation around the median ($r_1$), the standard deviation ($r_2$) or the semi-interquartile range ($r_{\infty}^*$), while FNN performs best with only Samworth distance-weights and $r_1$- or $r_2$-scaling. However, NN achieves comparable performance with Yager-$\frac{1}{2}$ distance-weights, which are simpler to implement than a combination of Samworth distance- and rank-weights. Finally, FRNN generally outperforms NN, which in turn performs systematically better than FNN.
AIDec 27, 2023
On the Granular Representation of Fuzzy Quantifier-Based Fuzzy Rough SetsAdnan Theerens, Chris Cornelis
Rough set theory is a well-known mathematical framework that can deal with inconsistent data by providing lower and upper approximations of concepts. A prominent property of these approximations is their granular representation: that is, they can be written as unions of simple sets, called granules. The latter can be identified with "if. . . , then. . . " rules, which form the backbone of rough set rule induction. It has been shown previously that this property can be maintained for various fuzzy rough set models, including those based on ordered weighted average (OWA) operators. In this paper, we will focus on some instances of the general class of fuzzy quantifier-based fuzzy rough sets (FQFRS). In these models, the lower and upper approximations are evaluated using binary and unary fuzzy quantifiers, respectively. One of the main targets of this study is to examine the granular representation of different models of FQFRS. The main findings reveal that Choquet-based fuzzy rough sets can be represented granularly under the same conditions as OWA-based fuzzy rough sets, whereas Sugeno-based FRS can always be represented granularly. This observation highlights the potential of these models for resolving data inconsistencies and managing noise.
LGMar 7, 2024
FRRI: a novel algorithm for fuzzy-rough rule inductionHenri Bollaert, Marko Palangetić, Chris Cornelis et al.
Interpretability is the next frontier in machine learning research. In the search for white box models - as opposed to black box models, like random forests or neural networks - rule induction algorithms are a logical and promising option, since the rules can easily be understood by humans. Fuzzy and rough set theory have been successfully applied to this archetype, almost always separately. As both approaches to rule induction involve granular computing based on the concept of equivalence classes, it is natural to combine them. The QuickRules\cite{JensenCornelis2009} algorithm was a first attempt at using fuzzy rough set theory for rule induction. It is based on QuickReduct, a greedy algorithm for building decision reducts. QuickRules already showed an improvement over other rule induction methods. However, to evaluate the full potential of a fuzzy rough rule induction algorithm, one needs to start from the foundations. In this paper, we introduce a novel rule induction algorithm called Fuzzy Rough Rule Induction (FRRI). We provide background and explain the workings of our algorithm. Furthermore, we perform a computational experiment to evaluate the performance of our algorithm and compare it to other state-of-the-art rule induction approaches. We find that our algorithm is more accurate while creating small rulesets consisting of relatively short rules. We end the paper by outlining some directions for future work.
AIJun 3, 2025
Optimising the attribute order in Fuzzy Rough Rule InductionHenri Bollaert, Chris Cornelis, Marko Palangetić et al.
Interpretability is the next pivotal frontier in machine learning research. In the pursuit of glass box models - as opposed to black box models, like random forests or neural networks - rule induction algorithms are a logical and promising avenue, as the rules can easily be understood by humans. In our previous work, we introduced FRRI, a novel rule induction algorithm based on fuzzy rough set theory. We demonstrated experimentally that FRRI outperformed other rule induction methods with regards to accuracy and number of rules. FRRI leverages a fuzzy indiscernibility relation to partition the data space into fuzzy granules, which are then combined into a minimal covering set of rules. This indiscernibility relation is constructed by removing attributes from rules in a greedy way. This raises the question: does the order of the attributes matter? In this paper, we show that optimising only the order of attributes using known methods from fuzzy rough set theory and classical machine learning does not improve the performance of FRRI on multiple metrics. However, removing a small number of attributes using fuzzy rough feature selection during this step positively affects balanced accuracy and the average rule length.
LGApr 1, 2025
Feature Subset Weighting for Distance-based Supervised Learning through Choquet IntegrationAdnan Theerens, Yvan Saeys, Chris Cornelis
This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights should be calculated each time instead of calculating all feature subset weights ($2^m$), where $m$ is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbors (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.
LGMar 18, 2024
Fuzzy Rough Choquet Distances for ClassificationAdnan Theerens, Chris Cornelis
This paper introduces a novel Choquet distance using fuzzy rough set based measures. The proposed distance measure combines the attribute information received from fuzzy rough set theory with the flexibility of the Choquet integral. This approach is designed to adeptly capture non-linear relationships within the data, acknowledging the interplay of the conditional attributes towards the decision attribute and resulting in a more flexible and accurate distance. We explore its application in the context of machine learning, with a specific emphasis on distance-based classification approaches (e.g. k-nearest neighbours). The paper examines two fuzzy rough set based measures that are based on the positive region. Moreover, we explore two procedures for monotonizing the measures derived from fuzzy rough set theory, making them suitable for use with the Choquet integral, and investigate their differences.
LGFeb 22, 2022
Choquet-Based Fuzzy Rough SetsAdnan Theerens, Oliver Urs Lenz, Chris Cornelis
Fuzzy rough set theory can be used as a tool for dealing with inconsistent data when there is a gradual notion of indiscernibility between objects. It does this by providing lower and upper approximations of concepts. In classical fuzzy rough sets, the lower and upper approximations are determined using the minimum and maximum operators, respectively. This is undesirable for machine learning applications, since it makes these approximations sensitive to outlying samples. To mitigate this problem, ordered weighted average (OWA) based fuzzy rough sets were introduced. In this paper, we show how the OWA-based approach can be interpreted intuitively in terms of vague quantification, and then generalize it to Choquet-based fuzzy rough sets (CFRS). This generalization maintains desirable theoretical properties, such as duality and monotonicity. Furthermore, it provides more flexibility for machine learning applications. In particular, we show that it enables the seamless integration of outlier detection algorithms, to enhance the robustness of machine learning algorithms based on fuzzy rough sets.
AIFeb 15, 2022
Multi-class granular approximation by means of disjoint and adjacent fuzzy granulesMarko Palangetić, Chris Cornelis, Salvatore Greco et al.
In granular computing, fuzzy sets can be approximated by granularly representable sets that are as close as possible to the original fuzzy set w.r.t. a given closeness measure. Such sets are called granular approximations. In this article, we introduce the concepts of disjoint and adjacent granules and we examine how the new definitions affect the granular approximations. First, we show that the new concepts are important for binary classification problems since they help to keep decision regions separated (disjoint granules) and at the same time to cover as much as possible of the attribute space (adjacent granules). Later, we consider granular approximations for multi-class classification problems leading to the definition of a multi-class granular approximation. Finally, we show how to efficiently calculate multi-class granular approximations for Łukasiewicz fuzzy connectives. We also provide graphical illustrations for a better understanding of the introduced concepts.
AINov 26, 2021
A Novel Machine Learning Approach to Data Inconsistency with respect to a Fuzzy RelationMarko Palangetić, Chris Cornelis, Salvatore Greco et al.
Inconsistency in prediction problems occurs when instances that relate in a certain way on condition attributes, do not follow the same relation on the decision attribute. For example, in ordinal classification with monotonicity constraints, it occurs when an instance dominating another instance on condition attributes has been assigned to a worse decision class. It typically appears as a result of perturbation in data caused by incomplete knowledge (missing attributes) or by random effects that occur during data generation (instability in the assessment of decision attribute values). Inconsistencies with respect to a crisp preorder relation (expressing either dominance or indiscernibility between instances) can be handled using symbolic approaches like rough set theory and by using statistical/machine learning approaches that involve optimization methods. Fuzzy rough sets can also be seen as a symbolic approach to inconsistency handling with respect to a fuzzy relation. In this article, we introduce a new machine learning method for inconsistency handling with respect to a fuzzy preorder relation. The novel approach is motivated by the existing machine learning approach used for crisp relations. We provide statistical foundations for it and develop optimization procedures that can be used to eliminate inconsistencies. The article also proves important properties and contains didactic examples of those procedures.
CLJul 8, 2021
Nearest neighbour approaches for Emotion Detection in TweetsOlha Kaminska, Chris Cornelis, Veronique Hoste
Emotion detection is an important task that can be applied to social media data to discover new knowledge. While the use of deep learning methods for this task has been prevalent, they are black-box models, making their decisions hard to interpret for a human operator. Therefore, in this paper, we propose an approach using weighted $k$ Nearest Neighbours (kNN), a simple, easy to implement, and explainable machine learning model. These qualities can help to enhance results' reliability and guide error analysis. In particular, we apply the weighted kNN model to the shared emotion detection task in tweets from SemEval-2018. Tweets are represented using different text embedding methods and emotion lexicon vocabulary scores, and classification is done by an ensemble of weighted kNN models. Our best approaches obtain results competitive with state-of-the-art solutions and open up a promising alternative path to neural network methods.
CLJul 8, 2021
Fuzzy-Rough Nearest Neighbour Approaches for Emotion Detection in TweetsOlha Kaminska, Chris Cornelis, Veronique Hoste
Social media are an essential source of meaningful data that can be used in different tasks such as sentiment analysis and emotion recognition. Mostly, these tasks are solved with deep learning methods. Due to the fuzzy nature of textual data, we consider using classification methods based on fuzzy rough sets. Specifically, we develop an approach for the SemEval-2018 emotion detection task, based on the fuzzy rough nearest neighbour (FRNN) classifier enhanced with ordered weighted average (OWA) operators. We use tuned ensembles of FRNN--OWA models based on different text embedding methods. Our results are competitive with the best SemEval solutions based on more complicated deep learning methods.
LGFeb 4, 2021
Optimised one-class classification performanceOliver Urs Lenz, Daniel Peralta, Chris Cornelis
We provide a thorough treatment of one-class classification with hyperparameter optimisation for five data descriptors: Support Vector Machine (SVM), Nearest Neighbour Distance (NND), Localised Nearest Neighbour Distance (LNND), Local Outlier Factor (LOF) and Average Localised Proximity (ALP). The hyperparameters of SVM and LOF have to be optimised through cross-validation, while NND, LNND and ALP allow an efficient form of leave-one-out validation and the reuse of a single nearest-neighbour query. We experimentally evaluate the effect of hyperparameter optimisation with 246 classification problems drawn from 50 datasets. From a selection of optimisation algorithms, the recent Malherbe-Powell proposal optimises the hyperparameters of all data descriptors most efficiently. We calculate the increase in test AUROC and the amount of overfitting as a function of the number of hyperparameter evaluations. After 50 evaluations, ALP and SVM significantly outperform LOF, NND and LNND, and LOF and NND outperform LNND. The performance of ALP and SVM is comparable, but ALP can be optimised more efficiently so constitutes a good default choice. Alternatively, using validation AUROC as a selection criterion between ALP or SVM gives the best overall result, and NND is the least computationally demanding option. We thus end up with a clear trade-off between three choices, allowing practitioners to make an informed decision.
LGJan 26, 2021
Average Localised Proximity: A new data descriptor with good default one-class classification performanceOliver Urs Lenz, Daniel Peralta, Chris Cornelis
One-class classification is a challenging subfield of machine learning in which so-called data descriptors are used to predict membership of a class based solely on positive examples of that class, and no counter-examples. A number of data descriptors that have been shown to perform well in previous studies of one-class classification, like the Support Vector Machine (SVM), require setting one or more hyperparameters. There has been no systematic attempt to date to determine optimal default values for these hyperparameters, which limits their ease of use, especially in comparison with hyperparameter-free proposals like the Isolation Forest (IF). We address this issue by determining optimal default hyperparameter values across a collection of 246 one-class classification problems derived from 50 different real-world datasets. In addition, we propose a new data descriptor, Average Localised Proximity (ALP) to address certain issues with existing approaches based on nearest neighbour distances. Finally, we evaluate classification performance using a leave-one-dataset-out procedure, and find strong evidence that ALP outperforms IF and a number of other data descriptors, as well as weak evidence that it outperforms SVM, making ALP a good default choice.