LGJul 19, 2022Code
Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson RegressionAndrei V. Konstantinov, Stanislav R. Kirpichenko, Lev V. Utkin
A new method for estimating the conditional average treatment effect is proposed in the paper. It is called TNW-CATE (the Trainable Nadaraya-Watson regression for CATE) and based on the assumption that the number of controls is rather large whereas the number of treatments is small. TNW-CATE uses the Nadaraya-Watson regression for predicting outcomes of patients from the control and treatment groups. The main idea behind TNW-CATE is to train kernels of the Nadaraya-Watson regression by using a weight sharing neural network of a specific form. The network is trained on controls, and it replaces standard kernels with a set of neural subnetworks with shared parameters such that every subnetwork implements the trainable kernel, but the whole network implements the Nadaraya-Watson estimator. The network memorizes how the feature vectors are located in the feature space. The proposed approach is similar to the transfer learning when domains of source and target data are similar, but tasks are different. Various numerical simulation experiments illustrate TNW-CATE and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions. The code of proposed algorithms implementing TNW-CATE is available in https://github.com/Stasychbr/TNW-CATE.
LGAug 7, 2023Code
SurvBeX: An explanation method of the machine learning survival models based on the Beran estimatorLev V. Utkin, Danila Y. Eremenko, Andrei V. Konstantinov
An explanation method called SurvBeX is proposed to interpret predictions of the machine learning survival black-box models. The main idea behind the method is to use the modified Beran estimator as the surrogate explanation model. Coefficients, incorporated into Beran estimator, can be regarded as values of the feature impacts on the black-box model prediction. Following the well-known LIME method, many points are generated in a local area around an example of interest. For every generated example, the survival function of the black-box model is computed, and the survival function of the surrogate model (the Beran estimator) is constructed as a function of the explanation coefficients. In order to find the explanation coefficients, it is proposed to minimize the mean distance between the survival functions of the black-box model and the Beran estimator produced by the generated examples. Many numerical experiments with synthetic and real survival data demonstrate the SurvBeX efficiency and compare the method with the well-known method SurvLIME. The method is also compared with the method SurvSHAP. The code implementing SurvBeX is available at: https://github.com/DanilaEremenko/SurvBeX
LGNov 19, 2022Code
BENK: The Beran Estimator with Neural Kernels for Estimating the Heterogeneous Treatment EffectStanislav R. Kirpichenko, Lev V. Utkin, Andrei V. Konstantinov
A method for estimating the conditional average treatment effect under condition of censored time-to-event data called BENK (the Beran Estimator with Neural Kernels) is proposed. The main idea behind the method is to apply the Beran estimator for estimating the survival functions of controls and treatments. Instead of typical kernel functions in the Beran estimator, it is proposed to implement kernels in the form of neural networks of a specific form called the neural kernels. The conditional average treatment effect is estimated by using the survival functions as outcomes of the control and treatment neural networks which consists of a set of neural kernels with shared parameters. The neural kernels are more flexible and can accurately model a complex location structure of feature vectors. Various numerical simulation experiments illustrate BENK and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions based on the Cox models, the random survival forest and the Nadaraya-Watson regression with Gaussian kernels. The code of proposed algorithms implementing BENK is available in https://github.com/Stasychbr/BENK.
LGOct 11, 2022Code
LARF: Two-level Attention-based Random Forests with a Mixture of Contamination ModelsAndrei V. Konstantinov, Lev V. Utkin
New models of the attention-based random forests called LARF (Leaf Attention-based Random Forest) are proposed. The first idea behind the models is to introduce a two-level attention, where one of the levels is the "leaf" attention and the attention mechanism is applied to every leaf of trees. The second level is the tree attention depending on the "leaf" attention. The second idea is to replace the softmax operation in the attention with the weighted sum of the softmax operations with different parameters. It is implemented by applying a mixture of the Huber's contamination models and can be regarded as an analog of the multi-head attention with "heads" defined by selecting a value of the softmax parameter. Attention parameters are simply trained by solving the quadratic optimization problem. To simplify the tuning process of the models, it is proposed to make the tuning contamination parameters to be training and to compute them by solving the quadratic optimization problem. Many numerical experiments with real datasets are performed for studying LARFs. The code of proposed algorithms can be found in https://github.com/andruekonst/leaf-attention-forest.
LGJul 19, 2023
A New Computationally Simple Approach for Implementing Neural Networks with Output Hard ConstraintsAndrei V. Konstantinov, Lev V. Utkin
A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The mapping is implemented by the additional neural network layer with constraints for output. The proposed method is simply extended to the case when constraints are imposed not only on the output vectors, but also on joint constraints depending on inputs. The projection approach to imposing constraints on outputs can simply be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, and dynamic constraints, constraints in the form of boundaries. An important feature of the method is its computational simplicity. Complexities of the forward pass of the proposed neural network layer by linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables, m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
LGOct 5, 2022
Improved Anomaly Detection by Using the Attention-Based Isolation ForestLev V. Utkin, Andrey Y. Ageev, Andrei V. Konstantinov
A new modification of Isolation Forest called Attention-Based Isolation Forest (ABIForest) for solving the anomaly detection problem is proposed. It incorporates the attention mechanism in the form of the Nadaraya-Watson regression into the Isolation Forest for improving solution of the anomaly detection problem. The main idea underlying the modification is to assign attention weights to each path of trees with learnable parameters depending on instances and trees themselves. The Huber's contamination model is proposed to be used for defining the attention weights and their parameters. As a result, the attention weights are linearly depend on the learnable attention parameters which are trained by solving the standard linear or quadratic optimization problem. ABIForest can be viewed as the first modification of Isolation Forest, which incorporates the attention mechanism in a simple way without applying gradient-based algorithms. Numerical experiments with synthetic and real datasets illustrate outperforming results of ABIForest. The code of proposed algorithms is available.
LGJul 9, 2022
Attention and Self-Attention in Random ForestsLev V. Utkin, Andrei V. Konstantinov
New models of random forests jointly using the attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest whose idea stems from applying a combination of the Nadaraya-Watson kernel regression and the Huber's contamination model to random forests. The self-attention aims to capture dependencies of the tree predictions and to remove noise or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module for computing weights. It is shown that the training process of attention weights is reduced to solving a single quadratic or linear optimization problem. Three modifications of the general approach are proposed and compared. A specific multi-head self-attention for the random forest is also considered. Heads of the self-attention are obtained by changing its tuning parameters including the kernel parameters and the contamination parameter of models. Numerical experiments with various datasets illustrate the proposed models and show that the supplement of the self-attention improves the model performance for many datasets.
LGApr 12, 2023
Neural Attention Forests: Transformer-Based Forest ImprovementAndrei V. Konstantinov, Lev V. Utkin, Alexey A. Lukashin et al.
A new approach called NAF (the Neural Attention Forest) for solving regression and classification tasks under tabular training data is proposed. The main idea behind the proposed NAF model is to introduce the attention mechanism into the random forest by assigning attention weights calculated by neural networks of a specific form to data in leaves of decision trees and to the random forest itself in the framework of the Nadaraya-Watson kernel regression. In contrast to the available models like the attention-based random forest, the attention weights and the Nadaraya-Watson regression are represented in the form of neural networks whose weights can be regarded as trainable parameters. The first part of neural networks with shared weights is trained for all trees and computes attention weights of data in leaves. The second part aggregates outputs of the tree networks and aims to minimize the difference between the random forest prediction and the truth target value from a training set. The neural network is trained in an end-to-end manner. The combination of the random forest and neural networks implementing the attention mechanism forms a transformer for enhancing the forest predictions. Numerical experiments with real datasets illustrate the proposed method. The code implementing the approach is publicly available.
LGMar 15, 2023
Interpretable Ensembles of Hyper-Rectangles as Base ModelsAndrei V. Konstantinov, Lev V. Utkin
A new extremely simple ensemble-based model with the uniformly generated axis-parallel hyper-rectangles as base models (HRBM) is proposed. Two types of HRBMs are studied: closed rectangles and corners. The main idea behind HRBM is to consider and count training examples inside and outside each rectangle. It is proposed to incorporate HRBMs into the gradient boosting machine (GBM). Despite simplicity of HRBMs, it turns out that these simple base models allow us to construct effective ensemble-based models and avoid overfitting. A simple method for calculating optimal regularization parameters of the ensemble-based model, which can be modified in the explicit way at each iteration of GBM, is considered. Moreover, a new regularization called the "step height penalty" is studied in addition to the standard L1 and L2 regularizations. An extremely simple approach to the proposed ensemble-based model prediction interpretation by using the well-known method SHAP is proposed. It is shown that GBM with HRBM can be regarded as a model extending a set of interpretable models for explaining black-box models. Numerical experiments with real datasets illustrate the proposed GBM with HRBMs for regression and classification problems. Experiments also illustrate computational efficiency of the proposed SHAP modifications. The code of proposed algorithms implementing GBM with HRBM is publicly available.
LGFeb 13, 2023
Multiple Instance Learning with Trainable Decision Tree EnsemblesAndrei V. Konstantinov, Lev V. Utkin
A new random forest based model for solving the Multiple Instance Learning (MIL) problem under small tabular data, called Soft Tree Ensemble MIL (STE-MIL), is proposed. A new type of soft decision trees is considered, which is similar to the well-known soft oblique trees, but with a smaller number of trainable parameters. In order to train the trees, it is proposed to convert them into neural networks of a specific form, which approximate the tree functions. It is also proposed to aggregate the instance and bag embeddings (output vectors) by using the attention mechanism. The whole STE-MIL model, including soft decision trees, neural networks, the attention mechanism and a classifier, is trained in an end-to-end manner. Numerical experiments with tabular datasets illustrate STE-MIL. The corresponding code implementing the model is publicly available.
LGMay 21
SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival AnalysisStanislav R. Kirpichenko, Andrei V. Konstantinov, Lev V. Utkin
Survival analysis aims to estimate a time-to-event distribution from data with censored observations. Many existing methods either impose structural assumptions on the hazard function or discretize the time axis, which may limit flexibility and introduce approximation errors. We propose the Survival Diffusion Probabilistic Model (SDPM), a generative approach to continuous-time survival analysis. SDPM models the conditional distribution of the survival outcome, represented by the pair of observed time and censoring indicator, $\mathbb{P}(T,δ\mid \mathbf{x})$, using a denoising diffusion model. Under the assumption of conditionally independent censoring, conditional samples generated by the model can be transformed into survival function estimates using the Kaplan-Meier estimator. This formulation avoids parametric assumptions on the event-time distribution and does not require a discretization of the output time space. The model operates in a transformed target space, using standardized log-times and a continuous Gaussian-mixture representation of the censoring indicator. We evaluate SDPM on ten real survival datasets and compare it with five strong baselines, including tree-based, boosting-based, and neural survival models. Results show that SDPM achieves competitive predictive performance across C-index, integrated time-dependent AUC, and integrated Brier score. A study on synthetic Cox-Weibull data demonstrates that SDPM can recover the shape of an underlying continuous survival distribution more accurately than a strong nonparametric baseline when sufficiently many samples are generated. An ablation study confirms the importance of the proposed target-space transformations, which improve event-rate calibration, reduce invalid generated times, and provide consistent gains in predictive discrimination. Codes implementing the proposed model are publicly available.
LGDec 8, 2025
Towards a Relationship-Aware Transformer for Tabular DataAndrei V. Konstantinov, Valerii A. Zuev, Lev V. Utkin
Deep learning models for tabular data typically do not allow for imposing a graph of external dependencies between samples, which can be useful for accounting for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, making them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism, which accounts for possible relationships between data points by adding a term to the attention matrix. Our models are compared with each other and the gradient boosting decision trees in a regression task on synthetic and real-world datasets, as well as in a treatment effect estimation task on the IHDP dataset.
LGFeb 22, 2024
Incorporating Expert Rules into Neural Networks in the Framework of Concept-Based LearningAndrei V. Konstantinov, Lev V. Utkin
A problem of incorporating the expert rules into machine learning models for extending the concept-based learning is formulated in the paper. It is proposed how to combine logical rules and neural networks predicting the concept probabilities. The first idea behind the combination is to form constraints for a joint probability distribution over all combinations of concept values to satisfy the expert rules. The second idea is to represent a feasible set of probability distributions in the form of a convex polytope and to use its vertices or faces. We provide several approaches for solving the stated problem and for training neural networks which guarantee that the output probabilities of concepts would not violate the expert rules. The solution of the problem can be viewed as a way for combining the inductive and deductive learning. Expert rules are used in a broader sense when any logical function that connects concepts and class labels or just concepts with each other can be regarded as a rule. This feature significantly expands the class of the proposed results. Numerical examples illustrate the approaches. The code of proposed algorithms is publicly available.
LGFeb 19, 2024
Generating Survival Interpretable Trajectories and DataAndrei V. Konstantinov, Stanislav R. Kirpichenko, Lev V. Utkin
A new model for generating survival trajectories and data based on applying an autoencoder of a specific structure is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a new generated feature vector on the basis of the Beran estimator. Second, the model generates additional data based on a given training set that would supplement the original dataset. Third, the most important, it generates a prototype time-dependent trajectory for an object, which characterizes how features of the object could be changed to achieve a different time to an event. The trajectory can be viewed as a type of the counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporating into the variational autoencoder. The model also determines the censored indicators of new generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model using numerical experiments on synthetic and real datasets. The code of the algorithm implementing the proposed model is publicly available.
IVMay 24, 2024
Concept-based Explainable Malignancy Scoring on Pulmonary Nodules in CT ImagesRinat I. Dumaev, Sergei A. Molodyakov, Lev V. Utkin
To increase the transparency of modern computer-aided diagnosis (CAD) systems for assessing the malignancy of lung nodules, an interpretable model based on applying the generalized additive models and the concept-based learning is proposed. The model detects a set of clinically significant attributes in addition to the final malignancy regression score and learns the association between the lung nodule attributes and a final diagnosis decision as well as their contributions into the decision. The proposed concept-based learning framework provides human-readable explanations in terms of different concepts (numerical and categorical), their values, and their contribution to the final prediction. Numerical experiments with the LIDC-IDRI dataset demonstrate that the diagnosis results obtained using the proposed model, which explicitly explores internal relationships, are in line with similar patterns observed in clinical practice. Additionally, the proposed model shows the competitive classification and the nodule attribute scoring performance, highlighting its potential for effective decision-making in the lung nodule diagnosis.
LGDec 10, 2024
SurvBETA: Ensemble-Based Survival Models Using Beran Estimators and Several Attention MechanismsLev V. Utkin, Semen P. Khomets, Vlada A. Efremenko et al.
Many ensemble-based models have been proposed to solve machine learning problems in the survival analysis framework, including random survival forests, the gradient boosting machine with weak survival models, ensembles of the Cox models. To extend the set of models, a new ensemble-based model called SurvBETA (the Survival Beran estimator Ensemble using Three Attention mechanisms) is proposed where the Beran estimator is used as a weak learner in the ensemble. The Beran estimator can be regarded as a kernel regression model taking into account the relationship between instances. Outputs of weak learners in the form of conditional survival functions are aggregated with attention weights taking into account the distance between the analyzed instance and prototypes of all bootstrap samples. The attention mechanism is used three times: for implementation of the Beran estimators, for determining specific prototypes of bootstrap samples and for aggregating the weak model predictions. The proposed model is presented in two forms: in a general form requiring to solve a complex optimization problem for its training; in a simplified form by considering a special representation of the attention weights by means of the imprecise Huber's contamination model which leads to solving a simple optimization problem. Numerical experiments illustrate properties of the model on synthetic data and compare the model with other survival models on real data. A code implementing the proposed model is publicly available.
LGJun 11, 2025
Survival Analysis as Imprecise Classification with Trainable KernelsAndrei V. Konstantinov, Vlada A. Efremenko, Lev V. Utkin
Survival analysis is a fundamental tool for modeling time-to-event data in healthcare, engineering, and finance, where censored observations pose significant challenges. While traditional methods like the Beran estimator offer nonparametric solutions, they often struggle with the complex data structures and heavy censoring. This paper introduces three novel survival models, iSurvM (the imprecise Survival model based on Mean likelihood functions), iSurvQ (the imprecise Survival model based on the Quantiles of likelihood functions), and iSurvJ (the imprecise Survival model based on the Joint learning), that combine imprecise probability theory with attention mechanisms to handle censored data without parametric assumptions. The first idea behind the models is to represent censored observations by interval-valued probability distributions for each instance over time intervals between events moments. The second idea is to employ the kernel-based Nadaraya-Watson regression with trainable attention weights for computing the imprecise probability distribution over time intervals for the entire dataset. The third idea is to consider three decision strategies for training, which correspond to the proposed three models. Experiments on synthetic and real datasets demonstrate that the proposed models, especially iSurvJ, consistently outperform the Beran estimator from the accuracy and computational complexity points of view. Codes implementing the proposed models are publicly available.
LGJun 9, 2025
Ensemble-Based Survival Models with the Self-Attended Beran Estimator PredictionsLev V. Utkin, Semen P. Khomets, Vlada A. Efremenko et al.
Survival analysis predicts the time until an event of interest, such as failure or death, but faces challenges due to censored data, where some events remain unobserved. Ensemble-based models, like random survival forests and gradient boosting, are widely used but can produce unstable predictions due to variations in bootstrap samples. To address this, we propose SurvBESA (Survival Beran Estimators Self-Attended), a novel ensemble model that combines Beran estimators with a self-attention mechanism. Unlike traditional methods, SurvBESA applies self-attention to predicted survival functions, smoothing out noise by adjusting each survival function based on its similarity to neighboring survival functions. We also explore a special case using Huber's contamination model to define attention weights, simplifying training to a quadratic or linear optimization problem. Numerical experiments show that SurvBESA outperforms state-of-the-art models. The implementation of SurvBESA is publicly available.
IVMar 25, 2025
Automated Video-EEG Analysis in Epilepsy Studies: Advances and ChallengesValerii A. Zuev, Elena G. Salmagambetova, Stepan N. Djakov et al.
Epilepsy is typically diagnosed through electroencephalography (EEG) and long-term video-EEG (vEEG) monitoring. The manual analysis of vEEG recordings is time-consuming, necessitating automated tools for seizure detection. Recent advancements in machine learning have shown promise in real-time seizure detection and prediction using EEG and video data. However, diversity of seizure symptoms, markup ambiguities, and limited availability of multimodal datasets hinder progress. This paper reviews the latest developments in automated video-EEG analysis and discusses the integration of multimodal data. We also propose a novel pipeline for treatment effect estimation from vEEG data using concept-based learning, offering a pathway for future research in this domain.
LGMar 22, 2025
A novel gradient-based method for decision trees optimizing arbitrary differential loss functionsAndrei V. Konstantinov, Lev V. Utkin
There are many approaches for training decision trees. This work introduces a novel gradient-based method for constructing decision trees that optimize arbitrary differentiable loss functions, overcoming the limitations of heuristic splitting rules. Unlike traditional approaches that rely on heuristic splitting rules, the proposed method refines predictions using the first and second derivatives of the loss function, enabling the optimization of complex tasks such as classification, regression, and survival analysis. We demonstrate the method's applicability to classification, regression, and survival analysis tasks, including those with censored data. Numerical experiments on both real and synthetic datasets compare the proposed method with traditional decision tree algorithms, such as CART, Extremely Randomized Trees, and SurvTree. The implementation of the method is publicly available, providing a practical tool for researchers and practitioners. This work advances the field of decision tree-based modeling, offering a more flexible and accurate approach for handling structured data and complex tasks. By leveraging gradient-based optimization, the proposed method bridges the gap between traditional decision trees and modern machine learning techniques, paving the way for further innovations in interpretable and high-performing models.
LGFeb 9, 2025
Survival Concept-Based Learning ModelsStanislav R. Kirpichenko, Lev V. Utkin, Andrei V. Konstantinov et al.
Concept-based learning enhances prediction accuracy and interpretability by leveraging high-level, human-understandable concepts. However, existing CBL frameworks do not address survival analysis tasks, which involve predicting event times in the presence of censored data -- a common scenario in fields like medicine and reliability analysis. To bridge this gap, we propose two novel models: SurvCBM (Survival Concept-based Bottleneck Model) and SurvRCM (Survival Regularized Concept-based Model), which integrate concept-based learning with survival analysis to handle censored event time data. The models employ the Cox proportional hazards model and the Beran estimator. SurvCBM is based on the architecture of the well-known concept bottleneck model, offering interpretable predictions through concept-based explanations. SurvRCM uses concepts as regularization to enhance accuracy. Both models are trained end-to-end and provide interpretable predictions in terms of concepts. Two interpretability approaches are proposed: one leveraging the linear relationship in the Cox model and another using an instance-based explanation framework with the Beran estimator. Numerical experiments demonstrate that SurvCBM outperforms SurvRCM and traditional survival models, underscoring the importance and advantages of incorporating concept information. The code for the proposed algorithms is publicly available.
LGJun 28, 2024
FI-CBL: A Probabilistic Method for Concept-Based Learning with Expert RulesLev V. Utkin, Andrei V. Konstantinov, Stanislav R. Kirpichenko
A method for solving concept-based learning (CBL) problem is proposed. The main idea behind the method is to divide each concept-annotated image into patches, to transform the patches into embeddings by using an autoencoder, and to cluster the embeddings assuming that each cluster will mainly contain embeddings of patches with certain concepts. To find concepts of a new image, the method implements the frequentist inference by computing prior and posterior probabilities of concepts based on rates of patches from images with certain values of the concepts. Therefore, the proposed method is called the Frequentist Inference CBL (FI-CBL). FI-CBL allows us to incorporate the expert rules in the form of logic functions into the inference procedure. An idea behind the incorporation is to update prior and conditional probabilities of concepts to satisfy the rules. The method is transparent because it has an explicit sequence of probabilistic calculations and a clear frequency interpretation. Numerical experiments show that FI-CBL outperforms the concept bottleneck model in cases when the number of training data is small. The code of proposed algorithms is publicly available.
LGJan 29, 2024
Dual feature-based and example-based explanation methodsAndrei V. Konstantinov, Boris V. Kozlov, Stanislav R. Kirpichenko et al.
A new approach to the local and global explanation is proposed. It is based on selecting a convex hull constructed for the finite number of points around an explained instance. The convex hull allows us to consider a dual representation of instances in the form of convex combinations of extreme points of a produced polytope. Instead of perturbing new instances in the Euclidean feature space, vectors of convex combination coefficients are uniformly generated from the unit simplex, and they form a new dual dataset. A dual linear surrogate model is trained on the dual dataset. The explanation feature importance values are computed by means of simple matrix calculations. The approach can be regarded as a modification of the well-known model LIME. The dual representation inherently allows us to get the example-based explanation. The neural additive model is also considered as a tool for implementing the example-based explanation approach. Many numerical experiments with real datasets are performed for studying the approach. The code of proposed algorithms is available.
LGDec 11, 2023
SurvBeNIM: The Beran-Based Neural Importance Model for Explaining the Survival ModelsLev V. Utkin, Danila Y. Eremenko, Andrei V. Konstantinov
A new method called the Survival Beran-based Neural Importance Model (SurvBeNIM) is proposed. It aims to explain predictions of machine learning survival models, which are in the form of survival or cumulative hazard functions. The main idea behind SurvBeNIM is to extend the Beran estimator by incorporating the importance functions into its kernels and by implementing these importance functions as a set of neural networks which are jointly trained in an end-to-end manner. Two strategies of using and training the whole neural network implementing SurvBeNIM are proposed. The first one explains a single instance, and the neural network is trained for each explained instance. According to the second strategy, the neural network only learns once on all instances from the dataset and on all generated instances. Then the neural network is used to explain any instance in a dataset domain. Various numerical experiments compare the method with different existing explanation methods. A code implementing the proposed method is publicly available.
LGJan 8, 2022
Attention-based Random Forest and Contamination ModelLev V. Utkin, Andrei V. Konstantinov
A new approach called ABRF (the attention-based random forest) and its modifications for applying the attention mechanism to the random forest (RF) for regression and classification are proposed. The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way. The weights depend on the distance between an instance, which falls into a corresponding leaf of a tree, and instances, which fall in the same leaf. This idea stems from representation of the Nadaraya-Watson kernel regression in the form of a RF. Three modifications of the general approach are proposed. The first one is based on applying the Huber's contamination model and on computing the attention weights by solving quadratic or linear optimization problems. The second and the third modifications use the gradient-based algorithms for computing trainable parameters. Numerical experiments with various regression and classification datasets illustrate the proposed method.
LGDec 11, 2021
Multi-Attention Multiple Instance LearningAndrei V. Konstantinov, Lev V. Utkin
A new multi-attention based method for solving the MIL problem (MAMIL), which takes into account the neighboring patches or instances of each analyzed patch in a bag, is proposed. In the method, one of the attention modules takes into account adjacent patches or instances, several attention modules are used to get a diverse feature representation of patches, and one attention module is used to unite different feature representations to provide an accurate classification of each patch (instance) and the whole bag. Due to MAMIL, a combined representation of patches and their neighbors in the form of embeddings of a small dimensionality for simple classification is realized. Moreover, different types of patches are efficiently processed, and a diverse feature representation of patches in a bag by using several attention modules is implemented. A simple approach for explaining the classification predictions of patches is proposed. Numerical experiments with various datasets illustrate the proposed method.
LGAug 10, 2021
Attention-like feature explanation for tabular dataAndrei V. Konstantinov, Lev V. Utkin
A new method for local and global explanation of the machine learning black-box model predictions by tabular data is proposed. It is implemented as a system called AFEX (Attention-like Feature EXplanation) and consisting of two main parts. The first part is a set of the one-feature neural subnetworks which aim to get a specific representation for every feature in the form of a basis of shape functions. The subnetworks use shortcut connections with trainable parameters to improve the network performance. The second part of AFEX produces shape functions of features as the weighted sum of the basis shape functions where weights are computed by using an attention-like mechanism. AFEX identifies pairwise interactions between features based on pairwise multiplications of shape functions corresponding to different features. A modification of AFEX with incorporating an additional surrogate model which approximates the black-box model is proposed. AFEX is trained end-to-end on a whole dataset only once such that it does not require to train neural networks again in the explanation stage. Numerical experiments with synthetic and real data illustrate AFEX.
LGJun 16, 2021
An Imprecise SHAP as a Tool for Explaining the Class Probability Distributions under Limited Training DataLev V. Utkin, Andrei V. Konstantinov, Kirill A. Vishniakov
One of the most popular methods of the machine learning prediction explanation is the SHapley Additive exPlanations method (SHAP). An imprecise SHAP as a modification of the original SHAP is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions. The first idea behind the imprecise SHAP is a new approach for computing the marginal contribution of a feature, which fulfils the important efficiency property of Shapley values. The second idea is an attempt to consider a general approach to calculating and reducing interval-valued Shapley values, which is similar to the idea of reachable probability intervals in the imprecise probability theory. A simple special implementation of the general approach in the form of linear optimization problems is proposed, which is based on using the Kolmogorov-Smirnov distance and imprecise contamination models. Numerical examples with synthetic and real data illustrate the imprecise SHAP.
LGApr 18, 2021
SurvNAM: The machine learning survival model explanationLev V. Utkin, Egor D. Satyukov, Andrei V. Konstantinov
A new modification of the Neural Additive Model (NAM) called SurvNAM and its modifications are proposed to explain predictions of the black-box machine learning survival model. The method is based on applying the original NAM to solving the explanation problem in the framework of survival analysis. The basic idea behind SurvNAM is to train the network by means of a specific expected loss function which takes into account peculiarities of the survival model predictions and is based on approximating the black-box model by the extension of the Cox proportional hazards model which uses the well-known Generalized Additive Model (GAM) in place of the simple linear relationship of covariates. The proposed method SurvNAM allows performing the local and global explanation. A set of examples around the explained example is randomly generated for the local explanation. The global explanation uses the whole training dataset. The proposed modifications of SurvNAM are based on using the Lasso-based regularization for functions from GAM and for a special representation of the GAM functions using their weighted linear and non-linear parts, which is implemented as a shortcut connection. A lot of numerical experiments illustrate the SurvNAM efficiency.
LGMar 4, 2021
Ensembles of Random SHAPsLev V. Utkin, Andrei V. Konstantinov
Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called ER-SHAP, several features are randomly selected many times from the feature set, and Shapley values for the features are computed by means of "small" SHAPs. The explanation results are averaged to get the final Shapley values. According to the second modification, called ERW-SHAP, several points are generated around the explained instance for diversity purposes, and results of their explanation are combined with weights depending on distances between points and the explained instance. The third modification, called ER-SHAP-RF, uses the random forest for preliminary explanation of instances and determining a feature probability distribution which is applied to selection of features in the ensemble-based procedure of ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for local explanation.
LGOct 14, 2020
Interpretable Machine Learning with an Ensemble of Gradient Boosting MachinesAndrei V. Konstantinov, Lev V. Utkin
A method for the local and global interpretation of a black-box model on the basis of the well-known generalized additive models is proposed. It can be viewed as an extension or a modification of the algorithm using the neural additive model. The method is based on using an ensemble of gradient boosting machines (GBMs) such that each GBM is learned on a single feature and produces a shape function of the feature. The ensemble is composed as a weighted sum of separate GBMs resulting a weighted sum of shape functions which form the generalized additive model. GBMs are built in parallel using randomized decision trees of depth 1, which provide a very simple architecture. Weights of GBMs as well as features are computed in each iteration of boosting by using the Lasso method and then updated by means of a specific smoothing procedure. In contrast to the neural additive model, the method provides weights of features in the explicit form, and it is simply trained. A lot of numerical experiments with an algorithm implementing the proposed method on synthetic and real datasets demonstrate its efficiency and properties for local and global interpretation.
LGOct 12, 2020
A Generalized Stacking for Implementing Ensembles of Gradient Boosting MachinesAndrei V. Konstantinov, Lev V. Utkin
The gradient boosting machine is one of the powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm in order to learn a second-level meta-model which can be regarded as a model for implementing various ensembles of gradient boosting models. First, the linear regression of the gradient boosting models is considered as a simplest realization of the meta-model under condition that the linear model is differentiable with respect to its coefficients (weights). Then it is shown that the proposed approach can be simply extended on arbitrary differentiable combination models, for example, on neural networks which are differentiable and can implement arbitrary functions of gradient boosting models. Various numerical examples illustrate the proposed approach.
LGJun 26, 2020
Counterfactual explanation of machine learning survival modelsMaxim S. Kovalev, Lev V. Utkin
A method for counterfactual explanation of machine learning survival models is proposed. One of the difficulties of solving the counterfactual explanation problem is that the classes of examples are implicitly defined through outcomes of a machine learning survival model in the form of survival functions. A condition that establishes the difference between survival functions of the original example and the counterfactual is introduced. This condition is based on using a distance between mean times to event. It is shown that the counterfactual explanation problem can be reduced to a standard convex optimization problem with linear constraints when the explained black-box model is the Cox model. For other black-box models, it is proposed to apply the well-known Particle Swarm Optimization algorithm. A lot of numerical experiments with real and synthetic data demonstrate the proposed method.
LGJun 19, 2020
Gradient boosting machine with partially randomized decision treesAndrei V. Konstantinov, Lev V. Utkin
The gradient boosting machine is a powerful ensemble-based machine learning method for solving regression problems. However, one of the difficulties of its using is a possible discontinuity of the regression function, which arises when regions of training data are not densely covered by training points. In order to overcome this difficulty and to reduce the computational complexity of the gradient boosting machine, we propose to apply the partially randomized trees which can be regarded as a special case of the extremely randomized trees applied to the gradient boosting. The gradient boosting machine with the partially randomized trees is illustrated by means of many numerical examples using synthetic and real data.
LGMay 5, 2020
A robust algorithm for explaining unreliable machine learning survival models using the Kolmogorov-Smirnov boundsMaxim S. Kovalev, Lev V. Utkin
A new robust algorithm based of the explanation method SurvLIME called SurvLIME-KS is proposed for explaining machine learning survival models. The algorithm is developed to ensure robustness to cases of a small amount of training data or outliers of survival data. The first idea behind SurvLIME-KS is to apply the Cox proportional hazards model to approximate the black-box survival model at the local area around a test example due to the linear relationship of covariates in the model. The second idea is to incorporate the well-known Kolmogorov-Smirnov bounds for constructing sets of predicted cumulative hazard functions. As a result, the robust maximin strategy is used, which aims to minimize the average distance between cumulative hazard functions of the explained black-box model and of the approximating Cox model, and to maximize the distance over all cumulative hazard functions in the interval produced by the Kolmogorov-Smirnov bounds. The maximin optimization problem is reduced to the quadratic program. Various numerical experiments with synthetic and real datasets demonstrate the SurvLIME-KS efficiency.
LGMay 5, 2020
SurvLIME-Inf: A simplified modification of SurvLIME for explanation of machine learning survival modelsLev V. Utkin, Maxim S. Kovalev, Ernest M. Kasimov
A new modification of the explanation method SurvLIME called SurvLIME-Inf for explaining machine learning survival models is proposed. The basic idea behind SurvLIME as well as SurvLIME-Inf is to apply the Cox proportional hazards model to approximate the black-box survival model at the local area around a test example. The Cox model is used due to the linear relationship of covariates. In contrast to SurvLIME, the proposed modification uses $L_{\infty }$-norm for defining distances between approximating and approximated cumulative hazard functions. This leads to a simple linear programming problem for determining important features and for explaining the black-box model prediction. Moreover, SurvLIME-Inf outperforms SurvLIME when the training set is very small. Numerical experiments with synthetic and real datasets demonstrate the SurvLIME-Inf efficiency.
LGMar 18, 2020
SurvLIME: A method for explaining machine learning survival modelsMaxim S. Kovalev, Lev V. Utkin, Ernest M. Kasimov
A new method called SurvLIME for explaining machine learning survival models is proposed. It can be viewed as an extension or modification of the well-known method LIME. The main idea behind the proposed method is to apply the Cox proportional hazards model to approximate the survival model at the local area around a test example. The Cox model is used because it considers a linear combination of the example covariates such that coefficients of the covariates can be regarded as quantitative impacts on the prediction. Another idea is to approximate cumulative hazard functions of the explained model and the Cox model by using a set of perturbed points in a local area around the point of interest. The method is reduced to solving an unconstrained convex optimization problem. A lot of numerical experiments demonstrate the SurvLIME efficiency.
LGNov 18, 2019
An explanation method for Siamese neural networksLev V. Utkin, Maxim S. Kovalev, Ernest M. Kasimov
A new method for explaining the Siamese neural network is proposed. It uses the following main ideas. First, the explained feature vector is compared with the prototype of the corresponding class computed at the embedding level (the Siamese neural network output). The important features at this level are determined as features which are close to the same features of the prototype. Second, an autoencoder is trained in a special way in order to take into account the embedding level of the Si-amese network, and its decoder part is used for reconstructing input data with the corresponding changes. Numerical experiments with the well-known dataset MNIST illustrate the propose method.
MLSep 9, 2019
Estimation of Personalized Heterogeneous Treatment Effects Using Concatenation and Augmentation of Feature VectorsLev V. Utkin, Mikhail V. Kots, Viacheslav S. Chukanov
A new meta-algorithm for estimating the conditional average treatment effects is proposed in the paper. The main idea underlying the algorithm is to consider a new dataset consisting of feature vectors produced by means of concatenation of examples from control and treatment groups, which are close to each other. Outcomes of new data are defined as the difference between outcomes of the corresponding examples comprising new feature vectors. The second idea is based on the assumption that the number of controls is rather large and the control outcome function is precisely determined. This assumption allows us to augment treatments by generating feature vectors which are closed to available treatments. The outcome regression function constructed on the augmented set of concatenated feature vectors can be viewed as an estimator of the conditional average treatment effects. A simple modification of the Co-learner based on the random subspace method or the feature bagging is also proposed. Various numerical simulation experiments illustrate the proposed algorithm and show its outperformance in comparison with the well-known T-learner and X-learner for several types of the control and treatment outcome functions.
MLJan 4, 2019
An Adaptive Weighted Deep Forest ClassifierLev V. Utkin, Andrei V. Konstantinov, Viacheslav S. Chukanov et al.
A modification of the confidence screening mechanism based on adaptive weighing of every training instance at each cascade level of the Deep Forest is proposed. The idea underlying the modification is very simple and stems from the confidence screening mechanism idea proposed by Pang et al. to simplify the Deep Forest classifier by means of updating the training set at each level in accordance with the classification accuracy of every training instance. However, if the confidence screening mechanism just removes instances from training and testing processes, then the proposed modification is more flexible and assigns weights by taking into account the classification accuracy. The modification is similar to the AdaBoost to some extent. Numerical experiments illustrate good performance of the proposed modification in comparison with the original Deep Forest proposed by Zhou and Feng.
MLJan 1, 2019
A weighted random survival forestLev V. Utkin, Andrei V. Konstantinov, Viacheslav S. Chukanov et al.
A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random forest improving its performance. The main idea underlying the proposed model is to replace the standard procedure of averaging used for estimation of the random survival forest hazard function by weighted avaraging where the weights are assigned to every tree and can be veiwed as training paremeters which are computed in an optimal way by solving a standard quadratic optimization problem maximizing Harrell's C-index. Numerical examples with real data illustrate the outperformance of the proposed model in comparison with the original random survival forest.
QMAug 5, 2017
A simple genome-wide association study algorithmLev V. Utkin, Irina L. Utkina
A computationally simple genome-wide association study (GWAS) algorithm for estimating the main and epistatic effects of markers or single nucleotide polymorphisms (SNPs) is proposed. It is based on the intuitive assumption that changes of alleles corresponding to important SNPs in a pair of individuals lead to large difference of phenotype values of these individuals. The algorithm is based on considering pairs of individuals instead of SNPs or pairs of SNPs. The main advantage of the algorithm is that it weakly depends on the number of SNPs in a genotype matrix. It mainly depends on the number of individuals, which is typically very small in comparison with the number of SNPs. Numerical experiments with real data sets illustrate the proposed algorithm.
MLMay 25, 2017
Discriminative Metric Learning with Deep ForestLev V. Utkin, Mikhail A. Ryabinin
A Discriminative Deep Forest (DisDF) as a metric learning algorithm is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. The case of the fully supervised learning is studied when the class labels of individual training examples are known. The main idea underlying the algorithm is to assign weights to decision trees in random forest in order to reduce distances between objects from the same class and to increase them between objects from different classes. The weights are training parameters. A specific objective function which combines Euclidean and Manhattan distances and simplifies the optimization problem for training the DisDF is proposed. The numerical experiments illustrate the proposed distance metric algorithm.
MLApr 27, 2017
A Siamese Deep ForestLev V. Utkin, Mikhail A. Ryabinin
A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. It can be also regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest as the weighted sum of the tree class probabilities such that the weights are determined in order to reduce distances between similar pairs and to increase them between dissimilar points. We show that the weights can be obtained by solving a quadratic optimization problem. The SDF aims to prevent overfitting which takes place in neural networks when only limited training data are available. The numerical experiments illustrate the proposed distance metric method.