ML | Apr 3, 2020
A New Method to Compare the Interpretability of Rule-based Algorithms
Vincent Margot, George Luta
Interpretability is becoming increasingly important for predictive model analysis. Unfortunately, as many authors have remarked, there is still no consensus regarding this notion. The goal of this paper is to define a score that makes it possible to quickly compare interpretable algorithms. The score consists of three terms, each quantitatively measured with a simple formula: predictivity, stability and simplicity. Predictivity has been extensively studied as a measure of the accuracy of predictive algorithms. Stability is based on the Dice-Sorensen index, which compares two rule sets generated by an algorithm from two independent samples. Simplicity is based on the sum of the lengths of the rules derived from the predictive model. The proposed score is a weighted sum of these three terms. We use this score to compare the interpretability of a set of rule-based algorithms and tree-based algorithms, in both the regression and the classification cases.
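As a rough illustration of the three ingredients named in this abstract, the sketch below computes a Dice-Sorensen index between two rule sets, a sum of rule lengths, and a weighted sum of three terms. It is not the paper's implementation: the representation of a rule as a `frozenset` of condition strings, the reading of "rule length" as the number of conditions, the function names, and the example weights are all assumptions made here. How the paper scales each term before combining them is not stated in the abstract, so `weighted_score` simply takes three already-scaled values.

```python
def dice_sorensen(rules_a, rules_b):
    """Dice-Sorensen index of two sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(rules_a), set(rules_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty rule sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def total_rule_length(rules):
    """Sum of rule lengths; a rule's length is assumed to be its number of conditions."""
    return sum(len(rule) for rule in rules)


def weighted_score(predictivity, stability, simplicity, weights):
    """Weighted sum of three terms that are assumed to be on comparable scales."""
    w_pred, w_stab, w_simp = weights
    return w_pred * predictivity + w_stab * stability + w_simp * simplicity


# Hypothetical rule sets obtained from two independent samples.
rules_1 = {frozenset({"x1 <= 3"}), frozenset({"x1 > 3", "x2 <= 5"})}
rules_2 = {frozenset({"x1 <= 3"}), frozenset({"x2 > 5"})}

stability = dice_sorensen(rules_1, rules_2)  # one shared rule out of 2 + 2
length = total_rule_length(rules_1)          # 1 condition + 2 conditions
score = weighted_score(0.9, stability, 0.4, weights=(0.5, 0.25, 0.25))
```

Representing each rule as a hashable set of conditions makes two rules equal exactly when they contain the same conditions, which is the simplest notion of "same rule" for the set intersection; a real comparison may need a looser match, for example when thresholds differ slightly between samples.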
GN | Feb 18, 2020
ESG investments: Filtering versus machine learning approaches
Carmine de Franco, Christophe Geissler, Vincent Margot et al.
We designed a machine learning algorithm that identifies patterns between ESG profiles and financial performance for companies in a large investment universe. The algorithm consists of regularly updated sets of rules that map regions of the high-dimensional space of ESG features to excess return predictions. The final aggregated predictions are transformed into scores, which allow us to design simple strategies that screen the investment universe for stocks with positive scores. By linking ESG features with financial performance in a non-linear way, the strategy based on our machine learning algorithm turns out to be an efficient stock-picking tool, which outperforms classic strategies that screen stocks according to their ESG ratings, such as the popular best-in-class approach. Our paper brings new ideas to the growing field of financial literature that investigates the links between ESG behavior and the economy. Indeed, we show that there is clearly some form of alpha in the ESG profile of a company, but that this alpha can be accessed only with powerful, non-linear techniques such as machine learning.
ST | Jul 4, 2019
Consistent Regression using Data-Dependent Coverings
Vincent Margot, Jean-Patrick Baudry, Frédéric Guilloux et al.
In this paper, we introduce a novel method to generate interpretable regression function estimators. The idea is based on so-called data-dependent coverings. The aim is to extract from the data a covering of the feature space instead of a partition. The estimator predicts the empirical conditional expectation over the cells of the partitions generated from the coverings. Thus, such an estimator has the same form as those produced by data-dependent partitioning algorithms. We give sufficient conditions to ensure consistency, avoiding the sufficient condition of shrinkage of the cells that appears in the earlier literature. In doing so, we reduce the number of covering elements. We show that such coverings are interpretable and that each element of the covering is tagged as significant or insignificant. The proof of consistency is based on a control of the error of the empirical estimation of conditional expectations, which is of independent interest.
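To make the step from a covering to a partition concrete, here is a minimal sketch under assumptions that go beyond the abstract: covering elements are taken to be axis-aligned boxes given as `(low, high)` pairs, two points fall in the same cell when they belong to exactly the same covering elements, and a cell with no training point falls back to a `default` value. The function names are hypothetical, and the selection of covering elements and their significant/insignificant tagging are not modeled.

```python
import numpy as np


def membership(X, boxes):
    """Boolean matrix whose entry (i, k) is True when point i lies in box k."""
    X = np.asarray(X, dtype=float)
    cols = [np.all((X >= np.asarray(lo)) & (X <= np.asarray(hi)), axis=1)
            for lo, hi in boxes]
    return np.column_stack(cols)


def fit_cell_means(X, y, boxes):
    """Empirical mean of y over each cell of the partition induced by the boxes."""
    cells = {}
    for pattern, target in zip(map(tuple, membership(X, boxes)), y):
        cells.setdefault(pattern, []).append(float(target))
    return {pattern: float(np.mean(vals)) for pattern, vals in cells.items()}


def predict(X, boxes, cell_means, default):
    """Predict the stored cell mean; cells unseen in training get `default`."""
    return np.array([cell_means.get(tuple(row), default)
                     for row in membership(X, boxes)])


# Toy one-dimensional data with two overlapping covering elements.
X_train = [[0.1], [0.4], [0.6], [0.9]]
y_train = [1.0, 2.0, 3.0, 4.0]
boxes = [([0.0], [0.5]), ([0.3], [1.0])]

cell_means = fit_cell_means(X_train, y_train, boxes)
preds = predict([[0.45], [0.8]], boxes, cell_means, default=float(np.mean(y_train)))
```

In this toy example the two overlapping boxes induce three non-empty cells: points only in the first box, points in both, and points only in the second. Keying each cell by its membership pattern avoids computing the geometric intersections explicitly.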
ML | Jul 12, 2018
Rule Induction Partitioning Estimator
Vincent Margot, Jean-Patrick Baudry, Frédéric Guilloux et al.
RIPE is a novel deterministic and easily understandable prediction algorithm developed for continuous and discrete ordered data. It infers a model from a sample in order to predict and explain a real variable $Y$ given an input variable $X \in \mathcal X$ (features). The algorithm extracts a sparse set of hyperrectangles $\mathbf r \subset \mathcal X$, which can be thought of as rules of the form If-Then. This set is then turned into a partition of the feature space $\mathcal X$, each cell of which is explained as the list of rules whose If conditions are satisfied. The process of RIPE is illustrated on simulated datasets, and its efficiency is compared with that of other commonly used algorithms.
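The idea of explaining a cell by the rules whose If conditions hold can be sketched as follows. This is only an illustration, not the RIPE algorithm: the `Rule` class, the feature names `x1` and `x2`, the closed-interval convention, and the `explain` helper are assumptions introduced here, and the way RIPE selects its hyperrectangles is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hyperrectangle given as (feature, low, high) closed intervals."""
    bounds: tuple

    def is_satisfied(self, x):
        # The If part holds when every constrained feature lies in its interval.
        return all(low <= x[name] <= high for name, low, high in self.bounds)

    def __str__(self):
        return " AND ".join(f"{name} in [{low}, {high}]"
                            for name, low, high in self.bounds)


def explain(x, rules):
    """List, in order, the rules whose If conditions are satisfied by x."""
    return [str(rule) for rule in rules if rule.is_satisfied(x)]


# Hypothetical rules over two features.
rules = [
    Rule((("x1", 0, 4),)),
    Rule((("x1", 3, 6), ("x2", 0, 5))),
    Rule((("x2", 6, 10),)),
]
explanation = explain({"x1": 3.5, "x2": 2}, rules)
```

Here the point satisfies the first two rules and not the third, so its cell is described by those two conditions; every other point that activates exactly the same rules shares that description.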