LG · Dec 4, 2013

Interpreting random forest classification models using a feature contribution method

Anna Palczewska, Jan Palczewski, Richard Marchese Robinson, Daniel Neagu

arXiv:1312.1121v1 · 143 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretability in machine learning, specifically for random forest models. Interpretability matters to users in fields that require transparent decision-making. The contribution is incremental, as it builds on existing interpretation techniques.

The authors tackled the problem of interpreting 'black box' random forest classification models by developing a feature contribution method that quantifies each variable's influence on individual predictions and identifies class-specific patterns. They demonstrated the method's potential on two UCI benchmark datasets and validated robustness through extensive analysis on many generated models.

Model interpretation is one of the key aspects of the model evaluation process. The explanation of the relationship between model variables and outputs is relatively easy for statistical models, such as linear regressions, thanks to the availability of model parameters and their statistical significance. For "black box" models, such as random forest, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows for the determination of the influence of each variable on the model prediction for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined, and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution "patterns", can be discovered. These patterns represent the standard behaviour of the model and allow for an additional assessment of the model's reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of the results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
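The idea of a per-instance feature contribution can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binary classification task in which every tree node stores the fraction of positive-class training instances that reach it, and it credits each split on the instance's path with the change in that fraction between parent and child. The `Node` class, the two hand-built toy trees, and the function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node for a binary classification tree."""
    p_pos: float                     # fraction of positive-class training instances at this node
    feature: Optional[int] = None    # index of the split feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None   # branch taken otherwise


def tree_contributions(root: Node, x: List[float], n_features: int) -> Tuple[float, List[float]]:
    """Return (root class fraction, per-feature contributions) for one tree."""
    contrib = [0.0] * n_features
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Local increment: change in the positive-class fraction caused by this split.
        contrib[node.feature] += child.p_pos - node.p_pos
        node = child
    return root.p_pos, contrib


def forest_contributions(trees: List[Node], x: List[float], n_features: int) -> Tuple[float, List[float]]:
    """Average the root fractions and the contributions over all trees."""
    bias_sum = 0.0
    totals = [0.0] * n_features
    for root in trees:
        b, c = tree_contributions(root, x, n_features)
        bias_sum += b
        totals = [t + ci for t, ci in zip(totals, c)]
    n = len(trees)
    return bias_sum / n, [t / n for t in totals]


# Two toy trees with made-up class fractions (illustrative values only).
tree_a = Node(0.5, feature=0, threshold=1.0,
              left=Node(0.2),
              right=Node(0.8, feature=1, threshold=3.0,
                         left=Node(0.6), right=Node(1.0)))
tree_b = Node(0.5, feature=1, threshold=2.0,
              left=Node(0.25), right=Node(0.75))

x = [2.0, 4.0]
bias, contrib = forest_contributions([tree_a, tree_b], x, n_features=2)
prediction = bias + sum(contrib)
print(bias, contrib, prediction)
```

For this toy instance the contributions come out at roughly 0.15 for feature 0 and 0.225 for feature 1. Added to the averaged root fraction of 0.5, they give 0.875, which equals the mean of the two leaf fractions reached (1.0 and 0.75). This additive decomposition is what makes the contributions readable as each variable's share of an individual prediction.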
