Ulf Geir Indahl

5.0LGSep 27, 2020Code

RENT -- Repeated Elastic Net Technique for Feature Selection

Anna Jenul, Stefan Schrunner, Kristian Hovde Liland et al.

Feature selection is an essential step in data science pipelines to reduce the complexity associated with large datasets. While much research on this topic focuses on optimizing predictive performance, few studies investigate stability in the context of the feature selection process. In this study, we present the Repeated Elastic Net Technique (RENT) for Feature Selection. RENT uses an ensemble of generalized linear models with elastic net regularization, each trained on distinct subsets of the training data. The feature selection is based on three criteria evaluating the weight distributions of features across all elementary models. This fact leads to the selection of features with high stability that improve the robustness of the final model. Furthermore, unlike established feature selectors, RENT provides valuable information for model interpretation concerning the identification of objects in the data that are difficult to predict during training. In our experiments, we benchmark RENT against six established feature selectors on eight multivariate datasets for binary classification and regression. In the experimental comparison, RENT shows a well-balanced trade-off between predictive performance and stability. Finally, we underline the additional interpretational value of RENT with an exploratory post-hoc analysis of a healthcare dataset.

5.4CVFeb 8, 2019

Addressing Overfitting on Pointcloud Classification using Atrous XCRF

Hasan Asyari Arief, Ulf Geir Indahl, Geir-Harald Strand et al.

Advances in techniques for automated classification of pointcloud data introduce great opportunities for many new and existing applications. However, with a limited number of labeled points, automated classification by a machine learning model is prone to overfitting and poor generalization. The present paper addresses this problem by inducing controlled noise (on a trained model) generated by invoking conditional random field similarity penalties using nearby features. The method is called Atrous XCRF and works by forcing a trained model to respect the similarity penalties provided by unlabeled data. In a benchmark study carried out using the ISPRS 3D labeling dataset, our technique achieves 84.97% in term of overall accuracy, and 71.05% in term of F1 score. The result is on par with the current best model for the benchmark dataset and has the highest value in term of F1 score.

Ulf Geir Indahl

2 Papers