1.1 LG Apr 26
Path-Based Gradient Boosting for Graph-Level Prediction
Claudio Meggio, Johan Pensar, Riccardo De Bin
We propose PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path-based features directly from the input graph structure. Building on previous work, which was tailored to a specific chemistry application, PathBoost introduces three key extensions: (i) adaptation to binary classification through gradient boosting with a logistic loss, (ii) incorporation of multiple node and edge attributes into the path feature space via a prefix-based decomposition, and (iii) automatic anchor node selection based on categorical attribute diversity, eliminating the need for the user to specify the starting point of the considered path features. We compared PathBoost to graph neural networks and graph kernel approaches on several benchmark datasets, obtaining better results on half of them and comparable results on the rest. PathBoost performs better on datasets whose graphs have larger average node counts. Overall, the results demonstrate that path-based boosting methods can be competitive with more complex black-box approaches.
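As a rough illustration of extension (i), the sketch below runs gradient boosting with a logistic loss over binary "path present / path absent" indicators, using two-leaf regression stumps as base learners. It is not the PathBoost algorithm: the feature matrix, the function names (`fit_stump`, `boost_logistic`, `predict_proba`) and the default hyperparameters are assumptions made for this example, and path enumeration, the prefix-based decomposition and anchor selection are all omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(column, residuals):
    """Fit a two-leaf regression stump on one binary feature.

    Returns (leaf0, leaf1, sse), or None if the feature is constant.
    """
    groups = {0: [], 1: []}
    for x, r in zip(column, residuals):
        groups[x].append(r)
    if not groups[0] or not groups[1]:
        return None  # a constant feature cannot split the data
    leaf = {v: sum(rs) / len(rs) for v, rs in groups.items()}
    sse = sum((r - leaf[x]) ** 2 for x, r in zip(column, residuals))
    return leaf[0], leaf[1], sse


def boost_logistic(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for 0/1 labels with the logistic loss.

    X is a list of rows of hypothetical binary path indicators.
    Returns a list of (feature_index, leaf0, leaf1) stumps on the log-odds scale.
    """
    n, d = len(X), len(X[0])
    scores = [0.0] * n  # current log-odds for each training graph
    model = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the score.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        best = None
        for j in range(d):
            fit = fit_stump([row[j] for row in X], residuals)
            if fit is not None and (best is None or fit[2] < best[3]):
                best = (j, fit[0], fit[1], fit[2])
        if best is None:
            break  # no feature can split the data
        j, leaf0, leaf1, _ = best
        model.append((j, learning_rate * leaf0, learning_rate * leaf1))
        for i in range(n):
            scores[i] += learning_rate * (leaf1 if X[i][j] else leaf0)
    return model


def predict_proba(model, row):
    """Predicted probability of class 1 for one row of path indicators."""
    score = sum(leaf1 if row[j] else leaf0 for j, leaf0, leaf1 in model)
    return sigmoid(score)
```

Because every stump splits on a single indicator, the fitted model can be read as a list of paths with additive log-odds contributions, which is the kind of transparency the abstract contrasts with black-box approaches.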
CY Jul 1, 2021
A Decision Support System for Safer Airplane Landings: Predicting Runway Conditions Using XGBoost and Explainable AI
Alise Danielle Midtfjord, Riccardo De Bin, Arne Bang Huseby
The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate safety procedures, pilots need accurate and timely information on the actual runway surface conditions. In this study, XGBoost is used to create a combined runway assessment system, which includes a classification model to identify slippery conditions and a regression model to predict the level of slipperiness. The models are trained on weather data and runway reports. The runway surface conditions are represented by the tire-pavement friction coefficient, which is estimated from flight sensor data from landing aircraft. The XGBoost models are combined with SHAP approximations to provide a reliable decision support system for airport operators, which can contribute to safer and more economical operation of airport runways. To evaluate the performance of the prediction models, they are compared to several state-of-the-art runway assessment methods. The XGBoost models identify slippery runway conditions with a ROC AUC of 0.95, predict the friction coefficient with an MAE of 0.0254, and outperform all previous methods. The results show the strong ability of machine learning methods to model complex physical phenomena with good accuracy. Published version: https://doi.org/10.1016/j.coldregions.2022.103556.
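For readers unfamiliar with the two metrics quoted above, the sketch below shows how ROC AUC (for the classifier) and MAE (for the regressor) are defined. The function names are assumptions for this example and no data from the study is used; ROC AUC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    # Each correctly ordered pair counts 1, each tie counts 0.5.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under these definitions, a ROC AUC of 0.95 means that a randomly chosen slippery case receives a higher score than a randomly chosen non-slippery one about 95% of the time, and the MAE is expressed in the same units as the friction coefficient.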
AP Feb 16, 2021
Multivariable Fractional Polynomials for lithium-ion batteries degradation models under dynamic conditions
Clara Bertinelli Salucci, Azzeddine Bakdi, Ingrid K. Glad et al.
Longevity and safety of lithium-ion batteries are facilitated by efficient monitoring and adjustment of the battery operating conditions. Hence, it is crucial to implement fast and accurate algorithms for State of Health (SoH) monitoring on the Battery Management System. The task is challenging due to the complexity and multitude of the factors contributing to battery degradation, especially because the different degradation processes occur at various timescales and their interactions play an important role. Data-driven methods bypass this issue by approximating the complex processes with statistical or machine learning models. This paper proposes a data-driven approach which is understudied in the context of battery degradation, despite its simplicity and ease of computation: the Multivariable Fractional Polynomial (MFP) regression. Models are trained on historical data from one exhausted cell and used to predict the SoH of other cells. The data are characterised by varying loads simulating dynamic operating conditions. Two hypothetical scenarios are considered: one assumes that a recent capacity measurement is known, the other is based only on the nominal capacity. It was shown that the degradation behaviour of the batteries under examination is influenced by their historical data, as supported by the low prediction errors achieved (root mean squared errors from 1.2% to 7.22% when considering data up to the battery End of Life). Moreover, we offer a multi-factor perspective in which the degree of impact of each factor is analysed. Finally, we compare with a Long Short-Term Memory Neural Network and other works from the literature on the same dataset. We conclude that the MFP regression is effective and competitive with contemporary works, and provides several additional advantages, e.g., in terms of interpretability, generalisability, and implementability.
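To give a sense of what a fractional polynomial is, the sketch below selects a first-degree fractional polynomial (FP1) for a single positive covariate: each power in the conventional set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} is tried (with power 0 read as the logarithm), and the one with the smallest residual sum of squares is kept. This is only a minimal single-covariate illustration under those assumptions; the full MFP procedure, with several covariates, second-degree terms and its variable selection steps, is not reproduced here, and the function names are invented for the example.

```python
import math

# Conventional fractional polynomial power set; 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Fractional polynomial transform of a positive value x."""
    return math.log(x) if power == 0 else x ** power


def fit_line(z, y):
    """Ordinary least squares for y = intercept + slope * z; returns (intercept, slope, rss)."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    slope = szy / szz
    intercept = y_mean - slope * z_mean
    rss = sum((yi - intercept - slope * zi) ** 2 for zi, yi in zip(z, y))
    return intercept, slope, rss


def best_fp1(x, y):
    """Return (power, intercept, slope, rss) of the best-fitting FP1 model.

    Assumes every x is strictly positive and x is not constant.
    """
    best = None
    for power in FP_POWERS:
        z = [fp_transform(xi, power) for xi in x]
        intercept, slope, rss = fit_line(z, y)
        if best is None or rss < best[3]:
            best = (power, intercept, slope, rss)
    return best
```

The fitted model stays a plain regression equation in a transformed covariate, which is one reason the abstract stresses interpretability and ease of implementation.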
ST Oct 30, 2013
A U-statistic estimator for the variance of resampling-based error estimators
Mathias Fuchs, Roman Hornung, Riccardo De Bin et al.
We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Thus, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is an unbiased estimator for this minimal variance if the total sample size is at least twice the learning set size plus two. In this case, we exhibit such an estimator, which is itself a U-statistic. It, too, enjoys various optimality properties and yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared. Our statements apply to any deterministic learning algorithm under weak non-degeneracy assumptions.
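The error rate estimator "involving all learning-testing splits" can be sketched as follows: for a fixed learning set size g, train on every subset of g observations, test on every held-out observation, and average the misclassification indicators. The code is a brute-force illustration for small samples only; the toy learner and the function names are assumptions made for this example, and the variance estimator discussed in the abstract is not implemented.

```python
from itertools import combinations


def nearest_mean_classifier(train):
    """Deterministic toy learner: assign a scalar x to the class with the nearer mean."""
    by_class = {0: [], 1: []}
    for x, y in train:
        by_class[y].append(x)
    # If only one class was observed, always predict that class.
    if not by_class[0]:
        return lambda x: 1
    if not by_class[1]:
        return lambda x: 0
    m0 = sum(by_class[0]) / len(by_class[0])
    m1 = sum(by_class[1]) / len(by_class[1])
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def complete_split_error(data, g, learner=nearest_mean_classifier):
    """Average test error over all learning sets of size g and all held-out points.

    data is a list of (x, y) pairs with y in {0, 1}; requires 1 <= g < len(data).
    """
    n = len(data)
    errors, count = 0, 0
    for learn_idx in combinations(range(n), g):
        held_in = set(learn_idx)
        predict = learner([data[i] for i in learn_idx])
        for j in range(n):
            if j in held_in:
                continue
            x, y = data[j]
            errors += int(predict(x) != y)
            count += 1
    return errors / count
```

The number of learning sets grows as the binomial coefficient of n over g, so in practice the average is usually approximated by sampling splits rather than enumerating them all.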