Vân Anh Huynh-Thu

LG · 3 papers · 9 citations

3 Papers

LG · Jul 5, 2023
Hybrid additive modeling with partial dependence for supervised regression and dynamical systems forecasting

Yann Claes, Vân Anh Huynh-Thu, Pierre Geurts

Learning processes by exploiting restricted domain knowledge is an important task across a plethora of scientific areas, with more and more hybrid training methods additively combining data-driven and model-based approaches. Although the obtained models are more accurate than purely data-driven models, the optimization process usually comes with sensitive regularization constraints. Furthermore, while such hybrid methods have been tested in various scientific applications, they have mostly been tested on dynamical systems, with only limited study of the influence of each model component on global performance and parameter identification. In this work, we introduce a new hybrid training approach based on partial dependence, which removes the need for intricate regularization. Moreover, we assess the performance of hybrid modeling against traditional machine learning methods on standard regression problems. We compare, on both synthetic and real regression problems, several approaches for training such hybrid models. We focus on hybrid methods that additively combine a parametric term with a machine learning term and investigate model-agnostic training procedures. Accordingly, experiments are carried out with different types of machine learning models, including tree-based models and artificial neural networks. We also extend our partial dependence optimization process to dynamical systems forecasting and compare it to existing schemes.
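The additive structure described above (a parametric term plus a machine learning term) can be illustrated with a small sketch. It fits the two terms by plain backfitting (alternating least squares), a generic baseline that is NOT the partial-dependence procedure proposed in the paper. The linear term `theta * x0`, the binned-mean regressor standing in for the machine learning model, all function names, and the synthetic data are hypothetical choices made for illustration only.

```python
import random


def fit_binned_mean(xs, targets, n_bins=20, lo=0.0, hi=1.0):
    """Hypothetical stand-in for the ML term: mean target in each equal-width bin of x."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, t in zip(xs, targets):
        b = min(max(int((x - lo) / (hi - lo) * n_bins), 0), n_bins - 1)
        sums[b] += t
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def predict(x):
        b = min(max(int((x - lo) / (hi - lo) * n_bins), 0), n_bins - 1)
        return means[b]

    return predict


def zero_term(_x):
    return 0.0


def backfit_hybrid(x0, x1, y, n_iter=20):
    """Generic backfitting (not the paper's method) of y ~ theta * x0 + g(x1)."""
    g, theta = zero_term, 0.0
    for _ in range(n_iter):
        # 1) Least-squares update of the parametric term on the residual left by g.
        #    No intercept here: the data-driven term g absorbs any constant offset.
        r = [yi - g(b) for yi, b in zip(y, x1)]
        theta = sum(a * ri for a, ri in zip(x0, r)) / sum(a * a for a in x0)
        # 2) Refit the data-driven term on the residual left by the parametric term.
        r = [yi - theta * a for yi, a in zip(y, x0)]
        g = fit_binned_mean(x1, r)
    return theta, g


# Synthetic, noise-free data: known linear effect of x0 plus an "unknown" effect of x1.
rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
x1 = [rng.uniform(0.0, 1.0) for _ in range(2000)]
y = [2.0 * a + 3.0 * b ** 2 for a, b in zip(x0, x1)]

theta, g = backfit_hybrid(x0, x1, y)
print(round(theta, 3))  # should land close to the true coefficient 2.0
```

Here `g` only receives `x1`, so the split between the two terms is well defined; if `g` also received `x0`, it could absorb part of the linear trend and plain alternation would no longer pin down `theta`.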

LG · Sep 7, 2021
Optimizing model-agnostic Random Subspace ensembles

Vân Anh Huynh-Thu, Pierre Geurts

This paper presents a model-agnostic ensemble approach for supervised learning. The proposed approach is based on a parametric version of Random Subspace, in which each base model is learned from a feature subset sampled according to a Bernoulli distribution. Parameter optimization is performed using gradient descent and is rendered tractable by using an importance sampling approach that circumvents frequent re-training of the base models after each gradient descent step. The degree of randomization in our parametric Random Subspace is thus automatically tuned through the optimization of the feature selection probabilities. This is an advantage over the standard Random Subspace approach, where the degree of randomization is controlled by a hyper-parameter. Furthermore, the optimized feature selection probabilities can be interpreted as feature importance scores. Our algorithm can also easily incorporate any differentiable regularization term to impose constraints on these importance scores.
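The following sketch illustrates the mechanism summarized above under strong simplifying assumptions: feature masks are drawn from independent Bernoulli(p_j) distributions, the expected loss is minimized by projected gradient descent using a score-function gradient, and importance weights P(m | p) / P(m | p_ref) let stored losses be reused between the occasional "re-training" rounds. The per-subset loss `toy_subset_loss` is a hypothetical stand-in for the loss of a base model trained on the selected features, and all names and constants are illustrative; this is not the authors' algorithm or code.

```python
import math
import random


def mask_log_prob(mask, p):
    """log P(mask | p) for independent Bernoulli(p_j) feature selections."""
    return sum(math.log(pj) if m else math.log(1.0 - pj) for m, pj in zip(mask, p))


def toy_subset_loss(mask):
    # HYPOTHETICAL stand-in for the loss of a base model trained on the features
    # where mask[j] == 1: features 0 and 1 help, features 2 and 3 hurt.
    return 1.0 - 0.4 * mask[0] - 0.4 * mask[1] + 0.3 * mask[2] + 0.3 * mask[3]


def optimize_selection_probs(n_features=4, n_steps=200, n_masks=500,
                             refresh_every=10, lr=0.05, eps=0.05, seed=0):
    rng = random.Random(seed)
    p = [0.5] * n_features
    for step in range(n_steps):
        if step % refresh_every == 0:
            # Only here are new masks drawn and "base models" (re)scored.
            p_ref = list(p)
            masks = [[int(rng.random() < pj) for pj in p_ref] for _ in range(n_masks)]
            losses = [toy_subset_loss(m) for m in masks]
            ref_logp = [mask_log_prob(m, p_ref) for m in masks]
            baseline = sum(losses) / n_masks  # variance-reducing constant
        grad = [0.0] * n_features
        for m, loss, lq in zip(masks, losses, ref_logp):
            # Importance weight: reuse losses sampled under p_ref to estimate under p.
            w = math.exp(mask_log_prob(m, p) - lq)
            for j in range(n_features):
                # Derivative of log P(m | p) with respect to p_j.
                score = 1.0 / p[j] if m[j] else -1.0 / (1.0 - p[j])
                grad[j] += w * (loss - baseline) * score / n_masks
        # Projected gradient step keeps each probability inside [eps, 1 - eps].
        p = [min(max(pj - lr * gj, eps), 1.0 - eps) for pj, gj in zip(p, grad)]
    return p


p = optimize_selection_probs()
print([round(pj, 2) for pj in p])
```

With this toy loss, the probabilities of the two helpful features should be driven toward `1 - eps` and those of the two harmful features toward `eps`, which is the sense in which optimized selection probabilities can be read as importance scores.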

ML · May 12, 2016
Context-dependent feature analysis with random forests

Antonio Sutera, Gilles Louppe, Vân Anh Huynh-Thu et al.

In many cases, feature selection is more complicated than identifying a single subset of input variables that together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in some specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and relevance of our framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.
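The notion of a variable that is relevant only in some contexts can be made concrete with a toy example. The sketch below swaps in a plain information-theoretic measure, the mutual information between a feature and the output computed separately within each value of a context variable, rather than the random forest importance framework of the paper. The data-generating rule (the output copies X1 when the context C is 1 and X2 otherwise) and all function names are hypothetical.

```python
import itertools
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(A; B) in bits from a list of (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2(c * n / (pa[a] * pb[b]))
               for (a, b), c in joint.items())


# Toy data, one row per combination of binary inputs: (x1, x2, c, y),
# where y copies x1 when the context c is 1 and copies x2 otherwise.
rows = [(x1, x2, c, x1 if c == 1 else x2)
        for x1, x2, c in itertools.product([0, 1], repeat=3)]


def relevance_in_context(feature_index, context_value):
    """I(Y; X_feature | C = context_value), computed on the rows where C matches."""
    subset = [(r[feature_index], r[3]) for r in rows if r[2] == context_value]
    return mutual_information(subset)


for c in (0, 1):
    print(c, relevance_in_context(0, c), relevance_in_context(1, c))
# prints:
# 0 0.0 1.0
# 1 1.0 0.0
```

Each feature carries one bit about the output in one context and none in the other, so neither feature looks uniformly relevant or irrelevant; separating these per-context effects is the kind of question the framework described above addresses with random forest importances.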