LGApr 10, 2022
From graphs to DAGs: a low-complexity model and a scalable algorithmShuyu Dong, Michèle Sebag
Learning directed acyclic graphs (DAGs) is long known a critical challenge at the core of probabilistic and causal modeling. The NoTears approach of (Zheng et al., 2018), through a differentiable function involving the matrix exponential trace $\mathrm{tr}(\exp(\cdot))$, opens up a way to learning DAGs via continuous optimization, though with a $O(d^3)$ complexity in the number $d$ of nodes. This paper presents a low-complexity model, called LoRAM for Low-Rank Additive Model, which combines low-rank matrix factorization with a sparsification mechanism for the continuous optimization of DAGs. The main contribution of the approach lies in an efficient gradient approximation method leveraging the low-rank property of the model, and its straightforward application to the computation of projections from graph matrices onto the DAG matrix space. The proposed method achieves a reduction from a cubic complexity to quadratic complexity while handling the same DAG characteristic function as NoTears, and scales easily up to thousands of nodes for the projection problem. The experiments show that the LoRAM achieves efficiency gains of orders of magnitude compared to the state-of-the-art at the expense of a very moderate accuracy loss in the considered range of sparse matrices, and with a low sensitivity to the rank choice of the model's low-rank component.
LGNov 25, 2022
Learning Large Causal Structures from Inverse Covariance Matrix via Sparse Matrix DecompositionShuyu Dong, Kento Uemura, Akito Fujii et al.
Learning causal structures from observational data is a fundamental problem facing important computational challenges when the number of variables is large. In the context of linear structural equation models (SEMs), this paper focuses on learning causal structures from the inverse covariance matrix. The proposed method, called ICID for Independence-preserving Decomposition from Inverse Covariance matrix, is based on continuous optimization of a matrix decomposition model that preserves the nonzero patterns of the inverse covariance matrix. Through theoretical and empirical evidences, we show that ICID efficiently identifies the sought directed acyclic graph (DAG) assuming the knowledge of noise variances. Moreover, ICID is shown empirically to be robust under bounded misspecification of noise variances in the case where the noise variances are non-equal. The proposed method enjoys a low complexity, as reflected by its time efficiency in the experiments, and also enables a novel regularization scheme that yields highly accurate solutions on the Simulated fMRI data (Smith et al., 2011) in comparison with state-of-the-art algorithms.
LGJan 23, 2025
Asymmetrical Latent Representation for Individual Treatment Effect ModelingArmand Lacombe, Michèle Sebag
Conditional Average Treatment Effect (CATE) estimation, at the heart of counterfactual reasoning, is a crucial challenge for causal modeling both theoretically and applicatively, in domains such as healthcare, sociology, or advertising. Borrowing domain adaptation principles, a popular design maps the sample representation to a latent space that balances control and treated populations while enabling the prediction of the potential outcomes. This paper presents a new CATE estimation approach based on the asymmetrical search for two latent spaces called Asymmetrical Latent Representation for Individual Treatment Effect (ALRITE), where the two latent spaces are respectively intended to optimize the counterfactual prediction accuracy on the control and the treated samples. Under moderate assumptions, ALRITE admits an upper bound on the precision of the estimation of heterogeneous effects (PEHE), and the approach is empirically successfully validated compared to the state-of-the-art
LGJun 15, 2024
DCILP: A Distributed Approach for Large-Scale Causal Structure LearningShuyu Dong, Michèle Sebag, Kento Uemura et al.
Causal learning tackles the computationally demanding task of estimating causal graphs. This paper introduces a new divide-and-conquer approach for causal graph learning, called DCILP. In the divide phase, the Markov blanket MB($X_i$) of each variable $X_i$ is identified, and causal learning subproblems associated with each MB($X_i$) are independently addressed in parallel. This approach benefits from a more favorable ratio between the number of data samples and the number of variables considered. In counterpart, it can be adversely affected by the presence of hidden confounders, as variables external to MB($X_i$) might influence those within it. The reconciliation of the local causal graphs generated during the divide phase is a challenging combinatorial optimization problem, especially in large-scale applications. The main novelty of DCILP is an original formulation of this reconciliation as an integer linear programming (ILP) problem, which can be delegated and efficiently handled by an ILP solver. Through experiments on medium to large scale graphs, and comparisons with state-of-the-art methods, DCILP demonstrates significant improvements in terms of computational complexity, while preserving the learning accuracy on real-world problem and suffering at most a slight loss of accuracy on synthetic problems.
LGNov 5, 2021
Frugal Machine LearningMikhail Evchenko, Joaquin Vanschoren, Holger H. Hoos et al.
Machine learning, already at the core of increasingly many systems and applications, is set to become even more ubiquitous with the rapid rise of wearable devices and the Internet of Things. In most machine learning applications, the main focus is on the quality of the results achieved (e.g., prediction accuracy), and hence vast amounts of data are being collected, requiring significant computational resources to build models. In many scenarios, however, it is infeasible or impractical to set up large centralized data repositories. In personal health, for instance, privacy issues may inhibit the sharing of detailed personal data. In such cases, machine learning should ideally be performed on wearable devices themselves, which raises major computational limitations such as the battery capacity of smartwatches. This paper thus investigates frugal learning, aimed to build the most accurate possible models using the least amount of resources. A wide range of learning algorithms is examined through a frugal lens, analyzing their accuracy/runtime performance on a wide range of data sets. The most promising algorithms are thereafter assessed in a real-world scenario by implementing them in a smartwatch and letting them learn activity recognition models on the watch itself.
MLJun 24, 2020
Distribution-Based Invariant Deep Networks for Learning Meta-FeaturesGwendoline De Bie, Herilalaina Rakotoarison, Gabriel Peyré et al.
Recent advances in deep learning from probability distributions successfully achieve classification or regression from distribution samples, thus invariant under permutation of the samples. The first contribution of the paper is to extend these neural architectures to achieve invariance under permutation of the features, too. The proposed architecture, called Dida, inherits the NN properties of universal approximation, and its robustness w.r.t. Lipschitz-bounded transformations of the input distribution is established. The second contribution is to empirically and comparatively demonstrate the merits of the approach on two tasks defined at the dataset level. On both tasks, Dida learns meta-features supporting the characterization of a (labelled) dataset. The first task consists of predicting whether two dataset patches are extracted from the same initial dataset. The second task consists of predicting whether the learning performance achieved by a hyper-parameter configuration under a fixed algorithm (ranging in k-NN, SVM, logistic regression and linear classifier with SGD) dominates that of another configuration, for a dataset extracted from the OpenML benchmarking suite. On both tasks, Dida outperforms the state of the art: DSS (Maron et al., 2020) and Dataset2Vec (Jomaa et al., 2019) architectures, as well as the models based on the hand-crafted meta-features of the literature.
LGMar 4, 2020
Variational Auto-Encoder: not all failures are equalMichele Sebag, Victor Berger, Michèle Sebag
We claim that a source of severe failures for Variational Auto-Encoders is the choice of the distribution class used for the observation model.A first theoretical and experimental contribution of the paper is to establish that even in the large sample limit with arbitrarily powerful neural architectures and latent space, the VAE failsif the sharpness of the distribution class does not match the scale of the data.Our second claim is that the distribution sharpness must preferably be learned by the VAE (as opposed to, fixed and optimized offline): Autonomously adjusting this sharpness allows the VAE to dynamically control the trade-off between the optimization of the reconstruction loss and the latent compression. A second empirical contribution is to show how the control of this trade-off is instrumental in escaping poor local optima, akin a simulated annealing schedule.Both claims are backed upon experiments on artificial data, MNIST and CelebA, showing how sharpness learning addresses the notorious VAE blurriness issue.
LGJan 22, 2020
From abstract items to latent spaces to observed data and back: Compositional Variational Auto-EncoderVictor Berger, Michèle Sebag
Conditional Generative Models are now acknowledged an essential tool in Machine Learning. This paper focuses on their control. While many approaches aim at disentangling the data through the coordinate-wise control of their latent representations, another direction is explored in this paper. The proposed CompVAE handles data with a natural multi-ensemblist structure (i.e. that can naturally be decomposed into elements). Derived from Bayesian variational principles, CompVAE learns a latent representation leveraging both observational and symbolic information. A first contribution of the approach is that this latent representation supports a compositional generative model, amenable to multi-ensemblist operations (addition or subtraction of elements in the composition). This compositional ability is enabled by the invariance and generality of the whole framework w.r.t. respectively, the order and number of the elements. The second contribution of the paper is a proof of concept on synthetic 1D and 2D problems, demonstrating the efficiency of the proposed approach.
LGJun 1, 2019
Automated Machine Learning with Monte-Carlo Tree SearchHerilalaina Rakotoarison, Marc Schoenauer, Michèle Sebag
The AutoML task consists of selecting the proper algorithm in a machine learning portfolio, and its hyperparameter values, in order to deliver the best performance on the dataset at hand. Mosaic, a Monte-Carlo tree search (MCTS) based approach, is presented to handle the AutoML hybrid structural and parametric expensive black-box optimization problem. Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) its warm-start initialization; iii) the ensembling of the solutions gathered along the search. Mosaic is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over Auto-Sklearn, winner of former international AutoML challenges.
MLJul 3, 2018
New Losses for Generative Adversarial LearningVictor Berger, Michèle Sebag
Generative Adversarial Networks (Goodfellow et al., 2014), a major breakthrough in the field of generative modeling, learn a discriminator to estimate some distance between the target and the candidate distributions. This paper examines mathematical issues regarding the way the gradients for the generative model are computed in this context, and notably how to take into account how the discriminator itself depends on the generator parameters. A unifying methodology is presented to define mathematically sound training objectives for generative models taking this dependency into account in a robust way, covering both GAN, VAE and some GAN variants as particular cases.
MLMar 13, 2018
Structural Agnostic Modeling: Adversarial Learning of Causal GraphsDiviyan Kalainathan, Olivier Goudet, Isabelle Guyon et al.
A new causal discovery method, Structural Agnostic Modeling (SAM), is presented in this paper. Leveraging both conditional independencies and distributional asymmetries, SAM aims to find the underlying causal structure from observational data. The approach is based on a game between different players estimating each variable distribution conditionally to the others as a neural net, and an adversary aimed at discriminating the generated data against the original data. A learning criterion combining distribution estimation, sparsity and acyclicity constraints is used to enforce the optimization of the graph structure and parameters through stochastic gradient descent. SAM is extensively experimentally validated on synthetic and real data.
MLNov 24, 2017
Causal Generative Neural NetworksOlivier Goudet, Diviyan Kalainathan, Philippe Caillou et al.
We present Causal Generative Neural Networks (CGNNs) to learn functional causal models from observational data. CGNNs leverage conditional independencies and distributional asymmetries to discover bivariate and multivariate causal structures. CGNNs make no assumption regarding the lack of confounders, and learn a differentiable generative model of the data by using backpropagation. Extensive experiments show their good performances comparatively to the state of the art in observational causal discovery on both simulated and real data, with respect to cause-effect inference, v-structure identification, and multivariate causal discovery.
MLSep 15, 2017
Learning Functional Causal Models with Generative Neural NetworksOlivier Goudet, Diviyan Kalainathan, Philippe Caillou et al.
We introduce a new approach to functional causal modeling from observational data, called Causal Generative Neural Networks (CGNN). CGNN leverages the power of neural networks to learn a generative model of the joint distribution of the observed variables, by minimizing the Maximum Mean Discrepancy between generated and observed data. An approximate learning criterion is proposed to scale the computational cost of the approach to linear complexity in the number of observations. The performance of CGNN is studied throughout three experiments. Firstly, CGNN is applied to cause-effect inference, where the task is to identify the best causal hypothesis out of $X\rightarrow Y$ and $Y\rightarrow X$. Secondly, CGNN is applied to the problem of identifying v-structures and conditional independences. Thirdly, CGNN is applied to multivariate functional causal modeling: given a skeleton describing the direct dependences in a set of random variables $\textbf{X} = [X_1, \ldots, X_d]$, CGNN orients the edges in the skeleton to uncover the directed acyclic causal graph describing the causal structure of the random variables. On all three tasks, CGNN is extensively assessed on both artificial and real-world data, comparing favorably to the state-of-the-art. Finally, CGNN is extended to handle the case of confounders, where latent variables are involved in the overall causal model.
MLSep 5, 2017
Stochastic Gradient Descent: Going As Fast As Possible But Not FasterAlice Schoenauer-Sebag, Marc Schoenauer, Michèle Sebag
When applied to training deep neural networks, stochastic gradient descent (SGD) often incurs steady progression phases, interrupted by catastrophic episodes in which loss and gradient norm explode. A possible mitigation of such events is to slow down the learning process. This paper presents a novel approach to control the SGD learning rate, that uses two statistical tests. The first one, aimed at fast learning, compares the momentum of the normalized gradient vectors to that of random unit vectors and accordingly gracefully increases or decreases the learning rate. The second one is a change point detection test, aimed at the detection of catastrophic learning episodes; upon its triggering the learning rate is instantly halved. Both abilities of speeding up and slowing down the learning rate allows the proposed approach, called SALeRA, to learn as fast as possible but not faster. Experiments on standard benchmarks show that SALeRA performs well in practice, and compares favorably to the state of the art.
DSSep 29, 2016
Multi-dimensional signal approximation with sparse structured priors using split Bregman iterationsYoann Isaac, Quentin Barthélemy, Cédric Gouy-Pailler et al.
This paper addresses the structurally-constrained sparse decomposition of multi-dimensional signals onto overcomplete families of vectors, called dictionaries. The contribution of the paper is threefold. Firstly, a generic spatio-temporal regularization term is designed and used together with the standard $\ell_1$ regularization term to enforce a sparse decomposition preserving the spatio-temporal structure of the signal. Secondly, an optimization algorithm based on the split Bregman approach is proposed to handle the associated optimization problem, and its convergence is analyzed. Our well-founded approach yields same accuracy as the other algorithms at the state-of-the-art, with significant gains in terms of convergence speed. Thirdly, the empirical validation of the approach on artificial and real-world problems demonstrates the generality and effectiveness of the method. On artificial problems, the proposed regularization subsumes the Total Variation minimization and recovers the expected decomposition. On the real-world problem of electro-encephalography brainwave decomposition, the approach outperforms similar approaches in terms of P300 evoked potentials detection, using structured spatial priors to guide the decomposition.
NEJun 10, 2014
Maximum Likelihood-based Online Adaptation of Hyper-parameters in CMA-ESIlya Loshchilov, Marc Schoenauer, Michèle Sebag et al.
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is widely accepted as a robust derivative-free continuous optimization algorithm for non-linear and non-convex optimization problems. CMA-ES is well known to be almost parameterless, meaning that only one hyper-parameter, the population size, is proposed to be tuned by the user. In this paper, we propose a principled approach called self-CMA-ES to achieve the online adaptation of CMA-ES hyper-parameters in order to improve its overall performance. Experimental results show that for larger-than-default population size, the default settings of hyper-parameters of CMA-ES are far from being optimal, and that self-CMA-ES allows for dynamically approaching optimal settings.
LGJan 6, 2014
Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed BanditsNicolas Galichet, Michèle Sebag, Olivier Teytaud
Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness comparatively to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-art risk-aware MAB algorithms on artificial and real-world problems.
LGAug 12, 2013
KL-based Control of the Learning Schedule for Surrogate Black-Box OptimizationIlya Loshchilov, Marc Schoenauer, Michèle Sebag
This paper investigates the control of an ML component within the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) devoted to black-box optimization. The known CMA-ES weakness is its sample complexity, the number of evaluations of the objective function needed to approximate the global optimum. This weakness is commonly addressed through surrogate optimization, learning an estimate of the objective function a.k.a. surrogate model, and replacing most evaluations of the true objective function with the (inexpensive) evaluation of the surrogate model. This paper presents a principled control of the learning schedule (when to relearn the surrogate model), based on the Kullback-Leibler divergence of the current search distribution and the training distribution of the former surrogate model. The experimental validation of the proposed approach shows significant performance gains on a comprehensive set of ill-conditioned benchmark problems, compared to the best state of the art including the quasi-Newton high-precision BFGS method.
NEApr 10, 2013
Sustainable Cooperative Coevolution with a Multi-Armed BanditFrançois-Michel De Rainville, Michèle Sebag, Christian Gagné et al.
This paper proposes a self-adaptation mechanism to manage the resources allocated to the different species comprising a cooperative coevolutionary algorithm. The proposed approach relies on a dynamic extension to the well-known multi-armed bandit framework. At each iteration, the dynamic multi-armed bandit makes a decision on which species to evolve for a generation, using the history of progress made by the different species to guide the decisions. We show experimentally, on a benchmark and a real-world problem, that evolving the different populations at different paces allows not only to identify solutions more rapidly, but also improves the capacity of cooperative coevolution to solve more complex problems.
DSMar 21, 2013
Multi-dimensional sparse structured signal approximation using split Bregman iterationsYoann Isaac, Quentin Barthélemy, Jamal Atif et al.
The paper focuses on the sparse approximation of signals using overcomplete representations, such that it preserves the (prior) structure of multi-dimensional signals. The underlying optimization problem is tackled using a multi-dimensional split Bregman optimization approach. An extensive empirical evaluation shows how the proposed approach compares to the state of the art depending on the signal features.
LGAug 5, 2012
APRIL: Active Preference-learning based Reinforcement LearningRiad Akrour, Marc Schoenauer, Michèle Sebag
This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Although with a limited expertise, the human expert is still often able to emit preferences and rank the agent demonstrations. Earlier work has presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, thus enabling the agent to achieve direct policy search. Iteratively, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration comparatively to the previous best one; the expert's ranking feedback enables the agent to refine the approximate policy return, and the process is iterated. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and the cancer treatment testbeds witness that a couple of dozen rankings enable to learn a competent policy.
AIJul 1, 2012
Alternative Restart Strategies for CMA-ESIlya Loshchilov, Marc Schoenauer, Michèle Sebag
This paper focuses on the restart strategy of CMA-ES on multi-modal functions. A first alternative strategy proceeds by decreasing the initial step-size of the mutation while doubling the population size at each restart. A second strategy adaptively allocates the computational budget among the restart settings in the BIPOP scheme. Both restart strategies are validated on the BBOB benchmark; their generality is also demonstrated on an independent real-world problem suite related to spacecraft trajectory optimization.
NEApr 24, 2012
Black-box optimization benchmarking of IPOP-saACM-ES and BIPOP-saACM-ES on the BBOB-2012 noiseless testbedIlya Loshchilov, Marc Schoenauer, Michèle Sebag
In this paper, we study the performance of IPOP-saACM-ES and BIPOP-saACM-ES, recently proposed self-adaptive surrogate-assisted Covariance Matrix Adaptation Evolution Strategies. Both algorithms were tested using restarts till a total number of function evaluations of $10^6D$ was reached, where $D$ is the dimension of the function search space. We compared surrogate-assisted algorithms with their surrogate-less versions IPOP-saACM-ES and BIPOP-saACM-ES, two algorithms with one of the best overall performance observed during the BBOB-2009 and BBOB-2010. The comparison shows that the surrogate-assisted versions outperform the original CMA-ES algorithms by a factor from 2 to 4 on 8 out of 24 noiseless benchmark problems, showing the best results among all algorithms of the BBOB-2009 and BBOB-2010 on Ellipsoid, Discus, Bent Cigar, Sharp Ridge and Sum of different powers functions.
NEApr 24, 2012
Black-box optimization benchmarking of IPOP-saACM-ES on the BBOB-2012 noisy testbedIlya Loshchilov, Marc Schoenauer, Michèle Sebag
In this paper, we study the performance of IPOP-saACM-ES, recently proposed self-adaptive surrogate-assisted Covariance Matrix Adaptation Evolution Strategy. The algorithm was tested using restarts till a total number of function evaluations of $10^6D$ was reached, where $D$ is the dimension of the function search space. The experiments show that the surrogate model control allows IPOP-saACM-ES to be as robust as the original IPOP-aCMA-ES and outperforms the latter by a factor from 2 to 3 on 6 benchmark problems with moderate noise. On 15 out of 30 benchmark problems in dimension 20, IPOP-saACM-ES exceeds the records observed during BBOB-2009 and BBOB-2010.
NEApr 11, 2012
Self-Adaptive Surrogate-Assisted Covariance Matrix Adaptation Evolution StrategyIlya Loshchilov, Marc Schoenauer, Michèle Sebag
This paper presents a novel mechanism to adapt surrogate-assisted population-based algorithms. This mechanism is applied to ACM-ES, a recently proposed surrogate-assisted variant of CMA-ES. The resulting algorithm, saACM-ES, adjusts online the lifelength of the current surrogate model (the number of CMA-ES generations before learning a new surrogate) and the surrogate hyper-parameters. Both heuristics significantly improve the quality of the surrogate model, yielding a significant speed-up of saACM-ES compared to the ACM-ES and CMA-ES baselines. The empirical validation of saACM-ES on the BBOB-2012 noiseless testbed demonstrates the efficiency and the scalability w.r.t the problem dimension and the population size of the proposed approach, that reaches new best results on some of the benchmark problems.