Johan Larsson

ML
h-index2
6papers
81citations
Novelty51%
AI Score31

6 Papers

LGJun 27, 2022
Benchopt: Reproducible, efficient and collaborative optimization benchmarks

Thomas Moreau, Mathurin Massias, Alexandre Gramfort et al. · berkeley

Numerical validation is at the core of machine learning research as it allows to assess the actual impact of new methods, and to confirm the agreement between theory and practice. Yet, the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: $\ell_2$-regularized logistic regression, Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state-of-the-art for these problems, showing that for practical evaluation, the devil is in the details. We hope that Benchopt will foster collaborative work in the community hence improving the reproducibility of research findings.

MLJan 7, 2025
The Choice of Normalization Influences Shrinkage in Regularized Regression

Johan Larsson, Jonas Wallin

Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice may have dramatic effects on the resulting model. In spite of this, there has so far been no research on this topic. In this paper, we begin to bridge this knowledge gap by studying normalization in the context of lasso, ridge, and elastic net regression. We focus on binary features and show that their class balances (proportions of ones) directly influences the regression coefficients and that this effect depends on the combination of normalization and regularization methods used. We demonstrate that this effect can be mitigated by scaling binary features with their variance in the case of the lasso and standard deviation in the case of ridge regression, but that this comes at the cost of increased variance of the coefficient estimates. For the elastic net, we show that scaling the penalty weights, rather than the features, can achieve the same effect. Finally, we also tackle mixes of binary and normal features as well as interactions and provide some initial results on how to normalize features in these cases.

MLMay 12, 2021
Look-Ahead Screening Rules for the Lasso

Johan Larsson

The lasso is a popular method to induce shrinkage and sparsity in the solution vector (coefficients) of regression problems, particularly when there are many predictors relative to the number of observations. Solving the lasso in this high-dimensional setting can, however, be computationally demanding. Fortunately, this demand can be alleviated via the use of screening rules that discard predictors prior to fitting the model, leading to a reduced problem to be solved. In this paper, we present a new screening strategy: look-ahead screening. Our method uses safe screening rules to find a range of penalty values for which a given predictor cannot enter the model, thereby screening predictors along the remainder of the path. In experiments we show that these look-ahead screening rules outperform the active warm-start version of the Gap Safe rules.

MLApr 27, 2021
The Hessian Screening Rule

Johan Larsson, Jonas Wallin

Predictor screening rules, which discard predictors before fitting a model, have had considerable impact on the speed with which sparse regression problems, such as the lasso, can be solved. In this paper we present a new screening rule for solving the lasso path: the Hessian Screening Rule. The rule uses second-order information from the model to provide both effective screening, particularly in the case of high correlation, as well as accurate warm starts. The proposed rule outperforms all alternatives we study on simulated data sets with both low and high correlation for $\ell_1$-regularized least-squares (the lasso) and logistic regression. It also performs best in general on the real data sets that we examine.

MLMay 7, 2020
The Strong Screening Rule for SLOPE

Johan Larsson, Małgorzata Bogdan, Jonas Wallin

Extracting relevant features from data sets where the number of observations ($n$) is much smaller then the number of predictors ($p$) is a major challenge in modern statistics. Sorted L-One Penalized Estimation (SLOPE), a generalization of the lasso, is a promising method within this setting. Current numerical procedures for SLOPE, however, lack the efficiency that respective tools for the lasso enjoy, particularly in the context of estimating a complete regularization path. A key component in the efficiency of the lasso is predictor screening rules: rules that allow predictors to be discarded before estimating the model. This is the first paper to establish such a rule for SLOPE. We develop a screening rule for SLOPE by examining its subdifferential and show that this rule is a generalization of the strong rule for the lasso. Our rule is heuristic, which means that it may discard predictors erroneously. We present conditions under which this may happen and show that such situations are rare and easily safeguarded against by a simple check of the optimality conditions. Our numerical experiments show that the rule performs well in practice, leading to improvements by orders of magnitude for data in the $p \gg n$ domain, as well as incurring no additional computational overhead when $n \gg p$. We also examine the effect of correlation structures in the design matrix on the rule and discuss algorithmic strategies for employing the rule. Finally, we provide an efficient implementation of the rule in our R package SLOPE.

NAJul 16, 2015
Exploiting Active Subspaces to Quantify Uncertainty in the Numerical Simulation of the HyShot II Scramjet

Paul Constantine, Michael Emory, Johan Larsson et al.

We present a computational analysis of the reactive flow in a hypersonic scramjet engine with focus on effects of uncertainties in the operating conditions. We employ a novel methodology based on active subspaces to characterize the effects of the input uncertainty on the scramjet performance. The active subspace identifies one-dimensional structure in the map from simulation inputs to quantity of interest that allows us to reparameterize the operating conditions; instead of seven physical parameters, we can use a single derived active variable. This dimension reduction enables otherwise infeasible uncertainty quantification, considering the simulation cost of roughly 9500 CPU-hours per run. For two values of the fuel injection rate, we use a total of 68 simulations to (i) identify the parameters that contribute the most to the variation in the output quantity of interest, (ii) estimate upper and lower bounds on the quantity of interest, (iii) classify sets of operating conditions as safe or unsafe corresponding to a threshold on the output quantity of interest, and (iv) estimate a cumulative distribution function for the quantity of interest.