ME ST ML · May 22, 2012

A lasso for hierarchical interactions

arXiv:1205.5050v3 · 514 citations
Originality: Incremental advance
AI Analysis

The paper addresses data collection concerns such as cost, time and effort by focusing on practical sparsity, the number of raw variables that must be measured to make a new prediction. It is aimed at statisticians and data scientists and is rated incremental because it builds on existing lasso methods.

The paper tackles the problem of including interactions in sparse linear models while enforcing hierarchy: an interaction is included only if one or both of its variables are marginally important. It develops an algorithm, implemented in the R package hierNet.

We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity - the number of nonzero coefficients - and practical sparsity - the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
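
To make the distinction between parameter sparsity and practical sparsity concrete, here is a minimal Python sketch. It is not the hierNet API or the paper's algorithm. The function names, the representation of main effects as a list `beta`, and the representation of interactions as a dict `theta` keyed by index pairs are all assumptions made for illustration. The sketch counts nonzero coefficients, counts the raw variables a fitted model needs, and checks whether the hierarchy restriction holds in its "one" (weak) or "both" (strong) form.

```python
def active_pairs(theta, tol=1e-12):
    """Unordered pairs (j, k), j < k, whose interaction coefficient is nonzero."""
    return sorted({(min(j, k), max(j, k))
                   for (j, k), v in theta.items()
                   if j != k and abs(v) > tol})


def parameter_sparsity(beta, theta, tol=1e-12):
    """Number of nonzero coefficients: main effects plus interactions."""
    n_main = sum(1 for b in beta if abs(b) > tol)
    return n_main + len(active_pairs(theta, tol))


def practical_sparsity(beta, theta, tol=1e-12):
    """Number of raw variables that must be measured to make a prediction."""
    used = {j for j, b in enumerate(beta) if abs(b) > tol}
    for j, k in active_pairs(theta, tol):
        used.update((j, k))
    return len(used)


def obeys_hierarchy(beta, theta, strong=False, tol=1e-12):
    """Check that every nonzero interaction has a nonzero main effect.

    strong=False: at least one of the two variables has a nonzero main effect.
    strong=True:  both variables have nonzero main effects.
    """
    active = [abs(b) > tol for b in beta]
    combine = all if strong else any
    return all(combine((active[j], active[k]))
               for j, k in active_pairs(theta, tol))


# Two hypothetical fitted models on four variables, each with four nonzero
# coefficients (two main effects and two interactions).
beta = [1.0, 0.5, 0.0, 0.0]
theta_hier = {(0, 1): 0.3, (0, 2): -0.2}   # each interaction touches variable 0
theta_free = {(0, 1): 0.3, (2, 3): -0.2}   # (2, 3) has no active main effect

for name, theta in [("hierarchical", theta_hier), ("unconstrained", theta_free)]:
    print(name,
          parameter_sparsity(beta, theta),
          practical_sparsity(beta, theta),
          obeys_hierarchy(beta, theta),
          obeys_hierarchy(beta, theta, strong=True))
```

Both toy models have the same parameter sparsity, but the one that respects hierarchy needs three raw variables while the other needs all four. This is the sense in which hierarchy targets practical rather than parameter sparsity.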

