LG · May 23, 2024

Linear Mode Connectivity in Differentiable Tree Ensembles

arXiv:2405.14596v2 · 1 citation · h-index: 11 · ICLR
AI Analysis

This work helps machine learning researchers understand why optimization of non-convex models succeeds so reliably. The contribution is incremental, however: it extends linear mode connectivity (LMC) from neural networks to tree ensembles.

The paper tackles the problem of achieving Linear Mode Connectivity (LMC) for soft tree ensembles, a class of differentiable tree-based models. It does so by accounting for architecture-specific invariances, namely subtree flip invariance and splitting order invariance, and further demonstrates that LMC can be maintained without these invariances by using decision list-based tree architectures.
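To make subtree flip invariance concrete, here is a minimal sketch on a depth-1 soft tree. It assumes a logistic sigmoid as the routing function and scalar leaf values; the names `soft_stump`, `leaf_left`, and `leaf_right` are hypothetical and not taken from the paper. Negating the split parameters while swapping the two children leaves the function unchanged, because sigmoid(-z) = 1 - sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_stump(x, w, b, leaf_left, leaf_right):
    """Depth-1 soft tree (illustrative): route left with weight sigmoid(x.w + b)."""
    p_left = sigmoid(x @ w + b)
    return p_left * leaf_left + (1.0 - p_left) * leaf_right

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # 5 samples, 3 features
w = rng.normal(size=3)        # split weights
b = rng.normal()              # split bias
leaf_left, leaf_right = 1.5, -0.7

# Subtree flip: negate the split parameters and swap the children.
original = soft_stump(x, w, b, leaf_left, leaf_right)
flipped = soft_stump(x, -w, -b, leaf_right, leaf_left)
flip_gap = float(np.max(np.abs(original - flipped)))

# Naively averaging the two equivalent parameter sets cancels the split
# (w and b become zero) and averages the leaves, so the output is constant.
midpoint = soft_stump(
    x, 0.5 * (w - w), 0.5 * (b - b),
    0.5 * (leaf_left + leaf_right), 0.5 * (leaf_right + leaf_left),
)
```

Here `flip_gap` is zero up to floating-point error, while `midpoint` no longer depends on `x`. This is one way to see why two functionally identical trees can still be far apart in parameter space until the flip is undone.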

Linear Mode Connectivity (LMC) refers to the phenomenon that performance remains consistent for linearly interpolated models in the parameter space. For independently optimized model pairs from different random initializations, achieving LMC is considered crucial for understanding the stable success of the non-convex optimization in modern machine learning models and for facilitating practical parameter-based operations such as model merging. While LMC has been achieved for neural networks by considering the permutation invariance of neurons in each hidden layer, its attainment for other models remains an open question. In this paper, we first achieve LMC for soft tree ensembles, which are tree-based differentiable models extensively used in practice. We show the necessity of incorporating two invariances: subtree flip invariance and splitting order invariance, which do not exist in neural networks but are inherent to tree architectures, in addition to permutation invariance of trees. Moreover, we demonstrate that it is even possible to exclude such additional invariances while keeping LMC by designing decision list-based tree architectures, where such invariances do not exist by definition. Our findings indicate the significance of accounting for architecture-specific invariances in achieving LMC.
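The notion of consistent performance along a linear path can be sketched as a loss barrier computation. The block below uses one commonly used barrier definition (the largest gap between the loss on the path and the linear interpolation of the endpoint losses), which is an assumption here rather than a formula quoted from the paper. The function names and `toy_loss` are hypothetical; the toy loss is a double well with a sign symmetry that stands in for a model invariance.

```python
import numpy as np

def interpolate(theta_a, theta_b, lam):
    """Point on the straight line between two parameter vectors."""
    return (1.0 - lam) * theta_a + lam * theta_b

def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the path loss over the interpolated endpoint losses."""
    lams = np.linspace(0.0, 1.0, num_points)
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = [
        loss_fn(interpolate(theta_a, theta_b, lam))
        - ((1.0 - lam) * loss_a + lam * loss_b)
        for lam in lams
    ]
    return float(max(gaps))

def toy_loss(theta):
    """Illustrative non-convex loss: minima at +1 and -1 in each coordinate."""
    return float(np.sum((theta ** 2 - 1.0) ** 2))

# Two "independently trained" solutions that differ only by a sign symmetry.
theta_a = np.array([1.0, -1.0])
theta_b = np.array([-1.0, 1.0])

naive_barrier = loss_barrier(toy_loss, theta_a, theta_b)
# Apply the symmetry (a sign flip) to theta_b before interpolating.
aligned_barrier = loss_barrier(toy_loss, theta_a, -theta_b)
```

In this toy setting `naive_barrier` is 2.0, because the midpoint of the path sits at the origin, whereas `aligned_barrier` is zero up to floating-point error. In the paper's setting the alignment step would instead search over tree permutations, subtree flips, and splitting orders.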

