ML, LG · May 20, 2024

A distance for mixed-variable and hierarchical domains with meta variables

arXiv:2405.13073v3 · 2 citations · h-index: 27 · Neurocomputing
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of handling mixed-variable and hierarchical domains in machine learning and optimization. The contribution is incremental, as it builds on existing distance-based methods.

The paper tackles the problem of modeling heterogeneous datasets with mixed and hierarchical variables by introducing a modeling framework and a novel distance. Together, these allow whole datasets to be used without partitioning, and the paper reports improved performance in regression and classification experiments on hyperparameter data.

Heterogeneous datasets emerge in various machine learning and optimization applications that feature different input sources, types or formats. Most models and methods do not handle heterogeneity natively. Hence, such datasets are often partitioned into smaller and simpler ones, which may limit generalizability or performance, especially when data is limited. The first main contribution of this work is a modeling framework that generalizes hierarchical, tree-structured, variable-size and conditional search frameworks. The framework models mixed-variable and hierarchical domains in which variables may be continuous, integer or categorical, and some variables are identified as meta variables when they influence the structure of the problem. The second main contribution is a novel distance that compares any pair of mixed-variable points, even points that do not share the same variables, which allows whole heterogeneous datasets residing in mixed-variable and hierarchical domains with meta variables to be used. The contributions are illustrated through regression and classification experiments using simple distance-based models applied to datasets of hyperparameters with corresponding performance scores.
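The abstract does not give the distance's formula, so the sketch below only illustrates the general idea of comparing points whose variable sets differ and feeding that distance to a simple distance-based model. It uses a Gower-style distance, a well-known technique swapped in here, and is not the paper's distance: shared variables contribute a normalized difference, and a variable present in only one point adds a fixed penalty (an assumption). The `VarSpec` type, the `penalty` parameter, the variable names and the scores are all hypothetical. In the example, `n_layers` plays the role of a meta variable, since it decides whether `units_2` exists.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[float, int, str]


@dataclass(frozen=True)
class VarSpec:
    """Hypothetical description of one variable."""
    kind: str            # "continuous", "integer" or "categorical"
    lower: float = 0.0   # bounds are used only by numeric variables
    upper: float = 1.0


def mixed_distance(x: Dict[str, Value], y: Dict[str, Value],
                   specs: Dict[str, VarSpec], penalty: float = 1.0) -> float:
    """Gower-style distance between points whose variable sets may differ.

    Shared variables add a normalized difference in [0, 1]; a variable that
    appears in only one point adds `penalty` (an assumption, not the paper's rule).
    """
    total = 0.0
    for name in sorted(set(x) | set(y)):  # sorted for a deterministic sum order
        if name in x and name in y:
            spec = specs[name]
            if spec.kind == "categorical":
                total += 0.0 if x[name] == y[name] else 1.0
            else:
                span = (spec.upper - spec.lower) or 1.0  # avoid dividing by zero
                total += abs(float(x[name]) - float(y[name])) / span
        else:
            total += penalty
    return total


def knn_predict(query: Dict[str, Value], data: List[Dict[str, Value]],
                scores: List[float], specs: Dict[str, VarSpec], k: int = 1) -> float:
    """Average the scores of the k nearest points under mixed_distance."""
    ranked = sorted(zip((mixed_distance(query, p, specs) for p in data), scores))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical hyperparameter domain: n_layers acts like a meta variable,
# since it decides whether units_2 exists.
specs = {
    "optimizer": VarSpec("categorical"),
    "lr": VarSpec("continuous", 0.0, 0.1),
    "n_layers": VarSpec("integer", 1, 2),
    "units_1": VarSpec("integer", 16, 128),
    "units_2": VarSpec("integer", 16, 128),
}
a = {"optimizer": "adam", "lr": 0.01, "n_layers": 1, "units_1": 64}
b = {"optimizer": "adam", "lr": 0.02, "n_layers": 2, "units_1": 64, "units_2": 32}
query = {"optimizer": "adam", "lr": 0.012, "n_layers": 1, "units_1": 60}

print(mixed_distance(a, b, specs))
print(knn_predict(query, [a, b], [0.80, 0.90], specs))  # scores are made up
```

Because no partitioning by architecture is needed, the one-layer point `a` and the two-layer point `b` sit in the same dataset; the penalty for `units_2` is what keeps structurally different configurations far apart.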

Code Implementations: 1 repo