LG · Feb 20

Explaining AutoClustering: Uncovering Meta-Feature Contribution in AutoML for Clustering

Matheus Camilo da Silva, Leonardo Arrighi, Ana Carolina Lorena, Sylvio Barbon Junior

arXiv:2602.18348v1 · 1.4 · h-index: 31

Originality: Incremental advance

AI Analysis

This work addresses the lack of transparency in AutoML for clustering, which limits reliability and diagnostic insight for researchers and practitioners. It is an incremental advance because it builds on existing explainability techniques.

The paper tackles the problem of explaining AutoClustering systems by analyzing how dataset meta-features influence algorithm and hyperparameter choices. It reveals consistent patterns in meta-feature relevance and identifies weaknesses in current meta-learning strategies.

AutoClustering methods aim to automate unsupervised learning tasks, including algorithm selection (AS), hyperparameter optimization (HPO), and pipeline synthesis (PS), often by leveraging meta-learning over dataset meta-features. While these systems frequently achieve strong performance, their recommendations are difficult to justify: the influence of dataset meta-features on algorithm and hyperparameter choices is typically not exposed, which limits reliability, bias diagnostics, and efficient meta-feature engineering, and leaves little diagnostic insight for further improvements. In this work, we investigate the explainability of the meta-models in AutoClustering. We first review 22 existing methods and organize their meta-features into a structured taxonomy. We then apply a global explainability technique (i.e., Decision Predicate Graphs) to assess feature importance within meta-models from selected frameworks. Finally, we use local explainability tools such as SHAP (SHapley Additive exPlanations) to analyse specific clustering decisions. Our findings highlight consistent patterns in meta-feature relevance, identify structural weaknesses in current meta-learning strategies that can distort recommendations, and provide actionable guidance for more interpretable Automated Machine Learning (AutoML) design. This study therefore offers a practical foundation for increasing decision transparency in unsupervised learning automation.
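To make the idea of a local, Shapley-style explanation of a meta-model concrete, the following is a minimal sketch, not the paper's method or the SHAP library. It computes exact Shapley values by enumerating feature coalitions for a toy scoring function. The meta-feature names (`n_instances`, `n_features`, `sparsity`), the weights in `meta_model`, and the all-zero baseline are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations
from math import factorial

# Hypothetical, normalized meta-features of a dataset (illustrative only).
FEATURES = ["n_instances", "n_features", "sparsity"]


def meta_model(x):
    """Toy meta-model: a score for recommending one clustering algorithm.

    The weights and the interaction term are made up for illustration.
    """
    return (
        0.5 * x["n_instances"]
        - 0.3 * x["sparsity"]
        + 0.2 * x["n_instances"] * x["n_features"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all coalitions of features.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for small toy examples.
    """
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {
                    g: instance[g] if (g in subset or g == f) else baseline[g]
                    for g in FEATURES
                }
                without_f = {
                    g: instance[g] if g in subset else baseline[g]
                    for g in FEATURES
                }
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# One dataset to explain, against an assumed all-zero reference dataset.
instance = {"n_instances": 1.0, "n_features": 1.0, "sparsity": 1.0}
baseline = {"n_instances": 0.0, "n_features": 0.0, "sparsity": 0.0}

for name, value in shapley_values(meta_model, instance, baseline).items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score for the explained dataset and the score for the baseline. This additivity is what lets a per-decision explanation be read as a breakdown of a single recommendation. In practice, a library such as SHAP approximates these values for real meta-models, where exhaustive enumeration is infeasible.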
