LG · AI · Mar 20

Integrating Meta-Features with Knowledge Graph Embeddings for Meta-Learning

arXiv:2603.19888 · 6.0 · h-index: 9
Predicted impact top 87% in LG · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses meta-learning challenges for researchers and practitioners by providing more accurate performance predictions and dataset similarity measures, though it builds incrementally on existing knowledge graph and embedding techniques.

The authors address two meta-learning tasks, pipeline performance estimation and dataset similarity estimation, by proposing KGmetaSP, a knowledge-graph-embeddings approach that leverages existing experiment data. The approach improved results on both tasks on a benchmark of 144,177 OpenML experiments.

The vast collection of machine learning records available on the web presents a significant opportunity for meta-learning, where past experiments are leveraged to improve performance. Two crucial meta-learning tasks are pipeline performance estimation (PPE), which predicts pipeline performance on target datasets, and dataset performance-based similarity estimation (DPSE), which identifies datasets with similar performance patterns. Existing approaches primarily rely on dataset meta-features (e.g., number of instances, class entropy) to represent datasets numerically and approximate these meta-learning tasks. However, these approaches often overlook the wealth of past experimental results and pipeline metadata available, which limits their ability to capture dataset-pipeline interactions that reveal performance similarity patterns. In this work, we propose KGmetaSP, a knowledge-graph-embeddings approach that leverages existing experiment data to capture these interactions and improve both PPE and DPSE. We represent datasets and pipelines within a unified knowledge graph (KG) and derive embeddings that support pipeline-agnostic meta-models for PPE and distance-based retrieval for DPSE. To validate our approach, we construct a large-scale benchmark comprising 144,177 OpenML experiments, enabling a rich cross-dataset evaluation. KGmetaSP enables accurate PPE using a single pipeline-agnostic meta-model and improves DPSE over baselines. The proposed KGmetaSP, KG, and benchmark are released, establishing a new reference point for meta-learning and demonstrating how consolidating open experiment data into a unified KG advances the field.
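
The distance-based retrieval mentioned in the abstract for DPSE can be illustrated with a minimal sketch. This is not the authors' implementation: the dataset names, the three-dimensional vectors, the choice of cosine distance, and the helper names `cosine_distance` and `most_similar` are all assumptions made for illustration. In practice the vectors would come from a KG embedding model trained on the experiment graph.

```python
import math

# Hypothetical dataset embeddings. The names and values are made up;
# real vectors would be learned from the knowledge graph.
EMBEDDINGS = {
    "dataset_a": [0.9, 0.1, 0.0],
    "dataset_b": [0.8, 0.2, 0.1],
    "dataset_c": [0.0, 0.1, 0.9],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def most_similar(query, embeddings, k=1):
    """Return the k datasets closest to `query` as (name, distance) pairs."""
    query_vec = embeddings[query]
    candidates = [
        (name, cosine_distance(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query  # a dataset is not its own neighbour
    ]
    candidates.sort(key=lambda pair: pair[1])  # smallest distance first
    return candidates[:k]


if __name__ == "__main__":
    for name, dist in most_similar("dataset_a", EMBEDDINGS, k=2):
        print(f"{name}: {dist:.4f}")
```

Under these assumptions, datasets whose embeddings point in similar directions are retrieved first, which is the sense in which embedding distance is meant to stand in for similarity of performance patterns. Whether the paper uses cosine distance or another metric is not stated in the abstract.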

