Towards Meta-learned Algorithm Selection using Implicit Fidelity Information
This work addresses the challenge of efficiently selecting the best algorithm for new datasets, which is incremental as it builds on existing meta-learning and landmarking approaches.
The paper tackles the problem of algorithm selection for machine learning datasets by proposing IMFAS, a method that uses meta-learned learning curves from candidate algorithms to jointly learn dataset topologies and algorithm biases, achieving better performance than Successive Halving with at most 50% of the fidelity sequence during testing.
Automatically selecting the best performing algorithm for a given dataset or ranking multiple algorithms by their expected performance supports users in developing new machine learning applications. Most approaches for this problem rely on pre-computed dataset meta-features and landmarking performances to capture the salient topology of the datasets and those topologies that the algorithms attend to. Landmarking usually exploits cheap algorithms not necessarily in the pool of candidate algorithms to get inexpensive approximations of the topology. While somewhat indicative, hand-crafted dataset meta-features and landmarks are likely insufficient descriptors, strongly depending on the alignment of the topologies that the landmarks and the candidate algorithms search for. We propose IMFAS, a method to exploit multi-fidelity landmarking information directly from the candidate algorithms in the form of non-parametrically non-myopic meta-learned learning curves via LSTMs in a few-shot setting during testing. Using this mechanism, IMFAS jointly learns the topology of the datasets and the inductive biases of the candidate algorithms, without the need to expensively train them to convergence. Our approach produces informative landmarks, easily enriched by arbitrary meta-features at a low computational cost, capable of producing the desired ranking using cheaper fidelities. We additionally show that IMFAS is able to beat Successive Halving with at most 50% of the fidelity sequence during test time.