Joaquin Vanschoren

LG
h-index59
74papers
5,267citations
Novelty36%
AI Score56

74 Papers

LGJul 20, 2022Code
DataPerf: Benchmarks for Data-Centric AI Development

Mark Mazumder, Colby Banbury, Xiaozhe Yao et al.

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.

LGJul 25, 2022Code
AMLB: an AutoML Benchmark

Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors et al.

Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.

LGNov 21, 2023
DMLR: Data-centric Machine Learning Research -- Past, Present and Future

Luis Oala, Manil Maskey, Lilith Bat-Leah et al. · mit

Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

CVAug 18, 2022Code
Evaluating Continual Test-Time Adaptation for Contextual and Semantic Domain Shifts

Tommie Kerssies, Mert Kılıçkaya, Joaquin Vanschoren

In this paper, our goal is to adapt a pre-trained convolutional neural network to domain shifts at test time. We do so continually with the incoming stream of test batches, without labels. The existing literature mostly operates on artificial shifts obtained via adversarial perturbations of a test image. Motivated by this, we evaluate the state of the art on two realistic and challenging sources of domain shifts, namely contextual and semantic shifts. Contextual shifts correspond to the environment types, for example, a model pre-trained on indoor context has to adapt to the outdoor context on CORe-50. Semantic shifts correspond to the capture types, for example a model pre-trained on natural images has to adapt to cliparts, sketches, and paintings on DomainNet. We include in our analysis recent techniques such as Prediction-Time Batch Normalization (BN), Test Entropy Minimization (TENT) and Continual Test-Time Adaptation (CoTTA). Our findings are three-fold: i) Test-time adaptation methods perform better and forget less on contextual shifts compared to semantic shifts, ii) TENT outperforms other methods on short-term adaptation, whereas CoTTA outpeforms other methods on long-term adaptation, iii) BN is most reliable and robust. Our code is available at https://github.com/tommiekerssies/Evaluating-Continual-Test-Time-Adaptation-for-Contextual-and-Semantic-Domain-Shifts.

LGMar 23, 2023Code
AdaCL:Adaptive Continual Learning

Elif Ceren Gok Yildirim, Murat Onur Yildirim, Mert Kilickaya et al.

Class-Incremental Learning aims to update a deep classifier to learn new categories while maintaining or improving its accuracy on previously observed classes. Common methods to prevent forgetting previously learned classes include regularizing the neural network updates and storing exemplars in memory, which come with hyperparameters such as the learning rate, regularization strength, or the number of exemplars. However, these hyperparameters are usually only tuned at the start and then kept fixed throughout the learning sessions, ignoring the fact that newly encountered tasks may have varying levels of novelty or difficulty. This study investigates the necessity of hyperparameter `adaptivity' in Class-Incremental Learning: the ability to dynamically adjust hyperparameters such as the learning rate, regularization strength, and memory size according to the properties of the new task at hand. We propose AdaCL, a Bayesian Optimization-based approach to automatically and efficiently determine the optimal values for those parameters with each learning task. We show that adapting hyperpararmeters on each new task leads to improvement in accuracy, forgetting and memory. Code is available at https://github.com/ElifCerenGokYildirim/AdaCL.

LGJul 10, 2023
Advances and Challenges in Meta-Learning: A Technical Review

Anna Vettoruzzo, Mohamed-Rafik Bouguelia, Joaquin Vanschoren et al.

Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The paper covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the paper demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the paper delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the paper highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this paper provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.

CVFeb 16, 2023
Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification

Ihsan Ullah, Dustin Carrión-Ojeda, Sergio Escalera et al.

We introduce Meta-Album, an image classification meta-dataset designed to facilitate few-shot learning, transfer learning, meta-learning, among other tasks. It includes 40 open datasets, each having at least 20 classes with 40 examples per class, with verified licences. They stem from diverse domains, such as ecology (fauna and flora), manufacturing (textures, vehicles), human actions, and optical character recognition, featuring various image scales (microscopic, human scales, remote sensing). All datasets are preprocessed, annotated, and formatted uniformly, and come in 3 versions (Micro $\subset$ Mini $\subset$ Extended) to match users' computational resources. We showcase the utility of the first 30 datasets on few-shot learning problems. The other 10 will be released shortly after. Meta-Album is already more diverse and larger (in number of datasets) than similar efforts, and we are committed to keep enlarging it via a series of competitions. As competitions terminate, their test data are released, thus creating a rolling benchmark, available through OpenML.org. Our website https://meta-album.github.io/ contains the source code of challenge winning methods, baseline methods, data loaders, and instructions for contributing either new datasets or algorithms to our expandable meta-dataset.

AIMar 15, 2023
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML

Hilde Weerts, Florian Pfisterer, Matthias Feurer et al.

The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to propose AutoML systems that jointly optimize fairness and predictive performance to mitigate fairness-related harm. However, fairness is a complex and inherently interdisciplinary subject, and solely posing it as an optimization problem can have adverse side effects. With this work, we aim to raise awareness among developers of AutoML systems about such limitations of fairness-aware AutoML, while also calling attention to the potential of AutoML as a tool for fairness research. We present a comprehensive overview of different ways in which fairness-related harm can arise and the ensuing implications for the design of fairness-aware AutoML. We conclude that while fairness cannot be automated, fairness-aware AutoML can play an important role in the toolbox of ML practitioners. We highlight several open technical challenges for future work in this direction. Additionally, we advocate for the creation of more user-centered assistive systems designed to tackle challenges encountered in fairness work

59.7CLMay 31
TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Yaxuan Kong, Qingren Yao, Yuqi Nie et al.

Time series data inform critical decisions across many real-world domains. While large language model (LLM) agents can analyze data through natural language and tools, it remains unclear whether they can conduct reliable time series analysis across multi-turn conversations. Existing benchmarks focus on single-step tasks such as forecasting and anomaly detection, overlooking practical workflows where user goals evolve, agents must build on prior analyses, and conclusions emerge from accumulated evidence. In this work, we introduce TimeSage-MT, a multi-turn benchmark for agentic time series reasoning with 240 tasks and 2,680 dialogue turns across 8 real-world domains, spanning basic exploration to decision-oriented analysis. TimeSage-MT is built through a reproducible pipeline that converts real-world time series data into multi-turn conversations with verifiable answers. It provides a unified evaluation protocol and public leaderboard for comparing time series agentic systems. To demonstrate the benchmark's utility, we evaluate frontier LLMs alongside TimeSage, a novel structured agent equipped with a comprehensive time series skill library. The results show sharp performance drops on decision-oriented tasks, driven by failures in memory, uncertainty handling, and domain-based decision making. TimeSage-MT exposes critical gaps in current agentic reasoning and provides a rigorous foundation for future development.

LGAug 28, 2023
Continual Learning with Dynamic Sparse Training: Exploring Algorithms for Effective Model Updates

Murat Onur Yildirim, Elif Ceren Gok Yildirim, Ghada Sokar et al.

Continual learning (CL) refers to the ability of an intelligent system to sequentially acquire and retain knowledge from a stream of data with as little computational overhead as possible. To this end; regularization, replay, architecture, and parameter isolation approaches were introduced to the literature. Parameter isolation using a sparse network which enables to allocate distinct parts of the neural network to different tasks and also allows to share of parameters between tasks if they are similar. Dynamic Sparse Training (DST) is a prominent way to find these sparse networks and isolate them for each task. This paper is the first empirical study investigating the effect of different DST components under the CL paradigm to fill a critical research gap and shed light on the optimal configuration of DST for CL if it exists. Therefore, we perform a comprehensive study in which we investigate various DST components to find the best topology per task on well-known CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup since our primary focus is to evaluate the performance of various DST criteria, rather than the process of mask selection. We found that, at a low sparsity level, Erdos-Rényi Kernel (ERK) initialization utilizes the backbone more efficiently and allows to effectively learn increments of tasks. At a high sparsity level, unless it is extreme, uniform initialization demonstrates a more reliable and robust performance. In terms of growth strategy; performance is dependent on the defined initialization strategy and the extent of sparsity. Finally, adaptivity within DST components is a promising way for better continual learners.

LGNov 1, 2022
Meta-Learning for Unsupervised Outlier Detection with Optimal Transport

Prabhant Singh, Joaquin Vanschoren

Automated machine learning has been widely researched and adopted in the field of supervised classification and regression, but progress in unsupervised settings has been limited. We propose a novel approach to automate outlier detection based on meta-learning from previous datasets with outliers. Our premise is that the selection of the optimal outlier detection technique depends on the inherent properties of the data distribution. We leverage optimal transport in particular, to find the dataset with the most similar underlying distribution, and then apply the outlier detection techniques that proved to work best for that data distribution. We evaluate the robustness of our approach and find that it outperforms the state of the art methods in unsupervised outlier detection. This approach can also be easily generalized to automate other unsupervised settings.

LGNov 1, 2022
Automated Imbalanced Learning

Prabhant Singh, Joaquin Vanschoren

Automated Machine Learning has grown very successful in automating the time-consuming, iterative tasks of machine learning model development. However, current methods struggle when the data is imbalanced. Since many real-world datasets are naturally imbalanced, and improper handling of this issue can lead to quite useless models, this issue should be handled carefully. This paper first introduces a new benchmark to study how different AutoML methods are affected by label imbalance. Second, we propose strategies to better deal with imbalance and integrate them into an existing AutoML framework. Finally, we present a systematic study which evaluates the impact of these strategies and find that their inclusion in AutoML systems significantly increases their robustness against label imbalance.

67.7AIMay 28
Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations

Omar Benjelloun, Leonardo Martins Bianco, Isabelle Guyon et al.

Reproducibility is fundamental to the scientific method, yet remains a critical challenge in machine learning. Contributing factors include underspecified execution details and brittle software environments. Human-centric remedies, such as checklists and manual verification, help but require intensive effort and fail to scale. To address this, we introduce Croissant Tasks: a declarative, machine-actionable metadata format that abstracts low-level implementation details into high-level specifications. This format enables conceptual reproducibility: verifying claims via independent, agent-generated implementations rather than brittle source code replication. We contribute: (1) the Croissant Tasks specification, formally decoupling task problem from solution; (2) an automated LLM pipeline that retrofits existing benchmarks into this format; and (3) empirical validation showing autonomous agents can ingest these specifications to generate functional, accurate reproduction pipelines from scratch. We envision this format as a new foundation for automated and conceptual reproducibility in machine learning.

CVJan 26, 2023
Are Labels Needed for Incremental Instance Learning?

Mert Kilickaya, Joaquin Vanschoren

In this paper, we learn to classify visual object instances, incrementally and via self-supervision (self-incremental). Our learner observes a single instance at a time, which is then discarded from the dataset. Incremental instance learning is challenging, since longer learning sessions exacerbate forgetfulness, and labeling instances is cumbersome. We overcome these challenges via three contributions: i. We propose VINIL, a self-incremental learner that can learn object instances sequentially, ii. We equip VINIL with self-supervision to by-pass the need for instance labelling, iii. We compare VINIL to label-supervised variants on two large-scale benchmarks, and show that VINIL significantly improves accuracy while reducing forgetfulness.

CVApr 18, 2023Code
Neural Architecture Search for Visual Anomaly Segmentation

Tommie Kerssies, Joaquin Vanschoren

This paper presents the first application of neural architecture search to the complex task of segmenting visual anomalies. Measurement of anomaly segmentation performance is challenging due to imbalanced anomaly pixels, varying region areas, and various types of anomalies. First, the region-weighted Average Precision (rwAP) metric is proposed as an alternative to existing metrics, which does not need to be limited to a specific maximum false positive rate. Second, the AutoPatch neural architecture search method is proposed, which enables efficient segmentation of visual anomalies without any training. By leveraging a pre-trained supernet, a black-box optimization algorithm can directly minimize computational complexity and maximize performance on a small validation set of anomalous examples. Finally, compelling results are presented on the widely studied MVTec dataset, demonstrating that AutoPatch outperforms the current state-of-the-art with lower computational complexity, using only one example per type of anomaly. The results highlight the potential of automated machine learning to optimize throughput in industrial quality control. The code for AutoPatch is available at: https://github.com/tommiekerssies/AutoPatch

LGNov 20, 2023
What Can AutoML Do For Continual Learning?

Mert Kilickaya, Joaquin Vanschoren

This position paper outlines the potential of AutoML for incremental (continual) learning to encourage more research in this direction. Incremental learning involves incorporating new data from a stream of tasks and distributions to learn enhanced deep representations and adapt better to new tasks. However, a significant limitation of incremental learners is that most current techniques freeze the backbone architecture, hyperparameters, and the order & structure of the learning tasks throughout the learning and adaptation process. We strongly believe that AutoML offers promising solutions to address these limitations, enabling incremental learning to adapt to more diverse real-world tasks. Therefore, instead of directly proposing a new method, this paper takes a step back by posing the question: "What can AutoML do for incremental learning?" We outline three key areas of research that can contribute to making incremental learners more dynamic, highlighting concrete opportunities to apply AutoML methods in novel ways as well as entirely new challenges for AutoML research.

LGJul 7, 2023
MALIBO: Meta-learning for Likelihood-free Bayesian Optimization

Jiarong Pan, Stefan Falkner, Felix Berkenkamp et al.

Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.

ROJun 14, 2022
Open-Ended Learning Strategies for Learning Complex Locomotion Skills

Fangqin Zhou, Joaquin Vanschoren

Teaching robots to learn diverse locomotion skills under complex three-dimensional environmental settings via Reinforcement Learning (RL) is still challenging. It has been shown that training agents in simple settings before moving them on to complex settings improves the training process, but so far only in the context of relatively simple locomotion skills. In this work, we adapt the Enhanced Paired Open-Ended Trailblazer (ePOET) approach to train more complex agents to walk efficiently on complex three-dimensional terrains. First, to generate more rugged and diverse three-dimensional training terrains with increasing complexity, we extend the Compositional Pattern Producing Networks - Neuroevolution of Augmenting Topologies (CPPN-NEAT) approach and include randomized shapes. Second, we combine ePOET with Soft Actor-Critic off-policy optimization, yielding ePOET-SAC, to ensure that the agent could learn more diverse skills to solve more challenging tasks. Our experimental results show that the newly generated three-dimensional terrains have sufficient diversity and complexity to guide learning, that ePOET successfully learns complex locomotion skills on these terrains, and that our proposed ePOET-SAC approach slightly improves upon ePOET.

CLJan 10, 2024Code
TrustLLM: Trustworthiness in Large Language Models

Yue Huang, Lichao Sun, Haoran Wang et al.

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

61.5LGMay 14Code
Croissant Baker: Metadata Generation for Discoverable, Governable, and Reusable ML Datasets

Rafi Al Attrach, Rajna Fani, Sebastian Lobentanzer et al.

Croissant has emerged as the metadata standard for machine learning datasets, providing a structured, JSON-LD-based format that makes dataset discovery, automated ingestion, and reproducible analysis machine-checkable across ML platforms. Adoption has accelerated, and NeurIPS now requires Croissant metadata in every submission to its dataset tracks. Yet in practice Croissant generation usually starts with uploading data to a public platform, a path infeasible for governed and large local repositories that hold much of the high-value data ML increasingly relies on. We release Croissant Baker, a local-first, open-source command-line tool that generates validated Croissant metadata directly from a dataset directory through a modular handler registry. We evaluate Croissant Baker on over 140 datasets, scaling to MIMIC-IV at 886 million rows and 374 Parquet files. On held-out comparisons against producer-authored or standards-derived ground truth, Croissant Baker reaches 97-100% agreement across multiple domains.

LGMay 12, 2022
Warm-starting DARTS using meta-learning

Matej Grobelnik, Joaquin Vanschoren

Neural architecture search (NAS) has shown great promise in the field of automated machine learning (AutoML). NAS has outperformed hand-designed networks and made a significant step forward in the field of automating the design of deep neural networks, thus further reducing the need for human expertise. However, most research is done targeting a single specific task, leaving research of NAS methods over multiple tasks mostly overlooked. Generally, there exist two popular ways to find an architecture for some novel task. Either searching from scratch, which is ineffective by design, or transferring discovered architectures from other tasks, which provides no performance guarantees and is probably not optimal. In this work, we present a meta-learning framework to warm-start Differentiable architecture search (DARTS). DARTS is a NAS method that can be initialized with a transferred architecture and is able to quickly adapt to new tasks. A task similarity measure is used to determine which transfer architecture is selected, as transfer architectures found on similar tasks will likely perform better. Additionally, we employ a simple meta-transfer architecture that was learned over multiple tasks. Experiments show that warm-started DARTS is able to find competitive performing architectures while reducing searching costs on average by 60%.

75.4LGMar 22Code
Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models

Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substantial methodological gap, making it difficult to assess whether recent FM-based CIL progress reflects genuine advances or merely the absence of rigorous baselines. To address this gap, we introduce Pruned Adaptation Modules (PAM), a simple yet effective method that freezes the vast majority of the pre-trained ResNet while enabling scalable continual adaptation through sparse task-specific layers. PAM yields up to a ~5x reduction in trainable parameters and a ~6x reduction in total parameters, significantly reducing the cost of continual updates. Across diverse benchmarks, PAM consistently mitigates catastrophic forgetting and outperforms state-of-the-art FM-based CIL approaches. Our findings position PAM as a strong and transparent baseline that helps bridge the gap between traditional and FM-based CIL, guiding future research for a more accurate assessment of true progress in continual adaptation. The code can be found at: https://github.com/ElifCerenGokYildirim/PAM.

LGAug 6, 2024
Learning to Learn without Forgetting using Attention

Anna Vettoruzzo, Joaquin Vanschoren, Mohamed-Rafik Bouguelia et al.

Continual learning (CL) refers to the ability to continually learn over time by accommodating new knowledge while retaining previously learned experience. While this concept is inherent in human learning, current machine learning methods are highly prone to overwrite previously learned patterns and thus forget past experience. Instead, model parameters should be updated selectively and carefully, avoiding unnecessary forgetting while optimally leveraging previously learned patterns to accelerate future learning. Since hand-crafting effective update mechanisms is difficult, we propose meta-learning a transformer-based optimizer to enhance CL. This meta-learned optimizer uses attention to learn the complex relationships between model parameters across a stream of tasks, and is designed to generate effective weight updates for the current task while preventing catastrophic forgetting on previously encountered tasks. Evaluations on benchmark datasets like SplitMNIST, RotatedMNIST, and SplitCIFAR-100 affirm the efficacy of the proposed approach in terms of both forward and backward transfer, even on small sets of labeled data, highlighting the advantages of integrating a meta-learned optimizer within the continual learning framework.

CVJul 23, 2024
HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis

Fangqin Zhou, Mert Kilickaya, Joaquin Vanschoren et al.

Hyperspectral Imaging (HSI) plays an increasingly critical role in precise vision tasks within remote sensing, capturing a wide spectrum of visual data. Transformer architectures have significantly enhanced HSI task performance, while advancements in Transformer Architecture Search (TAS) have improved model discovery. To harness these advancements for HSI classification, we make the following contributions: i) We propose HyTAS, the first benchmark on transformer architecture search for Hyperspectral imaging, ii) We comprehensively evaluate 12 different methods to identify the optimal transformer over 5 different datasets, iii) We perform an extensive factor analysis on the Hyperspectral transformer search performance, greatly motivating future research in this direction. All benchmark materials are available at HyTAS.

CVDec 28, 2024Code
On dataset transferability in medical image classification

Dovile Juodelyte, Enzo Ferrante, Yucheng Lu et al.

Current transferability estimation methods designed for natural image datasets are often suboptimal in medical image classification. These methods primarily focus on estimating the suitability of pre-trained source model features for a target dataset, which can lead to unrealistic predictions, such as suggesting that the target dataset is the best source for itself. To address this, we propose a novel transferability metric that combines feature quality with gradients to evaluate both the suitability and adaptability of source model features for target tasks. We evaluate our approach in two new scenarios: source dataset transferability for medical image classification and cross-domain transferability. Our results show that our method outperforms existing transferability metrics in both settings. We also provide insight into the factors influencing transfer performance in medical image classification, as well as the dynamics of cross-domain transfer from natural to medical images. Additionally, we provide ground-truth transfer performance benchmarking results to encourage further research into transferability estimation for medical image classification. Our code and experiments are available at https://github.com/DovileDo/transferability-in-medical-imaging.

LGJul 26, 2024
On Supernet Transfer Learning for Effective Task Adaptation

Prabhant Singh, Joaquin Vanschoren

Neural Architecture Search (NAS) methods have been shown to outperform hand-designed models and help to democratize AI. However, NAS methods often start from scratch with each new task, making them computationally expensive and limiting their applicability. Transfer learning is a practical alternative with the rise of ever-larger pretrained models. However, it is also bound to the architecture of the pretrained model, which inhibits proper adaptation of the architecture to different tasks, leading to suboptimal (and excessively large) models. We address both challenges at once by introducing a novel and practical method to \textit{transfer supernets}, which parameterize both weight and architecture priors, and efficiently finetune both to new tasks. This enables supernet transfer learning as a replacement for traditional transfer learning that also finetunes model architectures to new tasks. Through extensive experiments across multiple image classification tasks, we demonstrate that supernet transfer learning does not only drastically speed up the discovery of optimal models (3 to 5 times faster on average), but will also find better models than running NAS from scratch. The added model flexibility also increases the robustness of transfer learning, yielding positive transfer to even very different target datasets, especially with multi-dataset pretraining.

LGJul 15, 2024
CLAMS: A System for Zero-Shot Model Selection for Clustering

Prabhant Singh, Pieter Gijsbers, Murat Onur Yildirim et al.

We propose an AutoML system that enables model selection on clustering problems by leveraging optimal transport-based dataset similarity. Our objective is to establish a comprehensive AutoML pipeline for clustering problems and provide recommendations for selecting the most suitable algorithms, thus opening up a new area of AutoML beyond the traditional supervised learning settings. We compare our results against multiple clustering baselines and find that it outperforms all of them, hence demonstrating the utility of similarity-based automated model selection for solving clustering applications.

CVMar 13, 2024Code
Self-Regulated Neurogenesis for Online Data-Incremental Learning

Murat Onur Yildirim, Elif Ceren Gok Yildirim, Decebal Constantin Mocanu et al.

Neural networks often struggle with catastrophic forgetting when learning sequences of tasks or data streams, unlike humans who can continuously learn and consolidate new concepts even in the absence of explicit cues. Online data-incremental learning seeks to emulate this capability by processing each sample only once, without having access to task or stream cues at any point in time since this is more realistic compared to offline setups, where all data from novel class(es) is assumed to be readily available. However, existing methods typically rely on storing the subsets of data in memory or expanding the initial model architecture, resulting in significant computational overhead. Drawing inspiration from 'self-regulated neurogenesis'-brain's mechanism for creating specialized regions or circuits for distinct functions-we propose a novel approach SERENA which encodes each concept in a specialized network path called 'concept cell', integrated into a single over-parameterized network. Once a concept is learned, its corresponding concept cell is frozen, effectively preventing the forgetting of previously acquired information. Furthermore, we introduce two new continual learning scenarios that more closely reflect real-world conditions, characterized by gradually changing sample sizes. Experimental results show that our method not only establishes new state-of-the-art results across ten benchmarks but also remarkably surpasses offline supervised batch learning performance. The code is available at https://github.com/muratonuryildirim/serena.

LGNov 2, 2021Code
From Strings to Data Science: a Practical Framework for Automated String Handling

John W. van Lith, Joaquin Vanschoren

Many machine learning libraries require that string features be converted to a numerical representation for the models to work as intended. Categorical string features can represent a wide variety of data (e.g., zip codes, names, marital status), and are notoriously difficult to preprocess automatically. In this paper, we propose a framework to do so based on best practices, domain knowledge, and novel techniques. It automatically identifies different types of string features, processes them accordingly, and encodes them into numerical representations. We also provide an open source Python implementation to automatically preprocess categorical string data in tabular datasets and demonstrate promising results on a wide range of datasets.

CVJul 13, 2021Code
Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification

Irma van den Brandt, Floris Fok, Bas Mulders et al.

Transfer learning is a commonly used strategy for medical image classification, especially via pretraining on source data and fine-tuning on target data. There is currently no consensus on how to choose appropriate source data, and in the literature we can find both evidence of favoring large natural image datasets such as ImageNet, and evidence of favoring more specialized medical datasets. In this paper we perform a systematic study with nine source datasets with natural or medical images, and three target medical datasets, all with 2D images. We find that ImageNet is the source leading to the highest performances, but also that larger datasets are not necessarily better. We also study different definitions of data similarity. We show that common intuitions about similarity may be inaccurate, and therefore not sufficient to predict an appropriate source a priori. Finally, we discuss several steps needed for further research in this field, especially with regard to other types (for example 3D) medical images. Our experiments and pretrained models are available via \url{https://www.github.com/vcheplygina/cats-scans}

LGNov 6, 2019Code
OpenML-Python: an extensible Python API for OpenML

Matthias Feurer, Jan N. van Rijn, Arlind Kadra et al.

OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper we introduce OpenML-Python, a client API for Python, opening up the OpenML platform for a wide range of Python-based tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn plugin and a plugin mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. Source code and documentation is available at https://github.com/openml/openml-python/.

LGJul 1, 2019Code
An Open Source AutoML Benchmark

Pieter Gijsbers, Erin LeDell, Janek Thomas et al.

In recent years, an active field of research has developed around automated machine learning (AutoML). Unfortunately, comparing different AutoML systems is hard and often done incorrectly. We introduce an open, ongoing, and extensible benchmark framework which follows best practices and avoids common mistakes. The framework is open-source, uses public datasets and has a website with up-to-date results. We use the framework to conduct a thorough comparison of 4 AutoML systems across 39 datasets and analyze the results.

LGMar 28, 2024
Croissant: A Metadata Format for ML-Ready Datasets

Mubashara Akhtar, Omar Benjelloun, Costanza Conforti et al.

Data is a critical resource for machine learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that creates a shared representation across ML tools, frameworks, and platforms. Croissant makes datasets more discoverable, portable, and interoperable, thereby addressing significant challenges in ML data management. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, enabling easy loading into the most commonly-used ML frameworks, regardless of where the data is stored. Our initial evaluation by human raters shows that Croissant metadata is readable, understandable, complete, yet concise.

CLApr 18, 2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons

Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed et al. · deepmind, oxford

This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

LGJul 23, 2024
Can time series forecasting be automated? A benchmark and analysis

Anvitha Thirthapura Sreedhara, Joaquin Vanschoren

In the field of machine learning and artificial intelligence, time series forecasting plays a pivotal role across various domains such as finance, healthcare, and weather. However, the task of selecting the most suitable forecasting method for a given dataset is a complex task due to the diversity of data patterns and characteristics. This research aims to address this challenge by proposing a comprehensive benchmark for evaluating and ranking time series forecasting methods across a wide range of datasets. This study investigates the comparative performance of many methods from two prominent time series forecasting frameworks, AutoGluon-Timeseries, and sktime to shed light on their applicability in different real-world scenarios. This research contributes to the field of time series forecasting by providing a robust benchmarking methodology and facilitating informed decision-making when choosing forecasting methods for achieving optimal prediction.

CYFeb 19, 2025
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Shaona Ghosh, Heather Frase, Adina Williams et al. · deepmind, stanford

The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories, including violent crimes, nonviolent crimes, sex-related crimes, child sexual exploitation, indiscriminate weapons, suicide and self-harm, intellectual property, privacy, defamation, hate, sexual content, and specialized advice (election, financial, health, legal). Our method incorporates a complete assessment standard, extensive prompt datasets, a novel evaluation framework, a grading and reporting system, and the technical as well as organizational infrastructure for long-term support and evolution. In particular, the benchmark employs an understandable five-tier grading scale (Poor to Excellent) and incorporates an innovative entropy-based system-response evaluation. In addition to unveiling the benchmark, this report also identifies limitations of our method and of building safety benchmarks generally, including evaluator uncertainty and the constraints of single-turn interactions. This work represents a crucial step toward establishing global standards for AI risk and reliability evaluation while acknowledging the need for continued development in areas such as multiturn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories. Our findings provide valuable insights for model developers, system integrators, and policymakers working to promote safer AI deployment.

LGFeb 5, 2024
Automatic Combination of Sample Selection Strategies for Few-Shot Learning

Branislav Pecher, Ivan Srba, Maria Bielikova et al.

In few-shot learning, such as meta-learning, few-shot fine-tuning or in-context learning, the limited number of samples used to train a model have a significant impact on the overall success. Although a large number of sample selection strategies exist, their impact on the performance of few-shot learning is not extensively known, as most of them have been so far evaluated in typical supervised settings only. In this paper, we thoroughly investigate the impact of 20 sample selection strategies on the performance of 5 few-shot learning approaches over 8 image and 6 text datasets. In addition, we propose a new method for automatic combination of sample selection strategies (ACSESS) that leverages the strengths and complementary information of the individual strategies. The experimental results show that our method consistently outperforms the individual selection strategies, as well as the recently proposed method for selecting support examples for in-context learning. We also show a strong modality, dataset and approach dependence for the majority of strategies as well as their dependence on the number of shots - demonstrating that the sample selection strategies play a significant role for lower number of shots, but regresses to random selection at higher number of shots.

LGFeb 20, 2025
Sculpting [CLS] Features for Pre-Trained Model-Based Class-Incremental Learning

Murat Onur Yildirim, Elif Ceren Gok Yildirim, Joaquin Vanschoren

Class-incremental learning requires models to continually acquire knowledge of new classes without forgetting old ones. Although pre-trained models have demonstrated strong performance in class-incremental learning, they remain susceptible to catastrophic forgetting when learning new concepts. Excessive plasticity in the models breaks generalizability and causes forgetting, while strong stability results in insufficient adaptation to new classes. This necessitates effective adaptation with minimal modifications to preserve the general knowledge of pre-trained models. To address this challenge, we first introduce a new parameter-efficient fine-tuning module 'Learn and Calibrate', or LuCA, designed to acquire knowledge through an adapter-calibrator couple, enabling effective adaptation with well-refined feature representations. Second, for each learning session, we deploy a sparse LuCA module on top of the last token just before the classifier, which we refer to as 'Token-level Sparse Calibration and Adaptation', or TOSCA. This strategic design improves the orthogonality between the modules and significantly reduces both training and inference complexity. By leaving the generalization capabilities of the pre-trained models intact and adapting exclusively via the last token, our approach achieves a harmonious balance between stability and plasticity. Extensive experiments demonstrate TOSCA's state-of-the-art performance while introducing ~8 times fewer parameters compared to prior methods.

LGOct 8, 2025
Automated Machine Learning for Unsupervised Tabular Tasks

Prabhant Singh, Pieter Gijsbers, Elif Ceren Gok Yildirim et al.

In this work, we present LOTUS (Learning to Learn with Optimal Transport for Unsupervised Scenarios), a simple yet effective method to perform model selection for multiple unsupervised machine learning(ML) tasks such as outlier detection and clustering. Our intuition behind this work is that a machine learning pipeline will perform well in a new dataset if it previously worked well on datasets with a similar underlying data distribution. We use Optimal Transport distances to find this similarity between unlabeled tabular datasets and recommend machine learning pipelines with one unified single method on two downstream unsupervised tasks: outlier detection and clustering. We present the effectiveness of our approach with experiments against strong baselines and show that LOTUS is a very promising first step toward model selection for multiple unsupervised ML tasks.

LGOct 7, 2025
How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation

Prabhant Singh, Sibylle Hess, Joaquin Vanschoren

Transferability estimation metrics are used to find a high-performing pre-trained model for a given target task without fine-tuning models and without access to the source dataset. Despite the growing interest in developing such metrics, the benchmarks used to measure their progress have gone largely unexamined. In this work, we empirically show the shortcomings of widely used benchmark setups to evaluate transferability estimation metrics. We argue that the benchmarks on which these metrics are evaluated are fundamentally flawed. We empirically demonstrate that their unrealistic model spaces and static performance hierarchies artificially inflate the perceived performance of existing metrics, to the point where simple, dataset-agnostic heuristics can outperform sophisticated methods. Our analysis reveals a critical disconnect between current evaluation protocols and the complexities of real-world model selection. To address this, we provide concrete recommendations for constructing more robust and realistic benchmarks to guide future research in a more meaningful direction.

COAug 23, 2025
Score Matching on Large Geometric Graphs for Cosmology Generation

Diana-Alexandra Onutu, Yue Zhao, Joaquin Vanschoren et al.

Generative models are a promising tool to produce cosmological simulations but face significant challenges in scalability, physical consistency, and adherence to domain symmetries, limiting their utility as alternatives to $N$-body simulations. To address these limitations, we introduce a score-based generative model with an equivariant graph neural network that simulates gravitational clustering of galaxies across cosmologies starting from an informed prior, respects periodic boundaries, and scales to full galaxy counts in simulations. A novel topology-aware noise schedule, crucial for large geometric graphs, is introduced. The proposed equivariant score-based model successfully generates full-scale cosmological point clouds of up to 600,000 halos, respects periodicity and a uniform prior, and outperforms existing diffusion models in capturing clustering statistics while offering significant computational advantages. This work advances cosmology by introducing a generative model designed to closely resemble the underlying gravitational clustering of structure formation, moving closer to physically realistic and efficient simulators for the evolution of large-scale structures in the universe.

LGJul 7, 2025
Meta-Learning Transformers to Improve In-Context Generalization

Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh et al.

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.

LGMay 23, 2025
Evolving Machine Learning: A Survey

Ignacio Cabrera Martin, Subhaditya Mukherjee, Almas Baimagambetov et al.

In an era defined by rapid data evolution, traditional Machine Learning (ML) models often fall short in adapting to dynamic environments. Evolving Machine Learning (EML) has emerged as a critical paradigm, enabling continuous learning and adaptation in real-time data streams. This survey presents a comprehensive analysis of EML, focusing on five core challenges: data drift, concept drift, catastrophic forgetting, skewed learning, and network adaptation. We systematically review over 100 studies, categorizing state-of-the-art methods across supervised, unsupervised, and semi-supervised approaches. The survey explores diverse evaluation metrics, benchmark datasets, and real-world applications, offering a comparative lens on the effectiveness and limitations of current techniques. Additionally, we highlight the growing role of adaptive neural architectures, meta-learning, and ensemble strategies in addressing evolving data complexities. By synthesizing insights from recent literature, this work not only maps the current landscape of EML but also identifies critical gaps and opportunities for future research. Our findings aim to guide researchers and practitioners in developing robust, ethical, and scalable EML systems for real-world deployment.

LGApr 1, 2025
AutoML Benchmark with shorter time constraints and early stopping

Israel Campero Jurado, Pieter Gijsbers, Joaquin Vanschoren

Automated Machine Learning (AutoML) automatically builds machine learning (ML) models on data. The de facto standard for evaluating new AutoML frameworks for tabular data is the AutoML Benchmark (AMLB). AMLB proposed to evaluate AutoML frameworks using 1- and 4-hour time budgets across 104 tasks. We argue that shorter time constraints should be considered for the benchmark because of their practical value, such as when models need to be retrained with high frequency, and to make AMLB more accessible. This work considers two ways in which to reduce the overall computation used in the benchmark: smaller time constraints and the use of early stopping. We conduct evaluations of 11 AutoML frameworks on 104 tasks with different time constraints and find the relative ranking of AutoML frameworks is fairly consistent across time constraints, but that using early-stopping leads to a greater variety in model performance.

LGFeb 10, 2025
Occam's model: Selecting simpler representations for better transferability estimation

Prabhant Singh, Sibylle Hess, Joaquin Vanschoren

Fine-tuning models that have been pre-trained on large datasets has become a cornerstone of modern machine learning workflows. With the widespread availability of online model repositories, such as Hugging Face, it is now easier than ever to fine-tune pre-trained models for specific tasks. This raises a critical question: which pre-trained model is most suitable for a given task? This problem is called transferability estimation. In this work, we introduce two novel and effective metrics for estimating the transferability of pre-trained models. Our approach is grounded in viewing transferability as a measure of how easily a pre-trained model's representations can be trained to separate target classes, providing a unique perspective on transferability estimation. We rigorously evaluate the proposed metrics against state-of-the-art alternatives across diverse problem settings, demonstrating their robustness and practical utility. Additionally, we present theoretical insights that explain our metrics' efficacy and adaptability to various scenarios. We experimentally show that our metrics increase Kendall's Tau by up to 32% compared to the state-of-the-art baselines.

LGOct 23, 2024
Continual Learning on a Data Diet

Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

Continual Learning (CL) methods usually learn from all available data. However, this is not the case in human cognition which efficiently focuses on key experiences while disregarding the redundant information. Similarly, not all data points in a dataset have equal potential; some can be more informative than others. This disparity may significantly impact the performance, as both the quality and quantity of samples directly influence the model's generalizability and efficiency. Drawing inspiration from this, we explore the potential of learning from important samples and present an empirical study for evaluating coreset selection techniques in the context of CL to stimulate research in this unexplored area. We train different continual learners on increasing amounts of selected samples and investigate the learning-forgetting dynamics by shedding light on the underlying mechanisms driving their improved stability-plasticity balance. We present several significant observations: learning from selectively chosen samples (i) enhances incremental accuracy, (ii) improves knowledge retention of previous tasks, and (iii) refines learned representations. This analysis contributes to a deeper understanding of selective learning strategies in CL scenarios.

IRJun 4, 2024
A Standardized Machine-readable Dataset Documentation Format for Responsible AI

Nitisha Jain, Mubashara Akhtar, Joan Giner-Miguelez et al.

Data is critical to advancing AI technologies, yet its quality and documentation remain significant challenges, leading to adverse downstream effects (e.g., potential biases) in AI applications. This paper addresses these issues by introducing Croissant-RAI, a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets. Croissant-RAI extends the Croissant metadata format and builds upon existing responsible AI (RAI) documentation frameworks, offering a standardized set of attributes and practices to facilitate community-wide adoption. Leveraging established web-publishing practices, such as Schema.org, Croissant-RAI enables dataset users to easily find and utilize RAI metadata regardless of the platform on which the datasets are published. Furthermore, it is seamlessly integrated into major data search engines, repositories, and machine learning frameworks, streamlining the reading and writing of responsible AI metadata within practitioners' existing workflows. Croissant-RAI was developed through a community-led effort. It has been designed to be adaptable to evolving documentation requirements and is supported by a Python library and a visual editor.

CVSep 4, 2023
Locality-Aware Hyperspectral Classification

Fangqin Zhou, Mert Kilickaya, Joaquin Vanschoren

Hyperspectral image classification is gaining popularity for high-precision vision tasks in remote sensing, thanks to their ability to capture visual information available in a wide continuum of spectra. Researchers have been working on automating Hyperspectral image classification, with recent efforts leveraging Vision-Transformers. However, most research models only spectra information and lacks attention to the locality (i.e., neighboring pixels), which may be not sufficiently discriminative, resulting in performance limitations. To address this, we present three contributions: i) We introduce the Hyperspectral Locality-aware Image TransformEr (HyLITE), a vision transformer that models both local and spectral information, ii) A novel regularization function that promotes the integration of local-to-global information, and iii) Our proposed approach outperforms competing baselines by a significant margin, achieving up to 10% gains in accuracy. The trained models and the code are available at HyLITE.

CVFeb 1, 2022
Advances in MetaDL: AAAI 2021 challenge and workshop

Adrian El Baz, Isabelle Guyon, Zhengying Liu et al.

To stimulate advances in metalearning using deep learning techniques (MetaDL), we organized in 2021 a challenge and an associated workshop. This paper presents the design of the challenge and its results, and summarizes presentations made at the workshop. The challenge focused on few-shot learning classification tasks of small images. Participants' code submissions were run in a uniform manner, under tight computational constraints. This put pressure on solution designs to use existing architecture backbones and/or pre-trained networks. Winning methods featured various classifiers trained on top of the second last layer of popular CNN backbones, fined-tuned on the meta-training data (not necessarily in an episodic manner), then trained on the labeled support and tested on the unlabeled query sets of the meta-test data.

LGJan 24, 2022
Online AutoML: An adaptive AutoML framework for online learning

Bilge Celik, Prabhant Singh, Joaquin Vanschoren

Automated Machine Learning (AutoML) has been used successfully in settings where the learning task is assumed to be static. In many real-world scenarios, however, the data distribution will evolve over time, and it is yet to be shown whether AutoML techniques can effectively design online pipelines in dynamic environments. This study aims to automate pipeline design for online learning while continuously adapting to data drift. For this purpose, we design an adaptive Online Automated Machine Learning (OAML) system, searching the complete pipeline configuration space of online learners, including preprocessing algorithms and ensembling techniques. This system combines the inherent adaptation capabilities of online learners with the fast automated pipeline (re)optimization capabilities of AutoML. Focusing on optimization techniques that can adapt to evolving objectives, we evaluate asynchronous genetic programming and asynchronous successive halving to optimize these pipelines continually. We experiment on real and artificial data streams with varying types of concept drift to test the performance and adaptation capabilities of the proposed system. The results confirm the utility of OAML over popular online learning algorithms and underscore the benefits of continuous pipeline redesign in the presence of data drift.