Katarzyna Woźnica

h-index: 8

12 papers

53 citations

Novelty: 43%

AI Score: 40

Ranked #75,910 of 194,257 authors (top 39%); #16,948 in LG (top 42%)

12 Papers

3.8 | LG | Jun 20, 2023 | Code

SeFNet: Bridging Tabular Datasets with Semantic Feature Nets

Katarzyna Woźnica, Piotr Wilczyński, Przemysław Biecek

Machine learning applications cover a wide range of predictive tasks in which tabular datasets play a significant role. However, although they often address similar problems, tabular datasets are typically treated as standalone tasks. The possibilities of using previously solved problems are limited due to the lack of structured contextual information about their features and the lack of understanding of the relations between them. To overcome this limitation, we propose a new approach called Semantic Feature Net (SeFNet), capturing the semantic meaning of the analyzed tabular features. By leveraging existing ontologies and domain knowledge, SeFNet opens up new opportunities for sharing insights between diverse predictive tasks. One such opportunity is the Dataset Ontology-based Semantic Similarity (DOSS) measure, which quantifies the similarity between datasets using relations across their features. In this paper, we present an example of SeFNet prepared for a collection of predictive tasks in healthcare, with the features' relations derived from the SNOMED-CT ontology. The proposed SeFNet framework and the accompanying DOSS measure address the issue of limited contextual information in tabular datasets. By incorporating domain knowledge and establishing semantic relations between features, we enhance the potential for meta-learning and enable valuable insights to be shared across different predictive tasks.

9.4 | LG | Jun 25, 2025

Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning

Jakub Piwko, Jędrzej Ruciński, Dawid Płudowski et al.

Ensemble learning has proven effective in boosting predictive performance, but traditional methods such as bagging, boosting, and dynamic ensemble selection (DES) suffer from high computational cost and limited adaptability to heterogeneous data distributions. To address these limitations, we propose Hellsemble, a novel and interpretable ensemble framework for binary classification that leverages dataset complexity during both training and inference. Hellsemble incrementally partitions the dataset into circles of difficulty by iteratively passing misclassified instances from simpler models to subsequent ones, forming a committee of specialised base learners. Each model is trained on increasingly challenging subsets, while a separate router model learns to assign new instances to the most suitable base model based on inferred difficulty. Hellsemble achieves strong classification accuracy while maintaining computational efficiency and interpretability. Experimental results on OpenML-CC18 and Tabzilla benchmarks demonstrate that Hellsemble often outperforms classical ensemble methods. Our findings suggest that embracing instance-level difficulty offers a promising direction for constructing efficient and robust ensemble systems.
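The partition-and-route idea described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the base learners are single-feature threshold stumps, the router is a 1-nearest-neighbour lookup, and the names `fit_stump`, `fit_hellsemble`, `route`, and `predict` are hypothetical. The abstract does not specify which base models or router the authors use, so this is not their implementation.

```python
def fit_stump(X, y):
    """Toy base learner: best rule of the form 'predict label if x[f] >= t'."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for label in (0, 1):
                preds = [label if x[f] >= t else 1 - label for x in X]
                acc = sum(p == yi for p, yi in zip(preds, y))
                if best is None or acc > best[0]:
                    best = (acc, f, t, label)
    _, f, t, label = best
    return lambda x: label if x[f] >= t else 1 - label


def fit_hellsemble(X, y, max_models=3):
    """Train models sequentially; each new model sees only the instances
    misclassified by the previous one. Returns the models and, for every
    training instance, the index of the 'circle' (model) it ended up in."""
    models, circle = [], [None] * len(X)
    remaining = list(range(len(X)))
    for k in range(max_models):
        model = fit_stump([X[i] for i in remaining], [y[i] for i in remaining])
        models.append(model)
        wrong = {i for i in remaining if model(X[i]) != y[i]}
        last = k == max_models - 1 or not wrong
        for i in remaining:
            # Correctly classified instances stay in circle k; the final
            # model keeps whatever is left.
            if last or i not in wrong:
                circle[i] = k
        if last:
            break
        remaining = sorted(wrong)
    return models, circle


def route(x, X, circle):
    """Assumed router: circle index of the nearest training instance."""
    nearest = min(range(len(X)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X[i])))
    return circle[nearest]


def predict(x, models, X, circle):
    """Send x to the model chosen by the router and return its prediction."""
    return models[route(x, X, circle)](x)


# One feature; class 1 sits in the middle, so a single stump cannot fit it.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 1, 0, 0]
models, circle = fit_hellsemble(X, y)
train_preds = [predict(x, models, X, circle) for x in X]
```

On this toy data the first stump handles the easy majority, the second specialises on the two instances the first one got wrong, and the router decides which of the two to consult for a new point.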

4.1 | LG | Jul 16, 2025

Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization?

Antoni Zajko, Katarzyna Woźnica

Effectively representing heterogeneous tabular datasets for meta-learning purposes is still an open problem. Previous approaches rely on representations that are intended to be universal. This paper proposes two novel methods for tabular representation learning tailored to a specific meta-task: warm-starting Bayesian Hyperparameter Optimization. Both follow a requirement, which we formulate, that representations capture the properties of landmarkers. The first approach involves deep metric learning, while the second is based on landmarker reconstruction. We evaluate the proposed encoders in two ways: besides the gain in the target meta-task, we also use the degree to which the proposed requirement is fulfilled as an evaluation metric. Experiments demonstrate that while the proposed encoders can effectively learn representations aligned with landmarkers, these representations may not directly translate into significant performance gains in the meta-task of HPO warm-starting.

2.6 | LG | Mar 19, 2024

Deciphering AutoML Ensembles: cattleia's Assistance in Decision-Making

Anna Kozak, Dominik Kędzierski, Jakub Piwko et al.

In many applications, model ensembling proves to be better than a single predictive model. Hence, it is the most common post-processing technique in Automated Machine Learning (AutoML). The most popular frameworks use ensembles at the expense of reducing the interpretability of the final models. In our work, we propose cattleia, an application that deciphers the ensembles for regression, multiclass, and binary classification tasks. This tool works with models built by three AutoML packages: auto-sklearn, AutoGluon, and FLAML. The given ensemble is analyzed from different perspectives. We conduct a predictive performance investigation through evaluation metrics of the ensemble and its component models. We extend the validation perspective by introducing new measures to assess the diversity and complementarity of the model predictions. Moreover, we apply explainable artificial intelligence (XAI) techniques to examine the importance of variables. Based on the insights obtained, users can investigate and adjust the ensemble weights with a modification tool to tune the ensemble in the desired way. The application provides the aforementioned aspects through dedicated interactive visualizations, making it accessible to a diverse audience. We believe cattleia can support users in decision-making and deepen the comprehension of AutoML frameworks.

2.6 | LG | Mar 7, 2024 | Code

Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization

Dawid Płudowski, Antoni Zajko, Anna Kozak et al.

Effectively representing heterogeneous tabular datasets for meta-learning purposes remains an open problem. Previous approaches rely on predefined meta-features, for example, statistical measures or landmarkers. The emergence of dataset encoders opens new possibilities for the extraction of meta-features because they do not involve any handmade design. Moreover, they are proven to generate dataset representations with desired spatial properties. In this research, we evaluate an encoder-based approach to one of the most established meta-tasks - warm-starting of the Bayesian Hyperparameter Optimization. To broaden our analysis we introduce a new approach for representation learning on tabular data based on [Tomoharu Iwata and Atsutoshi Kumagai. Meta-learning from Tasks with Heterogeneous Attribute Spaces. In Advances in Neural Information Processing Systems, 2020]. The validation on over 100 datasets from UCI and an independent metaMIMIC set of datasets highlights the nuanced challenges in representation learning. We show that general representations may not suffice for some meta-tasks where requirements are not explicitly considered during extraction.

4.6 | LG | Jan 27, 2022 | Code

Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV

Katarzyna Woźnica, Mateusz Grzyb, Zuzanna Trafas et al.

For many machine learning models, the choice of hyperparameters is a crucial step towards achieving high performance. Prevalent meta-learning approaches focus on obtaining good hyperparameter configurations with a limited computational budget for a completely new task, based on the results obtained from prior tasks. This paper proposes a new formulation of the tuning problem, called consolidated learning, more suited to practical challenges faced by model developers, in which a large number of predictive models are created on similar data sets. In such settings, we are interested in the total optimization time rather than tuning for a single task. We show that a carefully selected static portfolio of hyperparameters yields good results for anytime optimization, maintaining ease of use and implementation. Moreover, we point out how to construct such a portfolio for specific domains. The improvement in the optimization is possible due to more efficient transfer of hyperparameter configurations between similar tasks. We demonstrate the effectiveness of this approach through an empirical study for the XGBoost algorithm and a collection of predictive tasks extracted from the MIMIC-IV medical database; however, consolidated learning is applicable in many other fields.
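A static portfolio of the kind mentioned in this abstract can be sketched as follows. The greedy selection rule, the function names `build_portfolio` and `anytime_best`, and the toy scores are all assumptions chosen for illustration; the abstract does not state how the authors construct their portfolio, so this is one plausible scheme rather than their procedure.

```python
def build_portfolio(scores, size):
    """Greedily order configurations using results from prior tasks.

    scores maps a configuration name to its list of scores on prior tasks
    (higher is better). At each step, pick the configuration that most
    improves the sum over tasks of the best score reached so far.
    """
    n_tasks = len(next(iter(scores.values())))
    best = [float("-inf")] * n_tasks
    remaining = dict(scores)
    portfolio = []
    for _ in range(min(size, len(scores))):
        def total_if_added(name):
            return sum(max(b, s) for b, s in zip(best, remaining[name]))
        choice = max(sorted(remaining), key=total_if_added)
        best = [max(b, s) for b, s in zip(best, remaining.pop(choice))]
        portfolio.append(choice)
    return portfolio


def anytime_best(portfolio, evaluate):
    """Evaluate the portfolio in order on a new task; track the running best."""
    running, best = [], float("-inf")
    for name in portfolio:
        best = max(best, evaluate(name))
        running.append(best)
    return running


# Hypothetical scores of three configurations on three prior tasks.
prior = {"a": [0.9, 0.5, 0.5], "b": [0.5, 0.9, 0.6], "c": [0.8, 0.8, 0.5]}
portfolio = build_portfolio(prior, size=3)

# Hypothetical scores of the same configurations on a new, similar task.
new_task = {"a": 0.70, "b": 0.60, "c": 0.65}
curve = anytime_best(portfolio, new_task.__getitem__)
```

Because the portfolio is fixed in advance, tuning on a new task reduces to evaluating a short, ordered list of configurations, which is what makes the approach easy to use in an anytime setting.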

1.6 | LG | May 28, 2021

Do not explain without context: addressing the blind spot of model explanations

Katarzyna Woźnica, Katarzyna Pękala, Hubert Baniecki et al.

The increasing number of regulations and expectations of predictive machine learning models, such as the so-called right to explanation, has led to a large number of methods promising greater interpretability. High demand has led to widespread adoption of XAI techniques like Shapley values, Partial Dependence profiles or permutational variable importance. However, we still do not know enough about their properties and how they manifest in the context in which explanations are created by analysts, reviewed by auditors, and interpreted by various stakeholders. This paper highlights a blind spot which, although critical, is often overlooked when monitoring and auditing machine learning models: the effect of the reference data on the explanation calculation. We argue that many model explanations depend directly or indirectly on the choice of the reference data distribution. We showcase examples where small changes in the distribution lead to drastic changes in the explanations, such as a change in trend or, alarmingly, a change in conclusion. Consequently, we postulate that obtaining robust and useful explanations always requires supporting them with a broader context.
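The dependence of an explanation on reference data can be seen in a small partial-dependence example. The model, the reference samples, and the helper name `partial_dependence` are hypothetical and deliberately exaggerated; they are not taken from the paper.

```python
def model(x1, x2):
    """Hypothetical fitted model with an interaction between x1 and x2."""
    return x1 * x2


def partial_dependence(model, grid, reference_x2):
    """Average prediction at each x1 grid value over the reference x2 values."""
    return [
        sum(model(v, x2) for x2 in reference_x2) / len(reference_x2)
        for v in grid
    ]


grid = [0.0, 1.0, 2.0]
reference_a = [0.5, 1.0, 1.5]     # x2 mostly positive
reference_b = [-1.5, -1.0, -0.5]  # x2 mostly negative

pd_a = partial_dependence(model, grid, reference_a)
pd_b = partial_dependence(model, grid, reference_b)
```

The same model yields a profile that rises with `x1` under `reference_a` and falls under `reference_b`, so the apparent direction of the effect is a property of the reference data as much as of the model.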

5.5 | LG | Apr 7, 2021 | Code

Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure

Katarzyna Pękala, Katarzyna Woźnica, Przemysław Biecek

One of the key elements of explanatory analysis of a predictive model is to assess the importance of individual variables. Rapid development of the area of predictive model exploration (also called explainable artificial intelligence or interpretable machine learning) has led to the popularization of local (instance-level) and global (dataset-level) methods, such as Permutational Variable Importance, Shapley Values (SHAP), Local Interpretable Model Explanations (LIME), Break Down and so on. However, these methods do not use information about the correlation between features, which significantly reduces the explainability of the model behaviour. In this work, we propose new methods to support model analysis by exploiting the information about the correlation between variables. The dataset-level aspect importance measure is inspired by the block permutations procedure, while the instance-level aspect importance measure is inspired by the LIME method. We show how to analyze groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically based on the hierarchical structure of correlations between variables. Additionally, we present a new type of model visualisation, triplot, which exploits a hierarchical structure of variable grouping to produce a high information density model visualisation. This visualisation provides a consistent illustration for either local or global model and data exploration. We also show an example of real-world data with 5k instances and 37 features in which a significant correlation between variables affects the interpretation of variable importance. The proposed method is, to our knowledge, the first to allow direct use of the correlation between variables in exploratory model analysis.

8.3 | ML | Jul 6, 2020

Does imputation matter? Benchmark for predictive models

Katarzyna Woźnica, Przemysław Biecek

Incomplete data are common in practical applications. Most predictive machine learning models do not handle missing values, so they require some preprocessing. Although many algorithms are used for data imputation, we do not understand the impact of the different methods on the predictive models' performance. This paper is the first to systematically evaluate the empirical effectiveness of data imputation algorithms for predictive models. The main contributions are (1) the recommendation of a general method for empirical benchmarking based on real-life classification tasks and (2) the comparative analysis of different imputation methods for a collection of data sets and a collection of ML algorithms.

6.5 | LG | Jun 2, 2020 | Code

Interpretable Meta-Measure for Model Performance

Alicja Gosiewska, Katarzyna Woźnica, Przemysław Biecek

Benchmarks for the evaluation of model performance play an important role in machine learning. However, there is no established way to describe and create new benchmarks. What is more, the most common benchmarks use performance measures that share several limitations. For example, the difference in performance for two models has no probabilistic interpretation, there is no reference point to indicate whether they represent a significant improvement, and it makes no sense to compare such differences between data sets. We introduce a new meta-score assessment named Elo-based Predictive Power (EPP) that is built on top of other performance measures and allows for interpretable comparisons of models. The differences in EPP scores have a probabilistic interpretation and can be directly compared between data sets. Furthermore, the logistic regression-based design allows for an assessment of ranking fitness based on a deviance statistic. We prove the mathematical properties of EPP and support them with empirical results of a large-scale benchmark on 30 classification data sets and a real-world benchmark for visual data. Additionally, we propose a Unified Benchmark Ontology that is used to give a uniform description of benchmarks.
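The probabilistic reading of score differences can be sketched with a small Elo-style fit. This sketch assumes that the log-odds of one model beating another equal the difference of their scores, and it estimates the scores by plain gradient ascent on pairwise win counts. The function names `win_probability` and `fit_scores`, the toy win matrix, and the optimizer settings are illustrative assumptions, not the authors' estimation procedure.

```python
import math


def win_probability(score_a, score_b):
    """P(model A beats model B), assuming logit(p) = score_a - score_b."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def fit_scores(wins, lr=0.05, steps=2000):
    """Fit one score per model from pairwise results.

    wins[i][j] is the number of comparisons (e.g. cross-validation folds)
    in which model i outperformed model j.
    """
    n = len(wins)
    scores = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                games = wins[i][j] + wins[j][i]
                # Gradient of the log-likelihood: observed minus expected wins.
                grad[i] += wins[i][j] - games * win_probability(scores[i], scores[j])
        scores = [s + lr * g for s, g in zip(scores, grad)]
    # Only differences are identifiable, so center the scores at zero.
    mean = sum(scores) / n
    return [s - mean for s in scores]


# Hypothetical results: three models, ten comparisons per pair.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scores = fit_scores(wins)
p_first_beats_last = win_probability(scores[0], scores[2])
```

Under this assumption, a score difference of 1 always corresponds to the same win probability (about 0.73), regardless of which data set the comparisons came from, which is what makes differences comparable across data sets.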

4.9 | ML | Feb 11, 2020

Towards explainable meta-learning

Katarzyna Woźnica, Przemysław Biecek

Meta-learning is a field that aims at discovering how different machine learning algorithms perform on a wide range of predictive tasks. Such knowledge speeds up hyperparameter tuning or feature engineering. With the use of surrogate models, various aspects of the predictive task, such as meta-features, landmarker models, etc., are used to predict the expected performance. State-of-the-art approaches are focused on searching for the best meta-model but do not explain how these different aspects contribute to its performance. However, to build a new generation of meta-models, we need a deeper understanding of the importance and effect of meta-features on model tunability. In this paper, we propose techniques developed for eXplainable Artificial Intelligence (XAI) to examine and extract knowledge from black-box surrogate models. To our knowledge, this is the first paper that shows how post-hoc explainability can be used to improve meta-learning.

1.8 | LG | Aug 24, 2019

EPP: interpretable score of model predictive power

Alicja Gosiewska, Mateusz Bakala, Katarzyna Woźnica et al.

The most important part of model selection and hyperparameter tuning is the evaluation of model performance. The most popular measures, such as AUC, F1, and ACC for binary classification, RMSE and MAD for regression, or cross-entropy for multilabel classification, share two common weaknesses. First, they are not on an interval scale: the difference in performance between two models has no direct interpretation, and it makes no sense to compare such differences between datasets. Second, for k-fold cross-validation, model performance is in most cases calculated as the average over folds, which neglects information about how stable the performance is across folds. In this talk, we introduce a new EPP rating system for predictive models. We also demonstrate numerous advantages of this system. First, differences in EPP scores have a probabilistic interpretation: based on them, we can assess the probability that one model will achieve better performance than another. Second, EPP scores can be directly compared between datasets. Third, they can be used for navigated hyperparameter tuning and model selection. Fourth, we can create embeddings for datasets based on EPP scores.