Krzysztof Kotowski

LG
h-index9
8papers
79citations
Novelty33%
AI Score40

8 Papers

IVSep 3, 2022
Deep learning automates bidimensional and volumetric tumor burden measurement from MRI in pre- and post-operative glioblastoma patients

Jakub Nalepa, Krzysztof Kotowski, Bartosz Machura et al.

Tumor burden assessment by magnetic resonance imaging (MRI) is central to the evaluation of treatment response for glioblastoma. This assessment is complex to perform and associated with high variability due to the high heterogeneity and complexity of the disease. In this work, we tackle this issue and propose a deep learning pipeline for the fully automated end-to-end analysis of glioblastoma patients. Our approach simultaneously identifies tumor sub-regions, including the enhancing tumor, peritumoral edema and surgical cavity in the first step, and then calculates the volumetric and bidimensional measurements that follow the current Response Assessment in Neuro-Oncology (RANO) criteria. Also, we introduce a rigorous manual annotation process which was followed to delineate the tumor sub-regions by the human experts, and to capture their segmentation confidences that are later used while training the deep learning models. The results of our extensive experimental study performed over 760 pre-operative and 504 post-operative adult patients with glioma obtained from the public database (acquired at 19 sites in years 2021-2020) and from a clinical treatment trial (47 and 69 sites for pre-/post-operative patients, 2009-2011) and backed up with thorough quantitative, qualitative and statistical analysis revealed that our pipeline performs accurate segmentation of pre- and post-operative MRIs in a fraction of the manual delineation time (up to 20 times faster than humans). The bidimensional and volumetric measurements were in strong agreement with expert radiologists, and we showed that RANO measurements are not always sufficient to quantify tumor burden.

LGMar 20
Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition

Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa et al.

Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting models introduces a new security risk of trojan horse attacks, carried out by hiding a backdoor in the training data or directly in the model weights. Once implanted, the backdoor is activated by a specific trigger pattern at test time, causing the model to produce manipulated predictions. We focus on this issue in our \textit{Trojan Horse Hunt} data science competition, where more than 200 teams faced the task of identifying triggers hidden in deep forecasting models for spacecraft telemetry. We describe the novel task formulation, benchmark set, evaluation protocol, and best solutions from the competition. We further summarize key insights and research directions for effective identification of triggers in time series forecasting models. All materials are publicly available on the official competition webpage https://www.kaggle.com/competitions/trojan-horse-hunt-in-space.

LGMar 28, 2025
MASCOTS: Model-Agnostic Symbolic COunterfactual explanations for Time Series

Dawid Płudowski, Francesco Spinnato, Piotr Wilczyński et al.

Counterfactual explanations provide an intuitive way to understand model decisions by identifying minimal changes required to alter an outcome. However, applying counterfactual methods to time series models remains challenging due to temporal dependencies, high dimensionality, and the lack of an intuitive human-interpretable representation. We introduce MASCOTS, a method that leverages the Bag-of-Receptive-Fields representation alongside symbolic transformations inspired by Symbolic Aggregate Approximation. By operating in a symbolic feature space, it enhances interpretability while preserving fidelity to the original data and model. Unlike existing approaches that either depend on model structure or autoencoder-based sampling, MASCOTS directly generates meaningful and diverse counterfactual observations in a model-agnostic manner, operating on both univariate and multivariate data. We evaluate MASCOTS on univariate and multivariate benchmark datasets, demonstrating comparable validity, proximity, and plausibility to state-of-the-art methods, while significantly improving interpretability and sparsity. Its symbolic nature allows for explanations that can be expressed visually, in natural language, or through semantic representations, making counterfactual reasoning more accessible and actionable.

LGApr 11, 2024
DisorderUnetLM: Validating ProteinUnet for efficient protein intrinsic disorder prediction

Krzysztof Kotowski, Irena Roterman, Katarzyna Stapor

The prediction of intrinsic disorder regions has significant implications for understanding protein functions and dynamics. It can help to discover novel protein-protein interactions essential for designing new drugs and enzymes. Recently, a new generation of predictors based on protein language models (pLMs) is emerging. These algorithms reach state-of-the-art accuracy with-out calculating time-consuming multiple sequence alignments (MSAs). The article introduces the new DisorderUnetLM disorder predictor, which builds upon the idea of ProteinUnet. It uses the Attention U-Net convolutional neural network and incorporates features from the ProtTrans pLM. DisorderUnetLM achieves top results in the direct comparison with recent predictors exploiting MSAs and pLMs. Moreover, among 43 predictors from the latest CAID-2 benchmark, it ranks 1st for the Disorder-NOX subset (ROC-AUC of 0.844) and 10th for the Disorder-PDB subset (ROC-AUC of 0.924). The code and model are publicly available and fully reproducible at doi.org/10.24433/CO.7350682.v1.

LGJul 17, 2025
Fake or Real: The Impostor Hunt in Texts for Space Operations

Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński et al.

The "Fake or Real" competition hosted on Kaggle (https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt ) is the second part of a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (https://assurance-ai.space-codev.org/ ). The competition idea is based on two real-life AI security threats identified within the project -- data poisoning and overreliance in Large Language Models. The task is to distinguish between the proper output from LLM and the output generated under malicious modification of the LLM. As this problem was not extensively researched, participants are required to develop new techniques to address this issue or adjust already existing ones to this problem's statement.

LGJun 2, 2025
Trojan Horse Hunt in Time Series Forecasting for Space Operations

Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa et al.

This competition hosted on Kaggle (https://www.kaggle.com/competitions/trojan-horse-hunt-in-space) is the first part of a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (https://assurance-ai.space-codev.org/). The competition idea is based on one of the real-life AI security threats identified within the project -- the adversarial poisoning of continuously fine-tuned satellite telemetry forecasting models. The task is to develop methods for finding and reconstructing triggers (trojans) in advanced models for satellite telemetry forecasting used in safety-critical space operations. Participants are provided with 1) a large public dataset of real-life multivariate satellite telemetry (without triggers), 2) a reference model trained on the clean data, 3) a set of poisoned neural hierarchical interpolation (N-HiTS) models for time series forecasting trained on the dataset with injected triggers, and 4) Jupyter notebook with the training pipeline and baseline algorithm (the latter will be published in the last month of the competition). The main task of the competition is to reconstruct a set of 45 triggers (i.e., short multivariate time series segments) injected into the training data of the corresponding set of 45 poisoned models. The exact characteristics (i.e., shape, amplitude, and duration) of these triggers must be identified by participants. The popular Neural Cleanse method is adopted as a baseline, but it is not designed for time series analysis and new approaches are necessary for the task. The impact of the competition is not limited to the space domain, but also to many other safety-critical applications of advanced time series analysis where model poisoning may lead to serious consequences.

SPJun 29, 2024
The OPS-SAT benchmark for detecting anomalies in satellite telemetry

Bogdan Ruszczak, Krzysztof Kotowski, David Evans et al.

Detecting anomalous events in satellite telemetry is a critical task in space operations. This task, however, is extremely time-consuming, error-prone and human dependent, thus automated data-driven anomaly detection algorithms have been emerging at a steady pace. However, there are no publicly available datasets of real satellite telemetry accompanied with the ground-truth annotations that could be used to train and verify anomaly detection supervised models. In this article, we address this research gap and introduce the AI-ready benchmark dataset (OPSSAT-AD) containing the telemetry data acquired on board OPS-SAT -- a CubeSat mission which has been operated by the European Space Agency which has come to an end during the night of 22--23 May 2024 (CEST). The dataset is accompanied with the baseline results obtained using 30 supervised and unsupervised classic and deep machine learning algorithms for anomaly detection. They were trained and validated using the training-test dataset split introduced in this work, and we present a suggested set of quality metrics which should be always calculated to confront the new algorithms for anomaly detection while exploiting OPSSAT-AD. We believe that this work may become an important step toward building a fair, reproducible and objective validation procedure that can be used to quantify the capabilities of the emerging anomaly detection techniques in an unbiased and fully transparent way.

LGJun 25, 2024
European Space Agency Benchmark for Anomaly Detection in Satellite Telemetry

Krzysztof Kotowski, Christoph Haskamp, Jacek Andrzejewski et al.

Machine learning has vast potential to improve anomaly detection in satellite telemetry which is a crucial task for spacecraft operations. This potential is currently hampered by a lack of comprehensible benchmarks for multivariate time series anomaly detection, especially for the challenging case of satellite telemetry. The European Space Agency Benchmark for Anomaly Detection in Satellite Telemetry (ESA-ADB) aims to address this challenge and establish a new standard in the domain. It is a result of close cooperation between spacecraft operations engineers from the European Space Agency (ESA) and machine learning experts. The newly introduced ESA Anomalies Dataset contains annotated real-life telemetry from three different ESA missions, out of which two are included in ESA-ADB. Results of typical anomaly detection algorithms assessed in our novel hierarchical evaluation pipeline show that new approaches are necessary to address operators' needs. All elements of ESA-ADB are publicly available to ensure its full reproducibility.