Michał Cholewa

CV
h-index10
9papers
40citations
Novelty44%
AI Score38

9 Papers

CVApr 17
From Articles to Canopies: Knowledge-Driven Pseudo-Labelling for Tree Species Classification using LLM Experts

Michał Romaszewski, Dominik Kopeć, Michał Cholewa et al.

Hyperspectral tree species classification is challenging due to limited and imbalanced class labels, spectral mixing (overlapping light signatures from multiple species), and ecological heterogeneity (variability among ecological systems). Addressing these challenges requires methods that integrate biological and structural characteristics of vegetation, such as canopy architecture and interspecific interactions, rather than relying solely on spectral signatures. This paper presents a biologically informed, semi-supervised deep learning method that integrates multi-sensor Earth observation data, specifically hyperspectral imaging (HSI) and airborne laser scanning (ALS), with expert, ecological knowledge. The approach relies on biologically inspired pseudo-labelling over a precomputed canopy graph, yielding accurate classification at low training cost. In addition, ecological priors on species cohabitation are automatically derived from reliable sources using large language models (LLMs) and encoded as a cohabitation matrix with likelihoods of species occurring together. These priors are incorporated into the pseudo-labelling strategy, effectively introducing expert knowledge into the model. Experiments on a real-world forest dataset demonstrate 5.6% improvement over the best reference method. Expert evaluation of cohabitation priors reveals high accuracy with differences no larger than 15%.

SPSep 17, 2024
Efficient Numerical Calibration of Water Delivery Network Using Short-Burst Hydrant Trials

Katarzyna Kołodziej, Michał Cholewa, Przemysław Głomb et al.

Calibration is a critical process for reducing uncertainty in Water Distribution Network Hydraulic Models (WDN HM). However, features of certain WDNs, such as oversized pipelines, lead to shallow pressure gradients under normal daily conditions, posing a challenge for effective calibration. This study proposes a calibration methodology using short hydrant trials conducted at night, which increase the pressure gradient in the WDN. The data is resampled to align with hourly consumption patterns. In a unique real-world case study of a WDN zone, we demonstrate the statistically significant superiority of our method compared to calibration based on daily usage. The experimental methodology, inspired by a machine learning cross-validation framework, utilises two state-of-the-art calibration algorithms, achieving a reduction in absolute error of up to 45% in the best scenario.

QUANT-PHFeb 28, 2025
Quantum-aware Transformer model for state classification

Przemysław Sekuła, Michał Romaszewski, Przemysław Głomb et al.

Entanglement is a fundamental feature of quantum mechanics, playing a crucial role in quantum information processing. However, classifying entangled states, particularly in the mixed-state regime, remains a challenging problem, especially as system dimensions increase. In this work, we focus on bipartite quantum states and present a data-driven approach to entanglement classification using transformer-based neural networks. Our dataset consists of a diverse set of bipartite states, including pure separable states, Werner entangled states, general entangled states, and maximally entangled states. We pretrain the transformer in an unsupervised fashion by masking elements of vectorized Hermitian matrix representations of quantum states, allowing the model to learn structural properties of quantum density matrices. This approach enables the model to generalize entanglement characteristics across different classes of states. Once trained, our method achieves near-perfect classification accuracy, effectively distinguishing between separable and entangled states. Compared to previous Machine Learning, our method successfully adapts transformers for quantum state analysis, demonstrating their ability to systematically identify entanglement in bipartite systems. These results highlight the potential of modern machine learning techniques in automating entanglement detection and classification, bridging the gap between quantum information theory and artificial intelligence.

SYJun 28, 2024
`Just One More Sensor is Enough' -- Iterative Water Leak Localization with Physical Simulation and a Small Number of Pressure Sensors

Michał Cholewa, Michał Romaszewski, Przemysław Głomb et al.

In this article, we propose an approach to leak localisation in a complex water delivery grid with the use of data from physical simulation (e.g. EPANET software). This task is usually achieved by a network of multiple water pressure sensors and analysis of the so-called sensitivity matrix of pressure differences between the network's simulated data and actual data of the network affected by the leak. However, most algorithms using this approach require a significant number of pressure sensors -- a condition that is not easy to fulfil in the case of many less equipped networks. Therefore, we answer the question of whether leak localisation is possible by utilising very few sensors but having the ability to relocate one of them. Our algorithm is based on physical simulations (EPANET software) and an iterative scheme for mobile sensor relocation. The experiments show that the proposed system can equalise the low number of sensors with adjustments made for their positioning, giving a very good approximation of leak's position both in simulated cases and real-life example taken from BattLeDIM competition L-Town data.

CLJun 7, 2024
Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models

Michał Romaszewski, Przemysław Sekuła, Przemysław Głomb et al.

Large Language Models (LLMs) have shown exceptional performance in text processing. Notably, LLMs can synthesize information from large datasets and explain their decisions similarly to human reasoning through a chain of thought (CoT). An emerging application of LLMs is the handling and interpreting of numerical data, where fine-tuning enhances their performance over basic inference methods. This paper proposes a novel approach to training LLMs using knowledge transfer from a random forest (RF) ensemble, leveraging its efficiency and accuracy. By converting RF decision paths into natural language statements, we generate outputs for LLM fine-tuning, enhancing the model's ability to classify and explain its decisions. Our method includes verifying these rules through established classification metrics, ensuring their correctness. We also examine the impact of preprocessing techniques on the representation of numerical data and their influence on classification accuracy and rule correctness

LGNov 3, 2021
Data structure > labels? Unsupervised heuristics for SVM hyperparameter estimation

Michał Cholewa, Michał Romaszewski, Przemysław Głomb

Classification is one of the main areas of pattern recognition research, and within it, Support Vector Machine (SVM) is one of the most popular methods outside of field of deep learning -- and a de-facto reference for many Machine Learning approaches. Its performance is determined by parameter selection, which is usually achieved by a time-consuming grid search cross-validation procedure (GSCV). That method, however relies on the availability and quality of labelled examples and thus, when those are limited can be hindered. To address that problem, there exist several unsupervised heuristics that take advantage of the characteristics of the dataset for selecting parameters instead of using class label information. While an order of magnitude faster, they are scarcely used under the assumption that their results are significantly worse than those of grid search. To challenge that assumption, we have proposed improved heuristics for SVM parameter selection and tested it against GSCV and state of the art heuristics on over 30 standard classification datasets. The results show not only its advantage over state-of-art heuristics but also that it is statistically no worse than GSCV.

IVSep 28, 2021
Improving Autoencoder Training Performance for Hyperspectral Unmixing with Network Reinitialisation

Kamil Książek, Przemysław Głomb, Michał Romaszewski et al.

Neural networks, in particular autoencoders, are one of the most promising solutions for unmixing hyperspectral data, i.e. reconstructing the spectra of observed substances (endmembers) and their relative mixing fractions (abundances), which is needed for effective hyperspectral analysis and classification. However, as we show in this paper, the training of autoencoders for unmixing is highly dependent on weights initialisation; some sets of weights lead to degenerate or low-performance solutions, introducing negative bias in the expected performance. In this work, we experimentally investigate autoencoders stability as well as network reinitialisation methods based on coefficients of neurons' dead activations. We demonstrate that the proposed techniques have a positive effect on autoencoder training in terms of reconstruction, abundances and endmembers errors.

CVAug 24, 2020
A Dataset for Evaluating Blood Detection in Hyperspectral Images

Michał Romaszewski, Przemysław Głomb, Arkadiusz Sochan et al.

The sensitivity of imaging spectroscopy to haemoglobin derivatives makes it a promising tool for detecting blood. However, due to complexity and high dimensionality of hyperspectral images, the development of hyperspectral blood detection algorithms is challenging. To facilitate their development, we present a new hyperspectral blood detection dataset. This dataset, published in accordance to open access mandate, consist of multiple detection scenarios with varying levels of complexity. It allows to test the performance of Machine Learning methods in relation to different acquisition environments, types of background, age of blood and presence of other blood-like substances. We explored the dataset with blood detection experiments. We used hyperspectral target detection algorithm based on the well-known Matched Filter detector. Our results and their discussion highlight the challenges of blood detection in hyperspectral data and form a reference for further works.

CVAug 10, 2018
Band selection with Higher Order Multivariate Cumulants for small target detection in hyperspectral images

Przemysław Głomb, Krzysztof Domino, Michał Romaszewski et al.

In the small target detection problem a pattern to be located is on the order of magnitude less numerous than other patterns present in the dataset. This applies both to the case of supervised detection, where the known template is expected to match in just a few areas and unsupervised anomaly detection, as anomalies are rare by definition. This problem is frequently related to the imaging applications, i.e. detection within the scene acquired by a camera. To maximize available data about the scene, hyperspectral cameras are used; at each pixel, they record spectral data in hundreds of narrow bands. The typical feature of hyperspectral imaging is that characteristic properties of target materials are visible in the small number of bands, where light of certain wavelength interacts with characteristic molecules. A target-independent band selection method based on statistical principles is a versatile tool for solving this problem in different practical applications. Combination of a regular background and a rare standing out anomaly will produce a distortion in the joint distribution of hyperspectral pixels. Higher Order Cumulants Tensors are a natural `window' into this distribution, allowing to measure properties and suggest candidate bands for removal. While there have been attempts at producing band selection algorithms based on the 3 rd cumulant's tensor i.e. the joint skewness, the literature lacks a systematic analysis of how the order of the cumulant tensor used affects effectiveness of band selection in detection applications. In this paper we present an analysis of a general algorithm for band selection based on higher order cumulants. We discuss its usability related to the observed breaking points in performance, depending both on method order and the desired number of bands. Finally we perform experiments and evaluate these methods in a hyperspectral detection scenario.