Dominik Żurek

LG
h-index20
11papers
70citations
Novelty45%
AI Score48

11 Papers

23.1ROMay 8
Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning

Paweł Gajewski, Dominik Żurek, Marcin Pietroń et al.

Reinforcement learning (RL) in robotics faces significant hurdles regarding sample efficiency and generalization across varying goals. While Offline RL mitigates the need for costly online interactions, its integration with goal-conditioned policies and transformer-based architectures remains underexplored. We introduce a Goal-Conditioned Decision Transformer adapted for offline multi-goal robotics. By explicitly incorporating goal states into the sequence modeling framework, our approach efficiently solves varying tasks using only pre-collected data. We validate this method on a newly released offline dataset for the Franka Emika Panda platform. Experimental results demonstrate that our approach outperforms state-of-the-art online baselines in complex tasks and maintains robustness in sparse-reward settings, even with limited expert demonstrations.

LGAug 14, 2023
Ada-QPacknet -- adaptive pruning with bit width reduction as an efficient continual learning method without forgetting

Marcin Pietroń, Dominik Żurek, Kamil Faber et al.

Continual Learning (CL) is a process in which there is still huge gap between human and deep learning model efficiency. Recently, many CL algorithms were designed. Most of them have many problems with learning in dynamic and complex environments. In this work new architecture based approach Ada-QPacknet is described. It incorporates the pruning for extracting the sub-network for each task. The crucial aspect in architecture based CL methods is theirs capacity. In presented method the size of the model is reduced by efficient linear and nonlinear quantisation approach. The method reduces the bit-width of the weights format. The presented results shows that low bit quantisation achieves similar accuracy as floating-point sub-network on a well-know CL scenarios. To our knowledge it is the first CL strategy which incorporates both compression techniques pruning and quantisation for generating task sub-networks. The presented algorithm was tested on well-known episode combinations and compared with most popular algorithms. Results show that proposed approach outperforms most of the CL strategies in task and class incremental scenarios.

44.3LGApr 28Code
TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning

Dominik Żurek, Kamil Faber, Marcin Pietron et al.

Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions is expensive, risky, or impossible. However, CORL inherits the dual difficulty of offline reinforcement learning and adapting while preventing catastrophic forgetting. Replay-based continual learning approaches remain a strong baseline but incur memory overhead and suffer from a distribution mismatch between replayed samples and newly learned policies. At the same time, architectural continual learning methods have shown strong potential in supervised learning but remain underexplored in CORL. In this work, we propose TSN-Affinity, a novel CORL method based on TinySubNetworks and Decision Transformer. The method enables task-specific parameterization and controlled knowledge sharing through a RL-aware reuse strategy that routes tasks according to action compatibility and latent similarity. We evaluate the approach on benchmarks based on Atari games and simulations of manipulation tasks with the Franka Emika Panda robotic arm, covering both discrete and continuous control. Results show strong retention from sparse SubNetworks, with routing further improving multi-task performance. Our findings suggest that similarity-guided architectural reuse is a strong and viable alternative to replay-based strategies in a CORL setting. Our code is available at: https://github.com/anonymized-for-submission123/tsn-affinity.

LGJun 28, 2025Code
xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection

Kamil Faber, Marcin Pietroń, Dominik Żurek et al.

The recently proposed xLSTM is a powerful model that leverages expressive multiplicative gating and residual connections, providing the temporal capacity needed for long-horizon forecasting and representation learning. This architecture has demonstrated success in time series forecasting, lossless compression, and even large-scale language modeling tasks, where its linear memory footprint and fast inference make it a viable alternative to Transformers. Despite its growing popularity, no prior work has explored xLSTM for anomaly detection. In this work, we fill this gap by proposing xLSTMAD, the first anomaly detection method that integrates a full encoder-decoder xLSTM architecture, purpose-built for multivariate time series data. Our encoder processes input sequences to capture historical context, while the decoder is devised in two separate variants of the method. In the forecasting approach, the decoder iteratively generates forecasted future values xLSTMAD-F, while the reconstruction approach reconstructs the input time series from its encoded counterpart xLSTMAD-R. We investigate the performance of two loss functions: Mean Squared Error (MSE), and Soft Dynamic Time Warping (SoftDTW) to consider local reconstruction fidelity and global sequence alignment, respectively. We evaluate our method on the comprehensive TSB-AD-M benchmark, which spans 17 real-world datasets, using state-of-the-art challenging metrics such as VUS-PR. In our results, xLSTM showcases state-of-the-art accuracy, outperforming 23 popular anomaly detection baselines. Our paper is the first work revealing the powerful modeling capabilities of xLSTM for anomaly detection, paving the way for exciting new developments on this subject. Our code is available at: https://github.com/Nyderx/xlstmad

LGDec 14, 2024Code
TinySubNets: An efficient and low capacity continual learning strategy

Marcin Pietroń, Kamil Faber, Dominik Żurek et al.

Continual Learning (CL) is a highly relevant setting gaining traction in recent machine learning research. Among CL works, architectural and hybrid strategies are particularly effective due to their potential to adapt the model architecture as new tasks are presented. However, many existing solutions do not efficiently exploit model sparsity, and are prone to capacity saturation due to their inefficient use of available weights, which limits the number of learnable tasks. In this paper, we propose TinySubNets (TSN), a novel architectural CL strategy that addresses the issues through the unique combination of pruning with different sparsity levels, adaptive quantization, and weight sharing. Pruning identifies a subset of weights that preserve model performance, making less relevant weights available for future tasks. Adaptive quantization allows a single weight to be separated into multiple parts which can be assigned to different tasks. Weight sharing between tasks boosts the exploitation of capacity and task similarity, allowing for the identification of a better trade-off between model accuracy and capacity. These features allow TSN to efficiently leverage the available capacity, enhance knowledge transfer, and reduce computational resource consumption. Experimental results involving common benchmark CL datasets and scenarios show that our proposed strategy achieves better results in terms of accuracy than existing state-of-the-art CL strategies. Moreover, our strategy is shown to provide a significantly improved model capacity exploitation. Code released at: https://github.com/lifelonglab/tinysubnets.

NEMar 25, 2024
AD-NEv++ : The multi-architecture neuroevolution-based multivariate anomaly detection framework

Marcin Pietroń, Dominik Żurek, Kamil Faber et al.

Anomaly detection tools and methods enable key analytical capabilities in modern cyberphysical and sensor-based systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time-consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search method for learning optimal neural networks, supporting both gradient and non-gradient fine tuning. However, existing frameworks incorporating neuroevolution lack of support for new layers and architectures and are typically limited to convolutional and LSTM layers. In this paper we propose AD-NEv++, a three-stage neuroevolution-based method that synergically combines subspace evolution, model evolution, and fine-tuning. Our method overcomes the limitations of existing approaches by optimizing the mutation operator in the neuroevolution process, while supporting a wide spectrum of neural layers, including attention, dense, and graph convolutional layers. Our extensive experimental evaluation was conducted with widely adopted multivariate anomaly detection benchmark datasets, and showed that the models generated by AD-NEv++ outperform well-known deep learning architectures and neuroevolution-based approaches for anomaly detection. Moreover, results show that AD-NEv++ can improve and outperform the state-of-the-art GNN (Graph Neural Networks) model architecture in all anomaly detection benchmarks.

LGMar 4, 2024
Towards efficient deep autoencoders for multivariate time series anomaly detection

Marcin Pietroń, Dominik Żurek, Kamil Faber et al.

Multivariate time series anomaly detection is a crucial problem in many industrial and research applications. Timely detection of anomalies allows, for instance, to prevent defects in manufacturing processes and failures in cyberphysical systems. Deep learning methods are preferred among others for their accuracy and robustness for the analysis of complex multivariate data. However, a key aspect is being able to extract predictions in a timely manner, to accommodate real-time requirements in different applications. In the case of deep learning models, model reduction is extremely important to achieve optimal results in real-time systems with limited time and memory constraints. In this paper, we address this issue by proposing a novel compression method for deep autoencoders that involves three key factors. First, pruning reduces the number of weights, while preventing catastrophic drops in accuracy by means of a fast search process that identifies high sparsity levels. Second, linear and non-linear quantization reduces model complexity by reducing the number of bits for every single weight. The combined contribution of these three aspects allow the model size to be reduced, by removing a subset of the weights (pruning), and decreasing their bit-width (quantization). As a result, the compressed model is faster and easier to adopt in highly constrained hardware environments. Experiments performed on popular multivariate anomaly detection benchmarks, show that our method is capable of achieving significant model compression ratio (between 80% and 95%) without a significant reduction in the anomaly detection performance.

LGDec 28, 2021
Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

Marcin Pietroń, Dominik Żurek

This work is focused on the pruning of some convolutional neural networks (CNNs) and improving theirs efficiency on graphic processing units (GPU) by using a direct sparse algorithm. The Nvidia deep neural network (cuDnn) library is the most effective implementations of deep learning (DL) algorithms for GPUs. GPUs are the most commonly used accelerators for deep learning computations. One of the most common techniques for improving the efficiency of CNN models is weight pruning and quantization. There are two main types of pruning: structural and non-structural. The first enables much easier acceleration on many type of accelerators, but with this type it is difficult to achieve a sparsity level and accuracy as high as that obtained with the second type. Non-structural pruning with retraining can generate a weight tensors up to 90% or more of sparsity in some deep CNN models. In this article the pruning algorithm is presented which makes it possible to achieve high sparsity levels without accuracy drop. In the next stage the linear and non-linear quantization is adapted for further time and footprint reduction. This paper is an extended of previously published paper concerning effective pruning techniques and present real models pruned with high sparsities and reduced precision which can achieve better performance than the CuDnn library.

LGAug 8, 2021
Ensemble neuroevolution based approach for multivariate time series anomaly detection

Kamil Faber, Dominik Żurek, Marcin Pietroń et al.

Multivariate time series anomaly detection is a very common problem in the field of failure prevention. Fast prevention means lower repair costs and losses. The amount of sensors in novel industry systems makes the anomaly detection process quite difficult for humans. Algorithms which automates the process of detecting anomalies are crucial in modern failure-prevention systems. Therefore, many machine and deep learning models have been designed to address this problem. Mostly, they are autoencoder-based architectures with some generative adversarial elements. In this work, a framework is shown which incorporates neuroevolution methods to boost the anomaly-detection scores of new and already known models. The presented approach adapts evolution strategies for evolving ensemble model, in which every single model works on a subgroup of data sensors. The next goal of neuroevolution is to optimise architecture and hyperparameters like window size, the number of layers, layer depths, etc. The proposed framework shows that it is possible to boost most of the anomaly detection deep learning models in a reasonable time and a fully automated mode. The tests were run on SWAT and WADI datasets. To our knowledge, this is the first approach in which an ensemble deep learning anomaly detection model is built in a fully automatic way using a neuroevolution strategy.

LGNov 12, 2020
When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

Marcin Pietroń, Dominik Żurek

This paper is focused on the improvement the efficiency of the sparse convolutional neural networks (CNNs) layers on graphic processing units (GPU). The Nvidia deep neural network (cuDnn) library provides the most effective implementation of deep learning (DL) algorithms for GPUs. GPUs are one of the most efficient and commonly used accelerators for deep learning computations. The modern CNN models need megabytes of coefficients and needed millions MAC operations to perform convolution. One of the most common techniques for compressing CNN models is weight pruning. There are two main types of pruning: structural (based on removing whole weight channels) and non-structural (removing individual weights). The first enables much easier acceleration, but with this type it is difficult to achieve a sparsity level and accuracy as high as that obtained with the second type. Non-structural pruning with retraining can generate a matrix-weight up to $\sim90\%$ or more of sparsity in some deep CNN models. This work shows when is worth using a direct sparse operation to speed-up the calculation of the convolution layers. The VGG-16, CNN-non-static and 1x1 layers from ResNet models were used as a benchmarks. In addition, we present the impact of using reduced precision on time efficiency.

LGJul 17, 2020
Training with reduced precision of a support vector machine model for text classification

Dominik Żurek, Marcin Pietroń

This paper presents the impact of using quantization on the efficiency of multi-class text classification in the training process of a support vector machine (SVM). This work is focused on comparing the efficiency of SVM model trained using reduced precision with its original form. The main advantage of using quantization is decrease in computation time and in memory footprint on the dedicated hardware platform which supports low precision computation like GPU (16-bit) or FPGA (any bit-width). The paper presents the impact of a precision reduction of the SVM training process on text classification accuracy. The implementation of the CPU was performed using the OpenMP library. Additionally, the results of the implementation of the GPU using double, single and half precision are presented.