Bartosz Wójcik

LG
h-index31
18papers
185citations
Novelty53%
AI Score44

18 Papers

CVSep 21, 2023
Face Identity-Aware Disentanglement in StyleGAN

Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska et al.

Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.

LGJun 16, 2022
Continual Learning with Guarantees via Weight Interval Constraints

Maciej Wołczyk, Karol J. Piczak, Bartosz Wójcik et al.

We introduce a new training paradigm that enforces interval constraints on neural network parameter space to control forgetting. Contemporary Continual Learning (CL) methods focus on training neural networks efficiently from a stream of data, while reducing the negative impact of catastrophic forgetting, yet they do not provide any firm guarantees that network performance will not deteriorate uncontrollably over time. In this work, we show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space. To that end, we propose Hyperrectangle Training, a new training methodology where each task is represented by a hyperrectangle in the parameter space, fully contained in the hyperrectangles of the previous tasks. This formulation reduces the NP-hard CL problem back to polynomial time while providing full resilience against forgetting. We validate our claim by developing InterContiNet (Interval Continual Learning) algorithm which leverages interval arithmetic to effectively model parameter regions as hyperrectangles. Through experimental results, we show that our approach performs well in a continual learning setup without storing data from previous tasks.

CVOct 11, 2022
Computer Vision based inspection on post-earthquake with UAV synthetic dataset

Mateusz Żarski, Bartosz Wójcik, Jarosław A. Miszczak et al.

The area affected by the earthquake is vast and often difficult to entirely cover, and the earthquake itself is a sudden event that causes multiple defects simultaneously, that cannot be effectively traced using traditional, manual methods. This article presents an innovative approach to the problem of detecting damage after sudden events by using an interconnected set of deep machine learning models organized in a single pipeline and allowing for easy modification and swapping models seamlessly. Models in the pipeline were trained with a synthetic dataset and were adapted to be further evaluated and used with unmanned aerial vehicles (UAVs) in real-world conditions. Thanks to the methods presented in the article, it is possible to obtain high accuracy in detecting buildings defects, segmenting constructions into their components and estimating their technical condition based on a single drone flight.

LGJun 28, 2022
SLOVA: Uncertainty Estimation Using Single Label One-Vs-All Classifier

Bartosz Wójcik, Jacek Grela, Marek Śmieja et al.

Deep neural networks present impressive performance, yet they cannot reliably estimate their predictive confidence, limiting their applicability in high-risk domains. We show that applying a multi-label one-vs-all loss reveals classification ambiguity and reduces model overconfidence. The introduced SLOVA (Single Label One-Vs-All) model redefines typical one-vs-all predictive probabilities to a single label situation, where only one class is the correct answer. The proposed classifier is confident only if a single class has a high probability and other probabilities are negligible. Unlike the typical softmax function, SLOVA naturally detects out-of-distribution samples if the probabilities of all other classes are small. The model is additionally fine-tuned with exponential calibration, which allows us to precisely align the confidence score with model accuracy. We verify our approach on three tasks. First, we demonstrate that SLOVA is competitive with the state-of-the-art on in-distribution calibration. Second, the performance of SLOVA is robust under dataset shifts. Finally, our approach performs extremely well in the detection of out-of-distribution samples. Consequently, SLOVA is a tool that can be used in various applications where uncertainty modeling is required.

LGOct 6, 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion

Filip Szatkowski, Bartosz Wójcik, Mikołaj Piórczyński et al.

Transformer models can face practical limitations due to their high computational requirements. At the same time, such models exhibit significant activation sparsity, which can be leveraged to reduce the inference cost by converting parts of the network into equivalent Mixture-of-Experts (MoE) layers. Despite the crucial role played by activation sparsity, its impact on this process remains unexplored. We demonstrate that the efficiency of the conversion can be significantly enhanced by a proper regularization of the activation sparsity of the base model. Moreover, motivated by the high variance of the number of activated neurons for different inputs, we introduce a more effective dynamic-$k$ expert selection rule that adjusts the number of executed experts on a per-token basis. To achieve further savings, we extend this approach to multi-head attention projections. Finally, we develop an efficient implementation that translates these computational savings into actual wall-clock speedup. The proposed method, Dense to Dynamic-$k$ Mixture-of-Experts (D2DMoE), outperforms existing approaches on common NLP and vision tasks, reducing inference cost by up to 60% without significantly impacting performance.

LGJul 19, 2024
How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies

Piotr Kubaty, Bartosz Wójcik, Bartłomiej Krzepkowski et al.

Early exits enable the network's forward pass to terminate early by attaching trainable internal classifiers to the backbone network. Existing early-exit methods typically adopt either a joint training approach, where the backbone and exit heads are trained simultaneously, or a disjoint approach, where the heads are trained separately. However, the implications of this choice are often overlooked, with studies typically adopting one approach without adequate justification. This choice influences training dynamics and its impact remains largely unexplored. In this paper, we introduce a set of metrics to analyze early-exit training dynamics and guide the choice of training strategy. We demonstrate that conventionally used joint and disjoint regimes yield suboptimal performance. To address these limitations, we propose a mixed training strategy: the backbone is trained first, followed by the training of the entire multi-exit network. Through comprehensive evaluations of training strategies across various architectures, datasets, and early-exit methods, we present the strengths and weaknesses of the early exit training strategies. In particular, we show consistent improvements in performance and efficiency using the proposed mixed strategy.

LGDec 15, 2023
Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference

Bartosz Wójcik, Alessio Devoto, Karol Pustelnik et al.

While transformer models have been highly successful, they are computationally inefficient. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch and that the "effective" width needed to process a token can vary from layer to layer. Motivated by this observation, we introduce the Adaptive Computation Module (ACM), a generic module that dynamically adapts its computational load to match the estimated difficulty of the input on a per-token basis. An ACM consists of a sequence of learners that progressively refine the output of their preceding counterparts. An additional gating mechanism determines the optimal number of learners to execute for each token. We also propose a distillation technique to replace any pre-trained model with an "ACMized" variant. Our evaluation of transformer models in computer vision and speech recognition demonstrates that substituting layers with ACMs significantly reduces inference costs without degrading the downstream accuracy for a wide interval of user-defined budgets.

LGAug 30, 2025
Universal Properties of Activation Sparsity in Modern Large Language Models

Filip Szatkowski, Patryk Będkowski, Alessio Devoto et al.

Input-dependent activation sparsity is a notable property of deep learning models, which has been extensively studied in networks with ReLU activations and is associated with efficiency, robustness, and interpretability. However, the approaches developed for ReLU-based models depend on exact zero activations and do not transfer directly to modern large language models~(LLMs), which have abandoned ReLU in favor of other activation functions. As a result, current work on activation sparsity in LLMs is fragmented, model-specific, and lacks consensus on which components to target. We propose a general framework to assess sparsity robustness and present a systematic study of the phenomenon in the FFN layers of modern LLMs, including diffusion LLMs. Our findings reveal universal patterns of activation sparsity in LLMs, provide insights into this phenomenon, and offer practical guidelines for exploiting it in model design and acceleration.

LGAug 29, 2025
Failure Prediction Is a Better Performance Proxy for Early-Exit Networks Than Calibration

Piotr Kubaty, Filip Szatkowski, Metod Jazbec et al.

Early-exit models accelerate inference by attaching internal classifiers to intermediate layers of the network, allowing computation to halt once a prediction meets a predefined exit criterion. Most early-exit methods rely on confidence-based exit strategies, which has motivated prior work to calibrate intermediate classifiers in pursuit of improved performance-efficiency trade-offs. In this paper, we argue that calibration metrics can be misleading indicators of multi-exit model performance. Specifically, we present empirical evidence showing that miscalibrated networks can outperform calibrated ones. As an alternative, we propose using failure prediction as a more informative proxy for early-exit model performance. Unlike calibration, failure prediction captures changes in sample rankings and correlates strongly with efficiency gains, offering a more reliable framework for designing and evaluating early-exit models.

LGAug 26, 2025
Efficient Multi-Source Knowledge Transfer by Model Merging

Marcin Osial, Bartosz Wójcik, Bartosz Zieliński et al.

While transfer learning is an advantageous strategy, it overlooks the opportunity to leverage knowledge from numerous available models online. Addressing this multi-source transfer learning problem is a promising path to boost adaptability and cut re-training costs. However, existing approaches are inherently coarse-grained, lacking the necessary precision for granular knowledge extraction and the aggregation efficiency required to fuse knowledge from either a large number of source models or those with high parameter counts. We address these limitations by leveraging Singular Value Decomposition (SVD) to first decompose each source model into its elementary, rank-one components. A subsequent aggregation stage then selects only the most salient components from all sources, thereby overcoming the previous efficiency and precision limitations. To best preserve and leverage the synthesized knowledge base, our method adapts to the target task by fine-tuning only the principal singular values of the merged matrix. In essence, this process only recalibrates the importance of top SVD components. The proposed framework allows for efficient transfer learning, is robust to perturbations both at the input level and in the parameter space (e.g., noisy or pruned sources), and scales well computationally.

LGFeb 22, 2025
Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes

Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik et al.

With the rise of LLMs, ensuring model safety and alignment has become a critical concern. While modern instruction-finetuned LLMs incorporate alignment during training, they still frequently require moderation tools to prevent unsafe behavior. The most common approach to moderation are guard models that flag unsafe inputs. However, guards require costly training and are typically limited to fixed-size, pre-trained options, making them difficult to adapt to evolving risks and resource constraints. We hypothesize that instruction-finetuned LLMs already encode safety-relevant information internally and explore training-free safety assessment methods that work with off-the-shelf models. We show that simple prompting allows models to recognize harmful inputs they would otherwise mishandle. We also demonstrate that safe and unsafe prompts are distinctly separable in the models' latent space. Building on this, we introduce the Latent Prototype Moderator (LPM), a training-free moderation method that uses Mahalanobis distance in latent space to assess input safety. LPM is a lightweight, customizable add-on that generalizes across model families and sizes. Our method matches or exceeds state-of-the-art guard models across multiple safety benchmarks, offering a practical and flexible solution for scalable LLM moderation.

CVJun 21, 2021
Hard hat wearing detection based on head keypoint localization

Bartosz Wójcik, Mateusz Żarski, Kamil Książek et al.

In recent years, a lot of attention is paid to deep learning methods in the context of vision-based construction site safety systems, especially regarding personal protective equipment. However, despite all this attention, there is still no reliable way to establish the relationship between workers and their hard hats. To answer this problem a combination of deep learning, object detection and head keypoint localization, with simple rule-based reasoning is proposed in this article. In tests, this solution surpassed the previous methods based on the relative bounding box position of different instances, as well as direct detection of hard hat wearers and non-wearers. The results show that the conjunction of novel deep learning methods with humanly-interpretable rule-based systems can result in a solution that is both reliable and can successfully mimic manual, on-site supervision. This work is the next step in the development of fully autonomous construction site safety systems and shows that there is still room for improvement in this area.

LGJun 9, 2021
Zero Time Waste: Recycling Predictions in Early Exit Neural Networks

Maciej Wołczyk, Bartosz Wójcik, Klaudia Bałazy et al.

The problem of reducing processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded, with its computations effectively being wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other recently proposed early exit methods.

LGJun 17, 2020
Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

Bartosz Wójcik, Paweł Morawiecki, Marek Śmieja et al.

We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives us insight into the behavior of adversarial examples and their flow through the layers of a deep neural network. Experimental results show that our method outperforms the state of the art in supervised and unsupervised settings.

CVApr 26, 2020
KrakN: Transfer Learning framework for thin crack detection in infrastructure maintenance

Mateusz Żarski, Bartosz Wójcik, Jarosław Adam Miszczak

Monitoring the technical condition of infrastructure is a crucial element to its maintenance. Currently applied methods are outdated, labour-intensive and inaccurate. At the same time, the latest methods using Artificial Intelligence techniques are severely limited in their application due to two main factors -- labour-intensive gathering of new datasets and high demand for computing power. We propose to utilize custom made framework -- KrakN, to overcome these limiting factors. It enables the development of unique infrastructure defects detectors on digital images, achieving the accuracy of above 90%. The framework supports semi-automatic creation of new datasets and has modest computing power requirements. It is implemented in the form of a ready-to-use software package openly distributed to the public. Thus, it can be used to immediately implement the methods proposed in this paper in the process of infrastructure management by government units, regardless of their financial capabilities.

LGApr 17, 2020
Finding the Optimal Network Depth in Classification Tasks

Bartosz Wójcik, Maciej Wołczyk, Klaudia Bałazy et al.

We develop a fast end-to-end method for training lightweight neural networks using multiple classifier heads. By allowing the model to determine the importance of each head and rewarding the choice of a single shallow classifier, we are able to detect and remove unneeded components of the network. This operation, which can be seen as finding the optimal depth of the model, significantly reduces the number of parameters and accelerates inference across different hardware processing units, which is not the case for many standard pruning methods. We show the performance of our method on multiple network architectures and datasets, analyze its optimization properties, and conduct ablation studies.

LGMay 30, 2019
One-element Batch Training by Moving Window

Przemysław Spurek, Szymon Knop, Jacek Tabor et al.

Several deep models, esp. the generative, compare the samples from two distributions (e.g. WAE like AutoEncoder models, set-processing deep networks, etc) in their cost functions. Using all these methods one cannot train the model directly taking small size (in extreme -- one element) batches, due to the fact that samples are to be compared. We propose a generic approach to training such models using one-element mini-batches. The idea is based on splitting the batch in latent into parts: previous, i.e. historical, elements used for latent space distribution matching and the current ones, used both for latent distribution computation and the minimization process. Due to the smaller memory requirements, this allows to train networks on higher resolution images then in the classical approach.

LGFeb 20, 2019
LOSSGRAD: automatic learning rate in gradient descent

Bartosz Wójcik, Łukasz Maziarka, Jacek Tabor

In this paper, we propose a simple, fast and easy to implement algorithm LOSSGRAD (locally optimal step-size in gradient descent), which automatically modifies the step-size in gradient descent during neural networks training. Given a function $f$, a point $x$, and the gradient $\nabla_x f$ of $f$, we aim to find the step-size $h$ which is (locally) optimal, i.e. satisfies: $$ h=arg\,min_{t \geq 0} f(x-t \nabla_x f). $$ Making use of quadratic approximation, we show that the algorithm satisfies the above assumption. We experimentally show that our method is insensitive to the choice of initial learning rate while achieving results comparable to other methods.