LG · Jun 21, 2023 · Code
Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training
Aleksandra I. Nowak, Bram Grooten, Decebal Constantin Mocanu et al.
Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their impact on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper
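As a rough illustration of the magnitude-based pruning criterion mentioned above, the sketch below deactivates the active weights with the smallest absolute value. The function name, the flat-list representation, and the `fraction` parameter are assumptions made for this example and are not taken from the paper's repository; the growth step that normally follows pruning in DST is omitted.

```python
def magnitude_prune(weights, mask, fraction):
    """Deactivate the given fraction of active weights with the smallest |w|.

    weights  : flat list of floats (hypothetical layer weights)
    mask     : flat list of bools, True where a connection is active
    fraction : share of the currently active connections to remove
    """
    active = [i for i, is_on in enumerate(mask) if is_on]
    n_prune = int(len(active) * fraction)
    # Rank active connections by absolute weight, smallest first.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = False
    return new_mask


weights = [0.9, -0.05, 0.3, 0.0, -0.7, 0.01]
mask = [True, True, True, False, True, True]
new_mask = magnitude_prune(weights, mask, fraction=0.4)
print(new_mask)
```

In a full DST loop, the same number of connections would then be regrown according to a growing criterion so that the overall density stays constant.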
IV · Jun 18, 2023 · Code
ProMIL: Probabilistic Multiple Instance Learning for Medical Imaging
Łukasz Struski, Dawid Rymarczyk, Arkadiusz Lewicki et al.
Multiple Instance Learning (MIL) is a weakly-supervised problem in which one label is assigned to the whole bag of instances. An important class of MIL models is instance-based, where we first classify instances and then aggregate those predictions to obtain a bag label. The most common MIL model is when we consider a bag as positive if at least one of its instances has a positive label. However, this reasoning does not hold in many real-life scenarios, where the positive bag label is often a consequence of a certain percentage of positive instances. To address this issue, we introduce a dedicated instance-based method called ProMIL, based on deep neural networks and Bernstein polynomial estimation. An important advantage of ProMIL is that it can automatically detect the optimal percentage level for decision-making. We show that ProMIL outperforms standard instance-based MIL in real-world medical applications. We make the code available.
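The difference between the standard "at least one positive instance" assumption and a percentage-based rule can be sketched as follows. This is only an illustration: the function names, the fixed `cutoff`, and the hand-set `min_fraction` are assumptions for this example, whereas ProMIL is described as learning the percentage level automatically.

```python
def bag_label_standard(instance_probs, cutoff=0.5):
    """Standard MIL assumption: the bag is positive if any instance is positive."""
    return max(instance_probs) >= cutoff


def bag_label_percentage(instance_probs, min_fraction, cutoff=0.5):
    """Percentage-based rule: the bag is positive only if at least
    `min_fraction` of its instances are classified as positive."""
    positives = sum(1 for p in instance_probs if p >= cutoff)
    return positives / len(instance_probs) >= min_fraction


# Hypothetical instance-level predictions for one bag of five instances.
bag = [0.9, 0.1, 0.2, 0.1, 0.3]
standard = bag_label_standard(bag)
percentage = bag_label_percentage(bag, min_fraction=0.4)
print(standard, percentage)
```

With a single confident instance, the standard rule marks the bag positive while the 40% rule does not, which is the kind of disagreement the abstract refers to.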
ML · Jun 29, 2022
LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood
Piotr Tempczyk, Rafał Michaluk, Łukasz Garncarek et al. · apple-ml
Most of the existing methods for estimating the local intrinsic dimension of a data distribution do not scale well to high-dimensional data. Many of them rely on a non-parametric nearest neighbors approach which suffers from the curse of dimensionality. We attempt to address that challenge by proposing a novel approach to the problem: Local Intrinsic Dimension estimation using approximate Likelihood (LIDL). Our method relies on an arbitrary density estimation method as its subroutine and hence tries to sidestep the dimensionality challenge by making use of the recent progress in parametric neural methods for likelihood estimation. We carefully investigate the empirical properties of the proposed method, compare them with our theoretical predictions, and show that LIDL yields competitive results on the standard benchmarks for this problem and that it scales to thousands of dimensions. What is more, we anticipate this approach to improve further with the continuing advances in the density estimation literature.
CV · Jan 28, 2023
ProtoSeg: Interpretable Semantic Segmentation with Prototypical Parts
Mikołaj Sacha, Dawid Rymarczyk, Łukasz Struski et al.
We introduce ProtoSeg, a novel model for interpretable semantic image segmentation, which constructs its predictions using similar patches from the training set. To achieve accuracy comparable to baseline methods, we adapt the mechanism of prototypical parts and introduce a diversity loss function that increases the variety of prototypes within each class. We show that ProtoSeg discovers semantic concepts, in contrast to standard segmentation models. Experiments conducted on Pascal VOC and Cityscapes datasets confirm the precision and transparency of the presented method.
LG · Mar 21, 2022
HyperShot: Few-Shot Learning by Kernel HyperNetworks
Marcin Sendera, Marcin Przewięźlikowski, Konrad Karanowski et al.
Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting, where only one element represents each class. We propose HyperShot, a fusion of the kernel and hypernetwork paradigms. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our model aims to switch the classification module parameters depending on the task's embedding. In practice, we utilize a hypernetwork, which takes the aggregated information from support data and returns the classifier's parameters tailored to the considered problem. Moreover, we introduce a kernel-based representation of the support examples, which is delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between embeddings of the support examples instead of direct feature values provided by the backbone models. Thanks to this approach, our model can adapt to highly different tasks.
CV · Feb 6
DAVE: Distribution-aware Attribution via ViT Gradient Decomposition
Adam Wróbel, Siddhartha Gairola, Jacek Tabor et al.
Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable and high-resolution attribution maps for these models remains challenging. Architectural components such as patch embeddings and attention routing often introduce structured artifacts in pixel-level explanations, causing many existing methods to rely on coarse patch-level attributions. We introduce DAVE (Distribution-aware Attribution via ViT Gradient Decomposition), a mathematically grounded attribution method for ViTs based on a structured decomposition of the input gradient. By exploiting architectural properties of ViTs, DAVE isolates locally equivariant and stable components of the effective input-output mapping. It separates these from architecture-induced artifacts and other sources of instability.
CV · Aug 16, 2023
Interpretability Benchmark for Evaluating Spatial Misalignment of Prototypical Parts Explanations
Mikołaj Sacha, Bartosz Jura, Dawid Rymarczyk et al.
Prototypical parts-based networks are becoming increasingly popular due to their faithful self-explanations. However, their similarity maps are calculated in the penultimate network layer. Therefore, the receptive field of the prototype activation region often depends on parts of the image outside this region, which can lead to misleading interpretations. We name this undesired behavior a spatial explanation misalignment and introduce an interpretability benchmark with a set of dedicated metrics for quantifying this phenomenon. In addition, we propose a method for misalignment compensation and apply it to existing state-of-the-art models. We show the expressiveness of our benchmark and the effectiveness of the proposed compensation methodology through extensive empirical studies.
SD · Nov 3, 2022
HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks
Filip Szatkowski, Karol J. Piczak, Przemysław Spurek et al.
Implicit neural representations (INRs) are a rapidly growing research field, which provides alternative ways to represent multimedia signals. Recent applications of INRs include image super-resolution, compression of high-dimensional signals, or 3D rendering. However, these solutions usually focus on visual data, and adapting them to the audio domain is not trivial. Moreover, it requires a separately trained model for every data sample. To address this limitation, we propose HyperSound, a meta-learning method leveraging hypernetworks to produce INRs for audio signals unseen at training time. We show that our approach can reconstruct sound waves with quality comparable to other state-of-the-art models.
CL · Feb 8, 2023
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models
Mohammadreza Banaei, Klaudia Bałazy, Artur Kasymov et al.
Recent transformer language models achieve outstanding results in many natural language processing (NLP) tasks. However, their enormous size often makes them impractical on memory-constrained devices, requiring practitioners to compress them to smaller networks. In this paper, we explore offline compression methods, meaning computationally-cheap approaches that do not require further fine-tuning of the compressed model. We challenge the classical matrix factorization methods by proposing a novel, better-performing autoencoder-based framework. We perform a comprehensive ablation study of our approach, examining its different aspects over a diverse set of evaluation settings. Moreover, we show that enabling collaboration between modules across layers by compressing certain modules together positively impacts the final model performance. Experiments on various NLP tasks demonstrate that our approach significantly outperforms commonly used factorization-based offline compression methods.
LG · Feb 9, 2023
Hypernetworks build Implicit Neural Representations of Sounds
Filip Szatkowski, Karol J. Piczak, Przemysław Spurek et al.
Implicit Neural Representations (INRs) are nowadays used to represent multimedia signals across various real-life applications, including image super-resolution, image compression, or 3D rendering. Existing methods that leverage INRs are predominantly focused on visual data, as their application to other modalities, such as audio, is nontrivial due to the inductive biases present in architectural attributes of image-based INR models. To address this limitation, we introduce HyperSound, the first meta-learning approach to produce INRs for audio samples that leverages hypernetworks to generalize beyond samples observed in training. Our approach reconstructs audio samples with quality comparable to other state-of-the-art models and provides a viable alternative to contemporary sound representations used in deep neural networks for audio processing, such as spectrograms.
CV · Sep 21, 2023
Face Identity-Aware Disentanglement in StyleGAN
Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska et al.
Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.
LG · Jun 16, 2022
Continual Learning with Guarantees via Weight Interval Constraints
Maciej Wołczyk, Karol J. Piczak, Bartosz Wójcik et al.
We introduce a new training paradigm that enforces interval constraints on neural network parameter space to control forgetting. Contemporary Continual Learning (CL) methods focus on training neural networks efficiently from a stream of data, while reducing the negative impact of catastrophic forgetting, yet they do not provide any firm guarantees that network performance will not deteriorate uncontrollably over time. In this work, we show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space. To that end, we propose Hyperrectangle Training, a new training methodology where each task is represented by a hyperrectangle in the parameter space, fully contained in the hyperrectangles of the previous tasks. This formulation reduces the NP-hard CL problem back to polynomial time while providing full resilience against forgetting. We validate our claim by developing the InterContiNet (Interval Continual Learning) algorithm, which leverages interval arithmetic to effectively model parameter regions as hyperrectangles. Through experimental results, we show that our approach performs well in a continual learning setup without storing data from previous tasks.
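The nesting property described above (each task's hyperrectangle lies inside those of earlier tasks) can be sketched with per-parameter intervals. This is a minimal illustration of the containment idea only, not the InterContiNet algorithm; the helper names `shrink_intervals` and `project` are hypothetical.

```python
def shrink_intervals(previous, proposed):
    """Intersect the proposed per-parameter intervals with the previous ones,
    so the new hyperrectangle is contained in the old one."""
    result = []
    for (prev_lo, prev_hi), (new_lo, new_hi) in zip(previous, proposed):
        lo, hi = max(prev_lo, new_lo), min(prev_hi, new_hi)
        if lo > hi:
            raise ValueError("proposed interval does not overlap the previous one")
        result.append((lo, hi))
    return result


def project(params, intervals):
    """Clip each parameter into its allowed interval."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, intervals)]


task1 = [(-1.0, 1.0), (0.0, 2.0)]          # hyperrectangle after task 1
task2 = shrink_intervals(task1, [(-0.5, 1.5), (0.5, 1.0)])
clipped = project([2.0, 0.0], task2)       # parameters forced back into task 2's box
print(task2, clipped)
```

Because every later box is a subset of every earlier one, any parameter vector valid for the latest task is also valid for all previous tasks under this simplified view.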
LG · Mar 3, 2023
Contrastive Hierarchical Clustering
Michał Znaleźniak, Przemysław Rola, Patryk Kaszuba et al.
Deep clustering has been dominated by flat models, which split a dataset into a predefined number of groups. Although recent methods achieve an extremely high similarity with the ground truth on popular benchmarks, the information contained in the flat partition is limited. In this paper, we introduce CoHiClust, a Contrastive Hierarchical Clustering model based on deep neural networks, which can be applied to typical image data. By employing a self-supervised learning approach, CoHiClust distills the base network into a binary tree without access to any labeled data. The hierarchical clustering structure can be used to analyze the relationship between clusters, as well as to measure the similarity between data points. Experiments demonstrate that CoHiClust generates a reasonable structure of clusters, which is consistent with our intuition and image semantics. Moreover, it obtains superior clustering accuracy on most of the image datasets compared to the state-of-the-art flat clustering models.
LG · Jul 5, 2023
ChiENN: Embracing Molecular Chirality with Graph Neural Networks
Piotr Gaiński, Michał Koziarski, Jacek Tabor et al.
Graph Neural Networks (GNNs) play a fundamental role in many deep learning problems, in particular in cheminformatics. However, typical GNNs cannot capture the concept of chirality, which means they do not distinguish between the 3D graph of a chemical compound and its mirror image (enantiomer). The ability to distinguish between enantiomers is especially important in drug discovery because enantiomers can have very distinct biochemical properties. In this paper, we propose a theoretically justified message-passing scheme, which makes GNNs sensitive to the order of node neighbors. We apply that general concept in the context of molecular chirality to construct the Chiral Edge Neural Network (ChiENN) layer, which can be appended to any GNN model to enable chirality-awareness. Our experiments show that adding ChiENN layers to a GNN outperforms current state-of-the-art methods in chiral-sensitive molecular property prediction tasks.
LG · Jun 19, 2022
Bounding Evidence and Estimating Log-Likelihood in VAE
Łukasz Struski, Marcin Mazur, Paweł Batorski et al.
Many crucial problems in deep learning and statistical inference are caused by a variational gap, i.e., a difference between model evidence (log-likelihood) and evidence lower bound (ELBO). In particular, in a classical VAE setting that involves training via an ELBO cost function, it is difficult to provide a robust comparison of the effects of training between models, since we do not know a log-likelihood of data (but only its lower bound). In this paper, to deal with this problem, we introduce a general and effective upper bound, which allows us to efficiently approximate the evidence of data. We provide extensive theoretical and experimental studies of our approach, including its comparison to the other state-of-the-art upper bounds, as well as its application as a tool for the evaluation of models that were trained on various lower bounds.
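For reference, the variational gap mentioned above follows from the standard decomposition of the log-evidence. The notation is an assumption of this note rather than the paper's: $p_\theta$ denotes the generative model and $q_\phi$ the approximate posterior.

$$
\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big]}_{\mathrm{ELBO}(x)} + \underbrace{\mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)}_{\text{variational gap} \,\ge\, 0}
$$

Since the KL term is non-negative, the ELBO never exceeds the log-evidence. Any upper bound $U(x) \ge \log p_\theta(x)$ therefore brackets the unknown value: $\mathrm{ELBO}(x) \le \log p_\theta(x) \le U(x)$.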
LG · Apr 11, 2023
r-softmax: Generalized Softmax with Controllable Sparsity Rate
Klaudia Bałazy, Łukasz Struski, Marek Śmieja et al.
Nowadays artificial neural network models achieve remarkable results in many disciplines. Functions mapping the representation provided by the model to the probability distribution are the inseparable aspect of deep learning solutions. Although softmax is a commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs and always spreads the positive probability to all positions. In this paper, we propose r-softmax, a modification of the softmax, outputting sparse probability distribution with controllable sparsity rate. In contrast to the existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate that it leads to improved performance when fine-tuning the model on different natural language processing tasks.
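To make the notion of a "controllable sparsity rate" concrete, the sketch below zeroes out a chosen fraction `r` of the smallest logits and renormalizes the rest. This is a simplified stand-in written for illustration, not the r-softmax definition from the paper; the function name `sparse_softmax` and its masking rule are assumptions.

```python
import math


def sparse_softmax(logits, r):
    """Softmax-like mapping where a fraction `r` of outputs is exactly zero.

    The int(r * n) smallest logits receive probability 0; the remaining
    entries are normalized with an ordinary (numerically stable) softmax.
    At least one entry is always kept.
    """
    n = len(logits)
    n_zero = min(int(r * n), n - 1)
    order = sorted(range(n), key=lambda i: logits[i])
    dropped = set(order[:n_zero])
    top = max(logits)
    exps = [0.0 if i in dropped else math.exp(x - top) for i, x in enumerate(logits)]
    total = sum(exps)
    return [e / total for e in exps]


probs = sparse_softmax([2.0, 1.0, 0.1, -1.0], r=0.5)
print(probs)
```

With `r=0.0` the function reduces to the ordinary softmax, which assigns positive probability to every position.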
CV · Mar 10, 2023
NeRFlame: FLAME-based conditioning of NeRF for 3D face rendering
Wojciech Zając, Joanna Waczyńska, Piotr Borycki et al.
Traditional 3D face models are based on mesh representations with texture. One of the most important models is FLAME (Faces Learned with an Articulated Model and Expressions), which produces meshes of human faces that are fully controllable. Unfortunately, such models have problems with capturing geometric and appearance details. In contrast to mesh representation, the neural radiance field (NeRF) produces extremely sharp renders. However, implicit methods are hard to animate and do not generalize well to unseen expressions. It is not trivial to effectively control NeRF models to obtain face manipulation. The present paper proposes a novel approach, named NeRFlame, which combines the strengths of both NeRF and FLAME methods. Our method enables high-quality rendering capabilities of NeRF while also offering complete control over the visual appearance, similar to FLAME. In contrast to traditional NeRF-based structures that use neural networks for RGB color and volume density modeling, our approach utilizes the FLAME mesh as a distinct density volume. Consequently, color values exist only in the vicinity of the FLAME mesh. This FLAME framework is seamlessly incorporated into the NeRF architecture for predicting RGB colors, enabling our model to explicitly represent volume density and implicitly capture RGB colors.
LG · Jun 28, 2022
SLOVA: Uncertainty Estimation Using Single Label One-Vs-All Classifier
Bartosz Wójcik, Jacek Grela, Marek Śmieja et al.
Deep neural networks present impressive performance, yet they cannot reliably estimate their predictive confidence, limiting their applicability in high-risk domains. We show that applying a multi-label one-vs-all loss reveals classification ambiguity and reduces model overconfidence. The introduced SLOVA (Single Label One-Vs-All) model redefines typical one-vs-all predictive probabilities to a single label situation, where only one class is the correct answer. The proposed classifier is confident only if a single class has a high probability and other probabilities are negligible. Unlike the typical softmax function, SLOVA naturally detects out-of-distribution samples if the probabilities of all other classes are small. The model is additionally fine-tuned with exponential calibration, which allows us to precisely align the confidence score with model accuracy. We verify our approach on three tasks. First, we demonstrate that SLOVA is competitive with the state-of-the-art on in-distribution calibration. Second, the performance of SLOVA is robust under dataset shifts. Finally, our approach performs extremely well in the detection of out-of-distribution samples. Consequently, SLOVA is a tool that can be used in various applications where uncertainty modeling is required.
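One way to read the statement that the classifier is confident "only if a single class has a high probability and other probabilities are negligible" is to multiply the one-vs-all probability of a class by the complements of all the others. The sketch below follows that reading; it is an assumption made for illustration and may differ from the exact SLOVA formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def single_label_ova(logits):
    """Score for class k: sigmoid(z_k) * prod_{j != k} (1 - sigmoid(z_j)).

    The scores do not need to sum to one; the missing mass covers the
    "several classes" and "no class" outcomes.
    """
    probs = [sigmoid(z) for z in logits]
    scores = []
    for k, p_k in enumerate(probs):
        others = 1.0
        for j, p_j in enumerate(probs):
            if j != k:
                others *= 1.0 - p_j
        scores.append(p_k * others)
    return scores


clear = single_label_ova([4.0, -4.0, -4.0])      # one class clearly active
ambiguous = single_label_ova([4.0, 4.0, -4.0])   # two classes compete
unknown = single_label_ova([-4.0, -4.0, -4.0])   # no class active
print(max(clear), max(ambiguous), max(unknown))
```

Under this reading, both ambiguous inputs and inputs for which no class responds yield low scores for every class, which is what makes the scores usable as a confidence signal.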
35.5 · LG · May 12 · Code
Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning
Patryk Krukowski, Jacek Tabor, Przemysław Spurek et al.
Data-free continual learning (DFCIL) relies on model inversion to synthesize pseudo-samples and mitigate catastrophic forgetting. However, existing inversion methods are fundamentally limited by a simplifying assumption: they model feature distributions using diagonal covariance, effectively ignoring correlations that define the geometry of learned representations. As a result, synthesized samples often lack fidelity, limiting knowledge retention. In this work, we show that modeling feature dependencies is a key ingredient for effective DFCIL. We introduce REMIX, a structured covariance modeling framework that enables scalable full-covariance modeling without the prohibitive cost of dense matrix inversion and log-determinant computation. By leveraging a Laplace kernel parameterization, REMIX captures structured feature dependencies using memory that scales linearly with the feature dimensionality, while requiring only an additional logarithmic factor in computation. Modeling these correlations produces more coherent synthetic samples and consistently improves performance across standard DFCIL benchmarks. Our results demonstrate that moving beyond diagonal assumptions is essential for effective and scalable data-free continual learning. Our code is available at https://github.com/pkrukowski1/REMIX-Model-Inversion-via-Laplace-Kernel.
LG · Oct 6, 2022
Hypernetwork approach to Bayesian MAML
Piotr Borycki, Piotr Kubacki, Marcin Przewięźlikowski et al.
The main goal of Few-Shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the shared universal weights of a meta-model, which are then adapted for specific tasks. However, the method suffers from over-fitting and poorly quantifies uncertainty due to limited data size. Bayesian approaches could, in principle, alleviate these shortcomings by learning weight distributions in place of point-wise weights. Unfortunately, previous modifications of MAML are limited due to the simplicity of Gaussian posteriors, MAML-like gradient-based weight updates, or by the same structure enforced for universal and adapted weights. In this paper, we propose a novel framework for Bayesian MAML called BayesianHMAML, which employs Hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.
LG · Aug 21, 2022
ProPaLL: Probabilistic Partial Label Learning
Łukasz Struski, Jacek Tabor, Bartosz Zieliński
Partial label learning is a type of weakly supervised learning, where each training instance corresponds to a set of candidate labels, among which only one is true. In this paper, we introduce ProPaLL, a novel probabilistic approach to this problem, which has at least three advantages compared to the existing approaches: it simplifies the training process, improves performance, and can be applied to any deep architecture. Experiments conducted on artificial and real-world datasets indicate that ProPaLL outperforms the existing approaches.
CV · Apr 3, 2023
Gaussian model for closed curves
Krzysztof Byrski, Przemysław Spurek, Jacek Tabor
Gaussian Mixture Models (GMM) do not adapt well to curved and strongly nonlinear data. However, we can use Gaussians in the curvilinear coordinate systems to solve this problem. Moreover, such a solution allows for the adaptation of clusters to the complicated shapes defined by the family of functions. But still, it is challenging to model clusters as closed curves (e.g., circles, ellipses, etc.). In this work, we propose a density representation of the closed curve, which can be used to detect the complicated templates in the data. For this purpose, we define a new probability distribution to model closed curves. Then we construct a mixture of such distributions and show that it can be effectively trained in the case of the one-dimensional closed curves.
45.1 · CV · May 9 · Code
ProDG: Prototypes for Data-Free Generative Post-Hoc Explainability
Piotr Borycki, Magdalena Trędowicz, Jacek Tabor et al.
Ante-hoc interpretability methods based on prototypes provide highly accurate explanations by utilizing the intuitive "this looks like that" reasoning paradigm. On the other hand, post-hoc models can explain predictions for a single image without relying on an underlying dataset or requiring costly neural network retraining. Recent approaches successfully solve the retraining problem for prototype-based networks. However, they still face a fundamental limitation: they require access to a subset of data (e.g., a test or validation set) to search for and extract the visual prototypes. In this paper, we address this issue and introduce ProDG: Generative Prototypes for Data-Free Post-Hoc Explainability, a novel framework that leverages generative models to synthesize pure, high-fidelity prototypes directly from the frozen model's weights, completely eliminating the dependency on any external data. By establishing this new frontier in Data-Free XAI, ProDG unlocks robust visual interpretability for privacy-sensitive domains, where original data is strictly restricted or fundamentally inaccessible. Project page: https://github.com/piotr310100/ProDG
CV · Jul 17, 2024
GeoGuide: Geometric guidance of diffusion models
Mateusz Poleski, Jacek Tabor, Przemysław Spurek
Diffusion models are among the most effective methods for image generation. This is in particular because, unlike GANs, they can be easily conditioned during training to produce elements with desired class or properties. However, guiding a pre-trained diffusion model to generate elements from previously unlabeled data is significantly more challenging. One of the possible solutions was given by the ADM-G guiding approach. Although ADM-G successfully generates elements from the given class, there is a significant quality gap compared to a model originally conditioned on this class. In particular, the FID score obtained by the ADM-G-guided diffusion model is nearly three times worse than that obtained with class-conditioned guidance. We demonstrate that this issue is partly due to ADM-G providing minimal guidance during the final stage of the denoising process. To address this problem, we propose GeoGuide, a guidance model based on tracing the distance of the diffusion model's trajectory from the data manifold. The main idea of GeoGuide is to produce normalized adjustments during the backward denoising process. As shown in the experiments, GeoGuide surpasses the probabilistic approach ADM-G with respect to both the FID scores and the quality of the generated images.
43.3 · LG · May 23
LAPLEX: The FFT of Learnable Laplace Kernels
Łukasz Struski, Hanna Blazhko, Piotr Kubaty et al.
Fast linear algebra in deep learning usually comes with a choice: fixed geometry and exact computation, as in the Fourier transform, or adaptive geometry paid for by dense parameters, random features, or low-rank surrogates. To move beyond this trade-off, we introduce LAPLEX, a class of exact, trainable (phased) Laplace-kernel operators. A LAPLEX layer is a typically full-rank dense matrix, implicitly defined by learnable coordinate anchors, with FFT-like scaling. Consequently, it supports trainable matrix-vector operations at vector dimensions up to $10^9$ on modern GPUs. As a neural layer, it yields compact projections and classification heads interpretable as soft, trainable routing models. The same primitive also serves as an efficient Gram operator, enabling high-dimensional covariance models on flattened images of dimension $3 \cdot 10^6$ that preserve visible spatial structure without imposing convolutional bias. These applications reflect a single principle: dense geometry can be learned without storing a dense matrix, which enables data-adaptive global interactions in regimes where ordinary dense layers are out of reach. In this sense, LAPLEX separates expressivity from storage cost: it behaves like a dense trainable matrix, but is represented and applied through a small structured set of parameters.
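The principle that a dense kernel matrix can be applied without ever being stored can be illustrated with a classical special case: for sorted one-dimensional points and the unphased Laplace kernel $K_{ij} = e^{-|x_i - x_j|}$, the product $Kv$ is computable in linear time with two recursive sweeps. This is a textbook-style sketch under those assumptions and is not claimed to be the LAPLEX algorithm; the function names are hypothetical.

```python
import math


def laplace_matvec_dense(x, v):
    """Reference O(n^2) product with K[i][j] = exp(-|x[i] - x[j]|)."""
    n = len(x)
    return [sum(math.exp(-abs(x[i] - x[j])) * v[j] for j in range(n))
            for i in range(n)]


def laplace_matvec_fast(x, v):
    """O(n) product for x sorted in ascending order (n >= 1).

    left[i]  accumulates contributions from j <= i,
    right[i] accumulates contributions from j >  i.
    """
    n = len(x)
    left = [0.0] * n
    right = [0.0] * n
    left[0] = v[0]
    for i in range(1, n):
        left[i] = math.exp(-(x[i] - x[i - 1])) * left[i - 1] + v[i]
    for i in range(n - 2, -1, -1):
        right[i] = math.exp(-(x[i + 1] - x[i])) * (right[i + 1] + v[i + 1])
    return [l + r for l, r in zip(left, right)]


anchors = [0.0, 0.3, 1.1, 2.5]     # sorted, hypothetical anchor coordinates
vector = [1.0, -2.0, 0.5, 3.0]
dense = laplace_matvec_dense(anchors, vector)
fast = laplace_matvec_fast(anchors, vector)
print(fast)
```

The fast version needs only the anchor coordinates and two length-n buffers, so its memory use grows linearly even though the matrix it applies is dense.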
81.0 · LG · May 7 · Code
SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders
Jakub Stępień, Marcin Mazur, Jacek Tabor et al.
Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features (K) across all inputs, ignoring the varying complexity of real-world data. Natural data often lies on manifolds with varying local intrinsic dimensionality, meaning the number of relevant factors can change significantly across samples. This suggests that a fixed sparsity level is not optimal. Simple inputs may require only a few features, while more complex ones need more expressive representations. Using a constant K can therefore introduce noise in simple cases or miss important structure in more complex ones. To address this issue, we propose SoftSAE, a sparse autoencoder with a Dynamic Top-K selection mechanism. Our method uses a differentiable Soft Top-K operator to learn an input-dependent sparsity level k. This allows the model to adjust the number of active features based on the complexity of each input. As a result, the representation better matches the structure of the data, and the explanation length reflects the amount of information in the input. Experimental results confirm that SoftSAE not only finds meaningful features, but also selects the right number of features for each concept. The source code is available at: https://anonymous.4open.science/r/SoftSAE-8F71/.
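The fixed-K behaviour criticized above can be seen in a few lines. The sketch implements only the hard Top-K activation used by TopK SAEs; the differentiable Soft Top-K operator and the learned, input-dependent k of SoftSAE are not reproduced here, and the example activations are made up.

```python
def top_k(activations, k):
    """Keep the k largest activations and set all others to zero."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# A "simple" input: one dominant latent feature, the rest close to zero.
simple = [5.0, 0.1, 0.2, 0.05]

fixed = top_k(simple, k=2)      # fixed K forces a second, weak feature in
adaptive = top_k(simple, k=1)   # a per-input k could stop after one feature
print(fixed, adaptive)
```

With K fixed at 2, the weak activation 0.2 is kept even though it carries little signal, which is the kind of noise an input-dependent k is meant to avoid.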
NA · Oct 30, 2012
Strict localization of eigenvectors and eigenvalues
Łukasz Struski, Jacek Tabor
In this article, we present and implement a simple and efficient method to strictly locate eigenvectors and eigenvalues of a given matrix, based on the modified cone condition. As a consequence, we can also effectively localize zeros of complex polynomials.
CV · Sep 16, 2024
InfoDisent: Explainability of Image Classification Models by Information Disentanglement
Łukasz Struski, Dawid Rymarczyk, Jacek Tabor
In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent enables the disentanglement of information in the final layer of any pretrained model into atomic concepts, which can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks, such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets using modern backbones such as ViTs and convolutional networks. Notably, InfoDisent generalizes the prototypical parts approach to novel domains (ImageNet).
IV · Nov 7, 2023
MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy
Łukasz Struski, Tomasz Urbańczyk, Krzysztof Bucki et al.
Video generation is important, especially in medicine, as much data is given in this form. However, video generation of high-resolution data is a very demanding task for generative models, due to the large need for memory. In this paper, we propose Memory Efficient Video GAN (MeVGAN) - a Generative Adversarial Network (GAN) which uses plugin-type architecture. We use a pre-trained 2D-image GAN and only add a simple neural network to construct respective trajectories in the noise space, so that the trajectory forwarded through the GAN model constructs a real-life video. We apply MeVGAN in the task of generating colonoscopy videos. Colonoscopy is an important medical procedure, especially beneficial in screening and managing colorectal cancer. However, because colonoscopy is difficult and time-consuming to learn, colonoscopy simulators are widely used in educating young colonoscopists. We show that MeVGAN can produce good quality synthetic colonoscopy videos, which can be potentially used in virtual simulators.
41.7CVMay 21
Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language ModelsPiotr Kubaty, Patryk Marszałek, Łukasz Struski et al.
Vision-language models learn powerful multimodal embeddings, yet their internal semantics remain opaque. While sparse autoencoders (SAEs) can extract interpretable features, they rely on expanding the representation dimension, which compromises the original geometry and introduces redundancy. We introduce CEDAR (Conceptual Embedding Disentanglement via Adaptive Rotation), a post-hoc method that reveals the compositional structure of pretrained embeddings without increasing dimensionality. By learning an invertible transformation with a top-$k$ sparsity bottleneck, CEDAR concentrates semantic information into axis-aligned disentangled coordinates. In CLIP-like architectures, individual coordinates can be interpreted with textual concepts, while for generative models such as BLIP, they can be decoded into natural language descriptions. Experiments demonstrate that CEDAR achieves a competitive reconstruction-sparsity trade-off while producing explanations that are more interpretable and better aligned with human perception. Our results suggest that the apparent entanglement in vision-language representations can be resolved through a suitable change of basis, eliminating the need for overcomplete expansions.
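The change-of-basis idea can be illustrated with a toy example. The sketch below is not CEDAR itself: it assumes a fixed 2-D rotation in place of a learned invertible transformation, and the helper names (`rotate2d`, `top_k`, `encode_decode`) are hypothetical. It only shows that a top-k bottleneck loses information in one basis and none in a better-aligned one.

```python
import math

def rotate2d(v, theta):
    """Apply a 2-D rotation, i.e. an invertible (orthogonal) change of basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def top_k(v, k):
    """Keep the k largest-magnitude coordinates and zero out the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def encode_decode(v, theta, k):
    """Rotate, sparsify with top-k, then rotate back to the original basis."""
    z = top_k(rotate2d(v, theta), k)
    return z, rotate2d(z, -theta)

v = [1.0, 1.0]                                       # embedding along the diagonal
z_id, rec_id = encode_decode(v, 0.0, 1)              # original basis: half is lost
z_rot, rec_rot = encode_decode(v, -math.pi / 4, 1)   # aligned basis: nothing is lost
```

In the original basis the top-1 bottleneck reconstructs only one of the two coordinates, while after a rotation by -45 degrees the whole vector sits on a single axis and survives the bottleneck.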
LGDec 28, 2024Code
VisTabNet: Adapting Vision Transformers for Tabular DataWitold Wydmański, Ulvi Movsum-zada, Jacek Tabor et al.
Although deep learning models have had great success in natural language processing and computer vision, we do not observe comparable improvements in the case of tabular data, which is still the most common data type used in biological, industrial and financial applications. In particular, it is challenging to transfer large-scale pre-trained models to downstream tasks defined on small tabular datasets. To address this, we propose VisTabNet -- a cross-modal transfer learning method, which allows for adapting Vision Transformer (ViT) with pre-trained weights to process tabular data. By projecting tabular inputs to patch embeddings accepted by ViT, we can directly apply a pre-trained Transformer Encoder to tabular inputs. This approach eliminates the conceptual cost of designing a suitable architecture for processing tabular data, while reducing the computational cost of training the model from scratch. Experimental results on multiple small tabular datasets (fewer than 1k samples) demonstrate VisTabNet's superiority, outperforming both traditional ensemble methods and recent deep learning models. The proposed method goes beyond conventional transfer learning practice and shows that pre-trained image models can be transferred to solve tabular problems, extending the boundaries of transfer learning. We share our example implementation as a GitHub repository available at https://github.com/wwydmanski/VisTabNet.
LGJun 3, 2024Code
Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal InitializationAleksandra Irena Nowak, Łukasz Gniecki, Filip Szatkowski et al.
Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years. A key design choice is given by the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such a mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask's potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps in stabilizing the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the superior effectiveness and efficiency of EOI through experiments, consistently outperforming common sparse initialization techniques. Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection. The code is available at https://github.com/woocash2/sparser-better-deeper-stronger
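To make the Givens-rotation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes we start from the identity matrix and left-multiply by a few random Givens rotations; the function names and the choice of uniformly random row pairs and angles are illustrative assumptions. Each rotation is orthogonal, so the product stays exactly orthogonal (up to floating-point rounding) while remaining sparse when few rotations are used.

```python
import math
import random

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def apply_givens(W, i, j, theta):
    """Left-multiply W in place by the Givens rotation G(i, j, theta),
    which mixes rows i and j and leaves every other row untouched."""
    c, s = math.cos(theta), math.sin(theta)
    for col in range(len(W[0])):
        a, b = W[i][col], W[j][col]
        W[i][col] = c * a - s * b
        W[j][col] = s * a + c * b

def random_givens_product(n, num_rotations, rng):
    """Compose num_rotations random Givens rotations, starting from I."""
    W = identity(n)
    for _ in range(num_rotations):
        i, j = rng.sample(range(n), 2)          # two distinct row indices
        apply_givens(W, i, j, rng.uniform(0.0, 2.0 * math.pi))
    return W

def orthogonality_error(W):
    """Largest absolute entry of W^T W - I."""
    n = len(W)
    return max(
        abs(sum(W[k][a] * W[k][b] for k in range(n)) - (1.0 if a == b else 0.0))
        for a in range(n) for b in range(n)
    )

def density(W, tol=1e-12):
    """Fraction of entries whose magnitude exceeds tol."""
    n = len(W)
    return sum(abs(x) > tol for row in W for x in row) / (n * n)

W = random_givens_product(8, 3, random.Random(0))
```

Increasing `num_rotations` fills in more entries, which is one way to see how the density of the resulting orthogonal matrix can be traded off against the number of composed rotations.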
LGJan 13, 2020Code
WICA: nonlinear weighted ICAAndrzej Bedychaj, Przemysław Spurek, Aleksandra Nowak et al.
Independent Component Analysis (ICA) aims to find a coordinate system in which the components of the data are independent. In this paper we construct a new nonlinear ICA model, called WICA, which obtains better and more stable results than other algorithms. A crucial tool is given by a new efficient method of verifying nonlinear dependence with the use of computation of correlation coefficients for normally weighted data. In addition, the authors propose a new baseline nonlinear mixing to perform comparable experiments, and a reliable measure which allows fair comparison of nonlinear models. Our code for WICA is available on GitHub: https://github.com/gmum/wica.
CVFeb 2, 2024
GaMeS: Mesh-Based Adapting and Modification of Gaussian SplattingJoanna Waczyńska, Piotr Borycki, Sławomir Tadeja et al.
Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and real-time rendering. The main drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity of conditioning several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, which allows modification of Gaussian components in a similar way as meshes. We parameterize each Gaussian component by the vertices of the mesh face. Furthermore, our model requires either a mesh provided as input for initialization or a mesh estimated during training. We also define Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain a real-time rendering of editable GS.
41.3LGMay 8
Bayesian Fine-tuning in Projected SubspacesViktar Dubovik, Patryk Marszałek, Jacek Tabor et al.
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of large models by decomposing weight updates into low-rank matrices, significantly reducing storage and computational overhead. While effective, standard LoRA lacks mechanisms for uncertainty quantification, leading to overconfident and poorly calibrated models. Bayesian variants of LoRA address this limitation, but at the cost of a significantly increased number of trainable parameters, partially offsetting the original efficiency gains. Additionally, these models are harder to train and may suffer from unstable convergence. In this work, we propose a novel framework for parameter-efficient Bayesian fine-tuning, demonstrating that effective uncertainty quantification can be achieved in very low-dimensional parameter spaces. The proposed method achieves strong performance with improved calibration and generalization while maintaining computational efficiency. Our empirical findings show that, with the appropriate projection of the weight space, uncertainty can be effectively modeled in a low-dimensional space, and weight covariances exhibit low ranks.
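As background for the efficiency claim, the following sketch shows standard (non-Bayesian) LoRA only, not the method proposed in this paper. It assumes a frozen weight W of shape d_out x d_in and a trainable update B A with B of shape d_out x r and A of shape r x d_in; the function names and the example sizes are illustrative.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for a full update versus a rank-r LoRA update."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x):
    """Compute y = W x + B (A x) without ever forming the product B A."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

full, lora = lora_param_counts(4096, 4096, 8)   # hypothetical layer size and rank

# Tiny forward pass: a 2x2 identity weight with a rank-1 update.
y = lora_forward(
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[1.0, 1.0]],           # shape r x d_in  = 1 x 2
    B=[[0.5], [-0.5]],        # shape d_out x r = 2 x 1
    x=[2.0, 3.0],
)
```

For the sizes above, the low-rank update trains 65,536 parameters instead of 16,777,216, a 256-fold reduction. Any Bayesian treatment that places a distribution over A and B adds parameters on top of this count, which is the overhead the abstract refers to.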
CVDec 21, 2023
Gaussian Splatting with NeRF-based Color and OpacityDawid Malarz, Weronika Smolak, Jacek Tabor et al.
Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers a similar render quality with faster training and inference as it does not need neural networks to work. It encodes information about the 3D objects in the set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS models are difficult to condition since they usually require around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model Viewing Direction Gaussian Splatting (VDGS) that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. covariance of Gaussian), color and opacity, and a neural network that takes Gaussian parameters and viewing direction to produce changes in the said color and opacity. As a result, our model better describes shadows, light reflections, and the transparency of 3D objects without adding additional texture and light components.
CVMay 23, 2024
LucidPPN: Unambiguous Prototypical Parts Network for User-centric Interpretable Computer VisionMateusz Pach, Dawid Rymarczyk, Koryna Lewandowska et al.
Prototypical parts networks combine the power of deep learning with the explainability of case-based reasoning to make accurate, interpretable decisions. They follow the "this looks like that" reasoning, representing each prototypical part with patches from training images. However, a single image patch comprises multiple visual features, such as color, shape, and texture, making it difficult for users to identify which feature is important to the model. To reduce this ambiguity, we introduce the Lucid Prototypical Parts Network (LucidPPN), a novel prototypical parts network that separates color prototypes from other visual features. Our method employs two reasoning branches: one for non-color visual features, processing grayscale images, and another focusing solely on color information. This separation allows us to clarify whether the model's decisions are based on color, shape, or texture. Additionally, LucidPPN identifies prototypical parts corresponding to semantic parts of classified objects, making comparisons between data classes more intuitive, e.g., when two bird species might differ primarily in belly color. Our experiments demonstrate that the two branches are complementary and together achieve results comparable to baseline methods. More importantly, LucidPPN generates less ambiguous prototypical parts, enhancing user understanding.
CVFeb 14, 2025
Classifier-free Guidance with Adaptive ScalingDawid Malarz, Artur Kasymov, Maciej Zięba et al.
Classifier-free guidance (CFG) is an essential mechanism in contemporary text-driven diffusion models. In practice, controlling the impact of guidance exposes a trade-off between the quality of the generated images and their correspondence to the prompt. When we use strong guidance, generated images fit the conditioned text perfectly but at the cost of their quality. Conversely, we can use weak guidance to generate high-quality results, but the generated images do not match our prompt. In this paper, we present $β$-CFG ($β$-adaptive scaling in Classifier-Free Guidance), which controls the impact of guidance during generation to solve the above trade-off. First, $β$-CFG stabilizes the effects of guiding by gradient-based adaptive normalization. Second, $β$-CFG uses the family of single-modal ($β$-distribution), time-dependent curves to dynamically adapt the trade-off between prompt matching and the quality of samples during the diffusion denoising process. Our model obtained better FID scores, maintaining the text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
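The trade-off can be made concrete with the standard CFG update, which combines unconditional and conditional noise predictions as eps_u + w (eps_c - eps_u). The sketch below is not the paper's β-CFG: the schedule `guidance_scale`, which shapes w over normalized time with a Beta density, is a hypothetical illustration of a single-modal, time-dependent curve, and the gradient-based normalization step is omitted.

```python
import math

def beta_pdf(t, a, b):
    """Density of the Beta(a, b) distribution at t in [0, 1]."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * t ** (a - 1) * (1 - t) ** (b - 1)

def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: eps_u + scale * (eps_c - eps_u)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def guidance_scale(t, base_scale, a=2.0, b=2.0):
    """Hypothetical schedule (assumes a, b > 1): equals 1 (plain conditional
    prediction) at t = 0 and t = 1 and peaks at base_scale at the Beta mode."""
    mode = (a - 1) / (a + b - 2)
    shape = beta_pdf(t, a, b) / beta_pdf(mode, a, b)
    return 1.0 + (base_scale - 1.0) * shape

eps = cfg_combine([0.0, 1.0], [1.0, 3.0], 2.0)
scales = [guidance_scale(t, 7.5) for t in (0.0, 0.5, 1.0)]
```

With a fixed scale, every denoising step is pushed equally hard towards the prompt. A time-dependent scale lets guidance be strong only in part of the trajectory, which is the kind of control the abstract describes.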
CVJan 31, 2025
RaySplats: Ray Tracing based Gaussian SplattingKrzysztof Byrski, Marcin Mazur, Jacek Tabor et al.
3D Gaussian Splatting (3DGS) is a process that enables the direct creation of 3D objects from 2D images. This representation offers numerous advantages, including rapid training and rendering. However, a significant limitation of 3DGS is the challenge of incorporating light and shadow reflections, primarily due to the utilization of rasterization rather than ray tracing for rendering. This paper introduces RaySplats, a model that employs ray-tracing based Gaussian Splatting. Rather than utilizing the projection of Gaussians, our method employs a ray-tracing mechanism, operating directly on Gaussian primitives represented by confidence ellipses with RGB colors. In practice, we compute the intersection between ellipses and rays to construct ray-tracing algorithms, facilitating the incorporation of meshes with Gaussian Splatting models and the addition of lights, shadows, and other related effects.
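The core geometric step, intersecting a ray with an ellipsoid, reduces to a quadratic equation. The sketch below is an illustration rather than the RaySplats implementation: it assumes an axis-aligned ellipsoid (a general Gaussian would also need a rotation), and the function name and argument layout are hypothetical.

```python
import math

def ray_ellipsoid_hit(origin, direction, center, radii):
    """Return the smallest t >= 0 with origin + t * direction on the
    axis-aligned ellipsoid, or None if the ray misses it."""
    # Rescale space so that the ellipsoid becomes the unit sphere.
    o = [(origin[i] - center[i]) / radii[i] for i in range(3)]
    d = [direction[i] / radii[i] for i in range(3)]
    # Solve |o + t d|^2 = 1, i.e. a t^2 + b t + c = 0.
    a = sum(x * x for x in d)
    b = 2.0 * sum(x * y for x, y in zip(o, d))
    c = sum(x * x for x in o) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                      # the line never touches the surface
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0.0:
            return t                     # nearest intersection in front of the ray
    return None                          # ellipsoid lies entirely behind the origin

hit = ray_ellipsoid_hit((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
miss = ray_ellipsoid_hit((5.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
```

In the first call the ray travels along the z-axis and meets the surface at z = -3, so the returned parameter is t = 2. The second ray is offset by 5 units in x and misses the ellipsoid.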
LGFeb 11, 2025
SEMU: Singular Value Decomposition for Efficient Machine UnlearningMarcin Sendera, Łukasz Struski, Kamil Książek et al.
While the capabilities of generative foundational models have advanced rapidly in recent years, methods to prevent harmful and unsafe behaviors remain underdeveloped. Among the pressing challenges in AI safety, machine unlearning (MU) has become increasingly critical to meet upcoming safety regulations. Most existing MU approaches focus on altering the most significant parameters of the model. However, these methods often require fine-tuning substantial portions of the model, resulting in high computational costs and training instabilities, which are typically mitigated by access to the original training dataset. In this work, we address these limitations by leveraging Singular Value Decomposition (SVD) to create a compact, low-dimensional projection that enables the selective forgetting of specific data points. We propose Singular Value Decomposition for Efficient Machine Unlearning (SEMU), a novel approach designed to optimize MU in two key aspects. First, SEMU minimizes the number of model parameters that need to be modified, effectively removing unwanted knowledge while making only minimal changes to the model's weights. Second, SEMU eliminates the dependency on the original training dataset, preserving the model's previously acquired knowledge without additional data requirements. Extensive experiments demonstrate that SEMU achieves competitive performance while significantly improving efficiency in terms of both data usage and the number of modified parameters.
CVAug 7, 2025
UnGuide: Learning to Forget with LoRA-Guided Diffusion ModelsAgnieszka Polowczyk, Alicja Polowczyk, Dawid Malarz et al.
Recent advances in large-scale text-to-image diffusion models have heightened concerns about their potential misuse, especially in generating harmful or misleading content. This underscores the urgent need for effective machine unlearning, i.e., removing specific knowledge or concepts from pretrained models without compromising overall performance. One possible approach is Low-Rank Adaptation (LoRA), which offers an efficient means to fine-tune models for targeted unlearning. However, LoRA often inadvertently alters unrelated content, leading to diminished image fidelity and realism. To address this limitation, we introduce UnGuide -- a novel approach which incorporates UnGuidance, a dynamic inference mechanism that leverages Classifier-Free Guidance (CFG) to exert precise control over the unlearning process. UnGuide modulates the guidance scale based on the stability of the first few steps of the denoising process, enabling selective unlearning by the LoRA adapter. For prompts containing the erased concept, the LoRA module predominates and is counterbalanced by the base model; for unrelated prompts, the base model governs generation, preserving content fidelity. Empirical results demonstrate that UnGuide achieves controlled concept removal and retains the expressive power of diffusion models, outperforming existing LoRA-based methods in both object erasure and explicit content removal tasks.
CVMay 19, 2025
EPIC: Explanation of Pretrained Image Classification Networks via PrototypePiotr Borycki, Magdalena Trędowicz, Szymon Janusz et al.
Explainable AI (XAI) methods generally fall into two categories. Post-hoc approaches generate explanations for pre-trained models and are compatible with various neural network architectures. These methods often use feature importance visualizations, such as saliency maps, to indicate which input regions influenced the model's prediction. Unfortunately, they typically offer a coarse understanding of the model's decision-making process. In contrast, ante-hoc (inherently explainable) methods rely on specially designed model architectures trained from scratch. A notable subclass of these methods provides explanations through prototypes, representative patches extracted from the training data. However, prototype-based approaches have limitations: they require dedicated architectures, involve specialized training procedures, and perform well only on specific datasets. In this work, we propose EPIC (Explanation of Pretrained Image Classification), a novel approach that bridges the gap between these two paradigms. Like post-hoc methods, EPIC operates on pre-trained models without architectural modifications. Simultaneously, it delivers intuitive, prototype-based explanations inspired by ante-hoc techniques. To the best of our knowledge, EPIC is the first post-hoc method capable of fully replicating the core explanatory power of inherently interpretable models. We evaluate EPIC on benchmark datasets commonly used in prototype-based explanations, such as CUB-200-2011 and Stanford Cars, alongside large-scale datasets like ImageNet, typically employed by post-hoc methods. EPIC uses prototypes to explain model decisions, providing a flexible and easy-to-understand tool for creating clear, high-quality explanations.
AIMar 8, 2025
LapSum -- One Method to Differentiate Them All: Ranking, Sorting and Top-k SelectionŁukasz Struski, Michał B. Bednarczyk, Igor T. Podolak et al.
We present a novel technique for constructing differentiable order-type operations, including soft ranking, soft top-k selection, and soft permutations. Our approach leverages an efficient closed-form formula for the inverse of the function LapSum, defined as the sum of Laplace distributions. This formulation ensures low computational and memory complexity in selecting the highest activations, enabling losses and gradients to be computed in $O(n \log n)$ time. Through extensive experiments, we demonstrate that our method outperforms state-of-the-art techniques for high-dimensional vectors and large $k$ values. Furthermore, we provide efficient implementations for both CPU and CUDA environments, underscoring the practicality and scalability of our method for large-scale ranking and differentiable ordering problems.
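A rough sketch of soft top-k selection built on a sum of Laplace CDFs follows. It makes two simplifying assumptions that differ from the paper: the threshold is found by plain bisection rather than the closed-form inverse the authors describe, and the names `soft_top_k` and `alpha` (a temperature) are hypothetical.

```python
import math

def laplace_cdf(x):
    """CDF of the standard Laplace distribution."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def soft_top_k(scores, k, alpha=0.1, iters=100):
    """Return soft membership weights in [0, 1] that sum to k.
    A shared threshold b is chosen so that sum_i F((s_i - b) / alpha) = k."""
    def total(b):
        return sum(laplace_cdf((s - b) / alpha) for s in scores)

    # total(b) decreases from about len(scores) to about 0, so a root exists
    # for 0 < k < len(scores); bracket it generously and bisect.
    lo = min(scores) - 40.0 * alpha
    hi = max(scores) + 40.0 * alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > k:
            lo = mid      # too many items selected: raise the threshold
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return [laplace_cdf((s - b) / alpha) for s in scores]

weights = soft_top_k([3.0, 1.0, 2.0, 0.0], k=2)
```

Here the two largest scores (3.0 and 2.0) receive weights close to 1 and the others weights close to 0, while every weight remains a smooth function of the scores. Smaller `alpha` values move the result closer to a hard top-k mask.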
LGFeb 17, 2025
Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRAPatryk Marszałek, Klaudia Bałazy, Jacek Tabor et al.
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of large language models by decomposing weight updates into low-rank matrices, significantly reducing storage and computational overhead. While effective, standard LoRA lacks mechanisms for uncertainty quantification, leading to overconfident and poorly calibrated models. Bayesian variants of LoRA address this limitation, but at the cost of a significantly increased number of trainable parameters, partially offsetting the original efficiency gains. Additionally, these models are harder to train and may suffer from unstable convergence. In this work, we propose a novel parameter-efficient Bayesian LoRA via subspace inference, demonstrating that effective uncertainty quantification can be achieved in very low-dimensional parameter spaces. The proposed method achieves strong performance with improved calibration and generalization while maintaining computational efficiency. Our empirical findings show that, with the appropriate projection of the weight space: (1) uncertainty can be effectively modeled in a low-dimensional space, and (2) weight covariances exhibit low ranks.
LGMar 12, 2024
ProPML: Probability Partial Multi-label LearningŁukasz Struski, Adam Pardyl, Jacek Tabor et al.
Partial Multi-label Learning (PML) is a type of weakly supervised learning where each training instance corresponds to a set of candidate labels, among which only some are true. In this paper, we introduce ProPML, a novel probabilistic approach to this problem that extends the binary cross entropy to the PML setup. In contrast to existing methods, it does not require suboptimal disambiguation and, as such, can be applied to any deep architecture. Furthermore, experiments conducted on artificial and real-world datasets indicate that ProPML outperforms existing approaches, especially for high noise in a candidate set.
LGNov 21, 2025
InTAct: Interval-based Task Activation Consolidation for Continual LearningPatryk Krukowski, Jan Miksa, Piotr Helm et al.
Continual learning is a fundamental challenge in artificial intelligence that requires networks to acquire new knowledge while preserving previously learned representations. Despite the success of various approaches, most existing paradigms do not provide rigorous mathematical guarantees against catastrophic forgetting. Current methods that offer such guarantees primarily focus on analyzing the parameter space using interval arithmetic (IA), as seen in frameworks such as InterContiNet. However, restricting high-dimensional weight updates can be computationally expensive. In this work, we propose InTAct (Interval-based Task Activation Consolidation), a method that mitigates catastrophic forgetting by enforcing functional invariance at the neuron level. We identify specific activation intervals where previous tasks reside and constrain updates within these regions while allowing for flexible adaptation elsewhere. By ensuring that predictions remain stable within these nested activation intervals, we provide a tractable mathematical guarantee of functional invariance. We emphasize that regulating the activation space is significantly more efficient than parameter-based constraints, because the dimensionality of internal signals is much lower than that of the vast space of model weights. While our approach is architecture-agnostic and applicable to various continual learning settings, its integration with prompt-based methods enables it to achieve state-of-the-art performance on challenging benchmarks.
CVJul 25, 2025
SIDE: Sparse Information Disentanglement for Explainable Artificial IntelligenceViktar Dubovik, Łukasz Struski, Jacek Tabor et al.
Understanding the decisions made by deep neural networks is essential in high-stakes domains such as medical imaging and autonomous driving. Yet, these models often lack transparency, particularly in computer vision. Prototypical-parts-based neural networks have emerged as a promising solution by offering concept-level explanations. However, most are limited to fine-grained classification tasks, with few exceptions such as InfoDisent. InfoDisent extends prototypical models to large-scale datasets like ImageNet, but produces complex explanations. We introduce Sparse Information Disentanglement for Explainability (SIDE), a novel method that improves the interpretability of prototypical parts through a dedicated training and pruning scheme that enforces sparsity. Combined with sigmoid activations in place of softmax, this approach allows SIDE to associate each class with only a small set of relevant prototypes. Extensive experiments show that SIDE matches the accuracy of existing methods while reducing explanation size by over 90%, substantially enhancing the understandability of prototype-based explanations.
LGMay 15, 2025
ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular DataPatryk Marszałek, Tomasz Kuśmierczyk, Witold Wydmański et al.
Clustering tabular data remains a significant open challenge in data analysis and machine learning. Unlike for image data, similarity between tabular records often varies across datasets, making the definition of clusters highly dataset-dependent. Furthermore, the absence of supervised signals complicates hyperparameter tuning in deep learning clustering methods, frequently resulting in unstable performance. To address these issues and reduce the need for per-dataset tuning, we adopt an emerging approach in deep learning: zero-shot learning. We propose ZEUS, a self-contained model capable of clustering new datasets without any additional training or fine-tuning. It operates by decomposing complex datasets into meaningful components that can then be clustered effectively. Thanks to pre-training on synthetic datasets generated from a latent-variable prior, it generalizes across various datasets without requiring user intervention. To the best of our knowledge, ZEUS is the first zero-shot method capable of generating embeddings for tabular data in a fully unsupervised manner. Experimental results demonstrate that it performs on par with or better than traditional clustering algorithms and recent deep learning-based methods, while being significantly faster and more user-friendly.
CVApr 9, 2025
CEC-MMR: Cross-Entropy Clustering Approach to Multi-Modal RegressionKrzysztof Byrski, Jacek Tabor, Przemysław Spurek et al.
In practical applications of regression analysis, it is not uncommon to encounter a multitude of values for each attribute. In such a situation, the univariate distribution, which is typically Gaussian, is suboptimal because the mean may be situated between modes, resulting in a predicted value that differs significantly from the actual data. Consequently, to address this issue, a mixture distribution with parameters learned by a neural network, known as a Mixture Density Network (MDN), is typically employed. However, this approach has an important inherent limitation, in that it is not feasible to ascertain the precise number of components with a reasonable degree of accuracy. In this paper, we introduce CEC-MMR, a novel approach based on Cross-Entropy Clustering (CEC), which allows for the automatic detection of the number of components in a regression problem. Furthermore, given an attribute and its value, our method is capable of uniquely identifying it with the underlying component. The experimental results demonstrate that CEC-MMR yields superior outcomes compared to classical MDNs.
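The problem with a single-mode prediction can be seen in a small worked example. The data below is made up for illustration, and the split by sign is a toy stand-in for component assignment, not Cross-Entropy Clustering.

```python
import statistics

def summarize(targets):
    """Compare one global mean with per-mode means for bimodal targets."""
    mean = statistics.fmean(targets)
    # Distance from the global mean to the closest observed target.
    nearest_gap = min(abs(y - mean) for y in targets)
    # Toy component assignment (assumption): split the targets by sign.
    low = statistics.fmean(y for y in targets if y < 0)
    high = statistics.fmean(y for y in targets if y >= 0)
    return mean, nearest_gap, (low, high)

# Hypothetical targets observed for one and the same input value.
ys = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
mean, nearest_gap, modes = summarize(ys)
```

The global mean is about 0, which is almost 2 units away from every observed target, whereas the two per-mode means sit at roughly -2 and 2. A mixture model can represent both modes, but only if the number of components is chosen correctly, which is the limitation the abstract addresses.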
LGMar 18, 2025
FeNeC: Enhancing Continual Learning via Feature Clustering with Neighbor- or Logit-Based ClassificationKamil Książek, Hubert Jastrzębski, Bartosz Trojan et al.
The ability of deep learning models to learn continuously is essential for adapting to new data categories and evolving data distributions. In recent years, approaches leveraging frozen feature extractors after an initial learning phase have been extensively studied. Many of these methods estimate per-class covariance matrices and prototypes based on backbone-derived feature representations. Within this paradigm, we introduce FeNeC (Feature Neighborhood Classifier) and FeNeC-Log, its variant based on the log-likelihood function. Our approach generalizes the existing concept by incorporating data clustering to capture greater intra-class variability. Utilizing the Mahalanobis distance, our models classify samples either through a nearest neighbor approach or trainable logit values assigned to consecutive classes. Our proposition may be reduced to the existing approaches in a special case while extending them with the ability of more flexible adaptation to data. We demonstrate that two FeNeC variants achieve competitive performance in scenarios where task identities are unknown and establish state-of-the-art results on several benchmarks.
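The nearest-neighbor variant can be sketched as nearest-centroid classification under the Mahalanobis distance. This is an illustration under simplifying assumptions, not the FeNeC implementation: features are 2-D so the covariance can be inverted by hand, each cluster carries its own covariance, and the data structure (a list of `(label, mean, cov)` tuples, allowing several clusters per class) is hypothetical.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for 2-D features with a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))

def classify(x, clusters):
    """Return the label of the cluster closest to x in Mahalanobis distance.
    A class may contribute several clusters to capture intra-class variability."""
    label, _, _ = min(clusters, key=lambda c: mahalanobis_sq(x, c[1], c[2]))
    return label

clusters = [
    ("A", (0.0, 0.0), [[4.0, 0.0], [0.0, 0.25]]),   # wide spread along x
    ("B", (3.0, 0.0), [[0.25, 0.0], [0.0, 0.25]]),  # tight cluster
]
pred = classify((2.0, 0.0), clusters)
```

The query point (2, 0) is closer to the centroid of B in Euclidean terms (distance 1 versus 2), yet its squared Mahalanobis distance is 1 to A and 4 to B, so it is assigned to A. This is the effect of accounting for per-cluster covariance.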