CVJul 27, 2023Code
vox2vec: A Framework for Self-supervised Contrastive Learning of Voxel-level Representations in Medical ImagesMikhail Goncharov, Vera Soboleva, Anvar Kurmukov et al.
This paper introduces vox2vec - a contrastive method for self-supervised learning (SSL) of voxel-level representations. vox2vec representations are modeled by a Feature Pyramid Network (FPN): a voxel representation is a concatenation of the corresponding feature vectors from different pyramid levels. The FPN is pre-trained to produce similar representations for the same voxel in different augmented contexts and distinctive representations for different voxels. This results in unified multi-scale representations that capture both global semantics (e.g., body part) and local semantics (e.g., different small organs or healthy versus tumor tissue). We use vox2vec to pre-train a FPN on more than 6500 publicly available computed tomography images. We evaluate the pre-trained representations by attaching simple heads on top of them and training the resulting models for 22 segmentation tasks. We show that vox2vec outperforms existing medical imaging SSL techniques in three evaluation setups: linear and non-linear probing and end-to-end fine-tuning. Moreover, a non-linear head trained on top of the frozen vox2vec representations achieves competitive performance with the FPN trained from scratch while having 50 times fewer trainable parameters. The code is available at https://github.com/mishgon/vox2vec .
CVApr 6Code
OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion ModelsAli Aliev, Kamil Garifullin, Nikolay Yudin et al.
In a rapidly growing field of model training there is a constant practical interest in parameter-efficient fine-tuning and various techniques that use a small amount of training data to adapt the model to a narrow task. However, there is an open question: how to combine several adapters tuned for different tasks into one which is able to yield adequate results on both tasks? Specifically, merging subject and style adapters for generative models remains unresolved. In this paper we seek to show that in the case of orthogonal fine-tuning (OFT), we can use structured orthogonal parametrization and its geometric properties to get the formulas for training-free adapter merging. In particular, we derive the structure of the manifold formed by the recently proposed Group-and-Shuffle ($\mathcal{GS}$) orthogonal matrices, and obtain efficient formulas for the geodesics approximation between two points. Additionally, we propose a $\text{spectra restoration}$ transform that restores spectral properties of the merged adapter for higher-quality fusion. We conduct experiments in subject-driven generation tasks showing that our technique to merge two $\mathcal{GS}$ orthogonal matrices is capable of uniting concept and style features of different adapters. To the best of our knowledge, this is the first training-free method for merging multiplicative orthogonal adapters. Code is available via the $\href{https://github.com/ControlGenAI/OrthoFuse}{link}$.
CVFeb 9, 2025Code
Beyond Fine-Tuning: A Systematic Study of Sampling Techniques in Personalized Image GenerationVera Soboleva, Maksim Nakhodnov, Aibek Alanov
Personalized text-to-image generation aims to create images tailored to user-defined concepts and textual descriptions. Balancing the fidelity of the learned concept with its ability for generation in various contexts presents a significant challenge. Existing methods often address this through diverse fine-tuning parameterizations and improved sampling strategies that integrate superclass trajectories during the diffusion process. While improved sampling offers a cost-effective, training-free solution for enhancing fine-tuned models, systematic analyses of these methods remain limited. Current approaches typically tie sampling strategies with fixed fine-tuning configurations, making it difficult to isolate their impact on generation outcomes. To address this issue, we systematically analyze sampling strategies beyond fine-tuning, exploring the impact of concept and superclass trajectories on the results. Building on this analysis, we propose a decision framework evaluating text alignment, computational constraints, and fidelity objectives to guide strategy selection. It integrates with diverse architectures and training approaches, systematically optimizing concept preservation, prompt adherence, and resource efficiency. The source code can be found at https://github.com/ControlGenAI/PersonGenSampler.
CVJul 8, 2025Code
T-LoRA: Single Image Diffusion Model Customization Without OverfittingVera Soboleva, Aibek Alanov, Andrey Kuznetsov et al.
While diffusion model fine-tuning offers a powerful approach for customizing pre-trained models to generate specific objects, it frequently suffers from overfitting when training samples are limited, compromising both generalization capability and output diversity. This paper tackles the challenging yet most impactful task of adapting a diffusion model using just a single concept image, as single-image customization holds the greatest practical potential. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework specifically designed for diffusion model personalization. In our work we show that higher diffusion timesteps are more prone to overfitting than lower ones, necessitating a timestep-sensitive fine-tuning strategy. T-LoRA incorporates two key innovations: (1) a dynamic fine-tuning strategy that adjusts rank-constrained updates based on diffusion timesteps, and (2) a weight parametrization technique that ensures independence between adapter components through orthogonal initialization. Extensive experiments show that T-LoRA and its individual components outperform standard LoRA and other diffusion model personalization techniques. They achieve a superior balance between concept fidelity and text alignment, highlighting the potential of T-LoRA in data-limited and resource-constrained scenarios. Code is available at https://github.com/ControlGenAI/T-LoRA.
CVApr 11, 2021Code
Raindrops on Windshield: Dataset and Lightweight Gradient-Based Detection AlgorithmVera Soboleva, Oleg Shipitko
Autonomous vehicles use cameras as one of the primary sources of information about the environment. Adverse weather conditions such as raindrops, snow, mud, and others, can lead to various image artifacts. Such artifacts significantly degrade the quality and reliability of the obtained visual data and can lead to accidents if they are not detected in time. This paper presents ongoing work on a new dataset for training and assessing vision algorithms' performance for different tasks of image artifacts detection on either camera lens or windshield. At the moment, we present a publicly available set of images containing $8190$ images, of which $3390$ contain raindrops. Images are annotated with the binary mask representing areas with raindrops. We demonstrate the applicability of the dataset in the problems of raindrops presence detection and raindrop region segmentation. To augment the data, we also propose an algorithm for data augmentation which allows the generation of synthetic raindrops on images. Apart from the dataset, we present a novel gradient-based algorithm for raindrop presence detection in a video sequence. The experimental evaluation proves that the algorithm reliably detects raindrops. Moreover, compared with the state-of-the-art cross-correlation-based algorithm \cite{Einecke2014}, the proposed algorithm showed a higher quality of raindrop presence detection and image processing speed, making it applicable for the self-check procedure of real autonomous systems. The dataset is available at \href{https://github.com/EvoCargo/RaindropsOnWindshield}{$github.com/EvoCargo/RaindropsOnWindshield$}.
LGJul 16, 2025
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank AdaptersVladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko et al.
This work presents a novel, fully Riemannian framework for Low-Rank Adaptation (LoRA) that geometrically treats low-rank adapters by optimizing them directly on the fixed-rank manifold. This formulation eliminates the parametrization ambiguity present in standard Euclidean optimizers. Our framework integrates three key components to achieve this: (1) we derive Riemannion, a new Riemannian optimizer on the fixed-rank matrix manifold that generalizes the recently proposed Muon optimizer; (2) we develop a Riemannian gradient-informed LoRA initialization, and (3) we provide an efficient implementation without prominent overhead that uses automatic differentiation to compute arising geometric operations while adhering to best practices in numerical linear algebra. Comprehensive experimental results on both LLM and diffusion model architectures demonstrate that our approach yields consistent and noticeable improvements in convergence speed and final task performance over both standard LoRA and its state-of-the-art modifications.
LGJun 14, 2024
Group and Shuffle: Efficient Structured Orthogonal ParametrizationMikhail Gorbunov, Nikolay Yudin, Vera Soboleva et al.
The increasing size of neural networks has led to a growing demand for methods of efficient fine-tuning. Recently, an orthogonal fine-tuning paradigm was introduced that uses orthogonal matrices for adapting the weights of a pretrained model. In this paper, we introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We examine properties of this class and build a structured orthogonal parametrization upon it. We then use this parametrization to modify the orthogonal fine-tuning framework, improving parameter and computational efficiency. We empirically validate our method on different domains, including adapting of text-to-image diffusion models and downstream task fine-tuning in language modeling. Additionally, we adapt our construction for orthogonal convolutions and conduct experiments with 1-Lipschitz neural networks.