Vladimir Aletov

LG
h-index111
4papers
7citations
Novelty53%
AI Score48

4 Papers

LGJun 2Code
Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

Artur Zagitov, Alexander Miasnikov, Maxim Krutikov et al.

Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Transformer weight structures. However, existing studies evaluate these methods in narrow settings, leaving unclear whether tensorization is effective at large-scale deployment. We systematically evaluate tensor compression across dense and MoE architectures, establishing performance trade-offs grounded in both empirical analysis and theoretical analysis. We identify a fundamental mismatch between the shared subspaces assumed by tensor decompositions and the heterogeneous representations learned by modern LLMs, thereby delineating their practical limits and clarifying their viable role in large-scale deployment. The code is available at https://github.com/brain-lab-research/TT-LLM.

LGMay 15
Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates

Roman Maksimov, Vladimir Aletov, Dmitry Bylinkin et al.

Knowledge editing (KE) provides a lightweight alternative to repeated fine-tuning of LLMs. However, most existing KE methods target dense feed-forward layers, while modern LLMs increasingly adopt Mixture-of-Experts (MoE) architectures for their superior memory footprint and inference efficiency. This mismatch leaves a growing class of production models without principled editing tools. We propose a MEMIT-like framework for knowledge editing in MoE-based LLMs. Our method exploits the tensor structure of MoE layers to formulate the editing objective faithfully at the per expert level, and applies the Woodbury matrix identity to avoid materializing or inverting the full stacked matrix of expert weights. The resulting update reduces to inversions of fixed low-rank matrices and requires no additional backward passes. Empirically, our approach matches the editing quality of strong baselines on the main KE metrics while accelerating the editing procedure by up to 6x, owing to the batched MEMIT-style formulation and the low-dimensional inversions enabled by the Woodbury identity. These results show that closed-form, parameter-modifying KE can be extended efficiently beyond dense layers, opening a path toward scalable knowledge editing in modern sparse LLM architectures.

LGJul 16, 2025
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters

Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko et al.

This work presents a novel, fully Riemannian framework for Low-Rank Adaptation (LoRA) that geometrically treats low-rank adapters by optimizing them directly on the fixed-rank manifold. This formulation eliminates the parametrization ambiguity present in standard Euclidean optimizers. Our framework integrates three key components to achieve this: (1) we derive Riemannion, a new Riemannian optimizer on the fixed-rank matrix manifold that generalizes the recently proposed Muon optimizer; (2) we develop a Riemannian gradient-informed LoRA initialization, and (3) we provide an efficient implementation without prominent overhead that uses automatic differentiation to compute arising geometric operations while adhering to best practices in numerical linear algebra. Comprehensive experimental results on both LLM and diffusion model architectures demonstrate that our approach yields consistent and noticeable improvements in convergence speed and final task performance over both standard LoRA and its state-of-the-art modifications.

LGMay 12, 2025
Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy

Gleb Molodtsov, Daniil Medyakov, Sergey Skorik et al.

Recent advancements in machine learning have improved performance while also increasing computational demands. While federated and distributed setups address these issues, their structure is vulnerable to malicious influences. In this paper, we address a specific threat, Byzantine attacks, where compromised clients inject adversarial updates to derail global convergence. We combine the trust scores concept with trial function methodology to dynamically filter outliers. Our methods address the critical limitations of previous approaches, allowing functionality even when Byzantine nodes are in the majority. Moreover, our algorithms adapt to widely used scaled methods like Adam and RMSProp, as well as practical scenarios, including local training and partial participation. We validate the robustness of our methods by conducting extensive experiments on both synthetic and real ECG data collected from medical institutions. Furthermore, we provide a broad theoretical analysis of our algorithms and their extensions to aforementioned practical setups. The convergence guarantees of our methods are comparable to those of classical algorithms developed without Byzantine interference.