Ivan Oseledets

LG
h-index50
182papers
5,227citations
Novelty47%
AI Score59

182 Papers

CVMar 21, 2022Code
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov et al.

Metric learning aims to learn a highly discriminative model encouraging the embeddings of similar classes to be close in the chosen metrics and pushed apart for dissimilar ones. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations -- usually, the Euclidean distance is utilized. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets achieving the new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric.

LGJul 31, 2022Code
Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI

Semen Budennyy, Vladimir Lazarev, Nikita Zakharenko et al.

The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers to track energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on accuracy of energy consumption tracking and correct regional CO2 emissions accounting. We encourage research community to search for new optimal Artificial Intelligence (AI) architectures with a lower computational cost. The motivation also comes from the concept of AI-based green house gases sequestrating cycle with both Sustainable AI and Green AI pathways.

MSMar 2, 2020
Tensor Train decomposition on TensorFlow (T3F)

Alexander Novikov, Pavel Izmailov, Valentin Khrulkov et al. · openai

Tensor Train decomposition is used across many branches of machine learning. We present T3F -- a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization methods. The library makes it easier to implement machine learning papers that rely on the Tensor Train decomposition. T3F includes documentation, examples and 94% test coverage.

46.4CLMay 30
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

Maksim Savkin, Mikhail Goncharov, Alexander Gambashidze et al.

Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world's knowledge into its weights. However, many practical applications benefit more from robust reasoning than from extensive parametric knowledge. In this setting, task-specialized small language models (SLMs) offer a principled design choice. We introduce Optimal Cognitive Core (OCC), a family of SLMs built around this premise. As a variant of OCC, we present OCC-RAG, optimized for faithful question answering (QA) grounded in the provided context. This task directly aligns with the OCC design approach, requiring multi-hop reasoning over supplied passages while ignoring memorized knowledge. To train OCC-RAG, we implement a novel pipeline for synthesizing multi-context, multi-hop QA data at scale, producing a corpus of over three million examples targeting multi-hop reasoning, strict context faithfulness, and calibrated abstention. We release OCC-RAG-0.6B and OCC-RAG-1.7B, both mid-trained on this corpus. The models produce structured reasoning traces with source citations grounded in literal quotes from the context. Through OCC-RAG, we demonstrate that compact, task-specialized SLMs can match or exceed general-purpose models 2 -- 6x their size across multi-hop reasoning (HotpotQA, MuSiQue, TAT-QA), faithfulness (ConFiQA), and refusal (MuSiQue-Un) benchmarks.

NAJan 8, 2013
A projector-splitting integrator for dynamical low-rank approximation

Christian Lubich, Ivan Oseledets

The dynamical low-rank approximation of time-dependent matrices is a low-rank factorization updating technique. It leads to differential equations for factors of the matrices, which need to be solved numerically. We propose and analyze a fully ex- plicit, computationally inexpensive integrator that is based on splitting the orthogonal projector onto the tangent space of the low-rank manifold. As is shown by theory and illustrated by numerical experiments, the integrator enjoys robustness properties that are not shared by any standard numerical integrator. This robustness can be exploited to change the rank adaptively. Another application is in optimization algorithms for low-rank matrices where truncation back to the given low rank can be done efficiently by applying a step of the integrator proposed here.

NAJan 11, 2015
Time integration of tensor trains

Christian Lubich, Ivan Oseledets, Bart Vandereycken

A robust and efficient time integrator for dynamical tensor approximation in the tensor train or matrix product state format is presented. The method is based on splitting the projector onto the tangent space of the tensor manifold. The algorithm can be used for updating time-dependent tensors in the given data-sparse tensor train / matrix product state format and for computing an approximate solution to high-dimensional tensor differential equations within this data-sparse format. The formulation, implementation and theoretical properties of the proposed integrator are studied, and numerical experiments with problems from quantum molecular dynamics and with iterative processes in the tensor train format are included.

LGApr 30, 2022
TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning

Konstantin Sozykin, Andrei Chertkov, Roman Schutski et al.

We present a novel procedure for optimization based on the combination of efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method for various tasks, ranging from minimization of multidimensional functions to reinforcement learning. Our algorithm compares favorably to popular evolutionary-based methods and outperforms them by the number of function evaluations or execution time, often by a significant margin.

LGAug 8, 2023
Quantization Aware Factorization for Deep Neural Network Compression

Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan et al.

Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds tensor approximation directly with quantized factors and thus benefit from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with a devised algorithm and evaluate it's prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achiving a desirable quality-performance tradeoff.

NANov 10, 2017
Deep Multigrid: learning prolongation and restriction matrices

Alexandr Katrutsa, Talgat Daulbaev, Ivan Oseledets

This paper proposes the method to optimize restriction and prolongation operators in the two-grid method. The proposed method is straightforwardly extended to the geometric multigrid method (GMM). GMM is used in solving discretized partial differential equation (PDE) and based on the restriction and prolongation operators. The operators are crucial for fast convergence of GMM, but they are unknown. To find them we propose a reformulation of the two-grid method in terms of a deep neural network with a specific architecture. This architecture is based on the idea that every operation in the two-grid method can be considered as a layer of a deep neural network. The parameters of layers correspond to the restriction and prolongation operators. Therefore, we state an optimization problem with respect to these operators and get optimal ones through backpropagation approach. To illustrate the performance of the proposed approach, we carry out experiments on the discretized Laplace equation, Helmholtz equation and singularly perturbed convection-diffusion equation and demonstrate that proposed approach gives operators, which lead to faster convergence.

QUANT-PHJul 6, 2022
Tensor networks in machine learning

Richik Sengupta, Soumik Adhikary, Ivan Oseledets et al.

A tensor network is a type of decomposition used to express and approximate large arrays of data. A given data-set, quantum state or higher dimensional multi-linear map is factored and approximated by a composition of smaller multi-linear maps. This is reminiscent to how a Boolean function might be decomposed into a gate array: this represents a special case of tensor decomposition, in which the tensor entries are replaced by 0, 1 and the factorisation becomes exact. The collection of associated techniques are called, tensor network methods: the subject developed independently in several distinct fields of study, which have more recently become interrelated through the language of tensor networks. The tantamount questions in the field relate to expressability of tensor networks and the reduction of computational overheads. A merger of tensor networks with machine learning is natural. On the one hand, machine learning can aid in determining a factorization of a tensor network approximating a data set. On the other hand, a given tensor network structure can be viewed as a machine learning model. Herein the tensor network parameters are adjusted to learn or classify a data-set. In this survey we recover the basics of tensor networks and explain the ongoing effort to develop the theory of tensor networks in machine learning.

CVAug 2, 2022
T4DT: Tensorizing Time for Learning Temporal 3D Visual Data

Mikhail Usvyatsov, Rafael Ballester-Rippoll, Lina Bashaeva et al.

Unlike 2D raster images, there is no single dominant representation for 3D visual data processing. Different formats like point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions have attractive properties also in 3D. In particular, they offer constant-time random access and are eminently suitable for modern machine learning. Unfortunately, the storage size of a grid grows exponentially with its dimension. Hence they often exceed memory limits even at moderate resolution. This work proposes using low-rank tensor formats, including the Tucker, tensor train, and quantics tensor train decompositions, to compress time-varying 3D data. Our method iteratively computes, voxelizes, and compresses each frame's truncated signed distance function and applies tensor rank truncation to condense all frames into a single, compressed tensor that represents the entire 4D scene. We show that low-rank tensor compression is extremely compact to store and query time-varying signed distance functions. It significantly reduces the memory footprint of 4D scenes while remarkably preserving their geometric quality. Unlike existing, iterative learning-based approaches like DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees.

NAJan 11, 2019
Alternating least squares as moving subspace correction

Ivan Oseledets, Maxim Rakhuba, André Uschmajew

In this note we take a new look at the local convergence of alternating optimization methods for low-rank matrices and tensors. Our abstract interpretation as sequential optimization on moving subspaces yields insightful reformulations of some known convergence conditions that focus on the interplay between the contractivity of classical multiplicative Schwarz methods with overlapping subspaces and the curvature of low-rank matrix and tensor manifolds. While the verification of the abstract conditions in concrete scenarios remains open in most cases, we are able to provide an alternative and conceptually simple derivation of the asymptotic convergence rate of the two-sided block power method of numerical algebra for computing the dominant singular subspaces of a rectangular matrix. This method is equivalent to an alternating least squares method applied to a distance function. The theoretical results are illustrated and validated by numerical experiments.

NANov 27, 2018
Low-rank Riemannian eigensolver for high-dimensional Hamiltonians

Maxim Rakhuba, Alexander Novikov, Ivan Oseledets

Such problems as computation of spectra of spin chains and vibrational spectra of molecules can be written as high-dimensional eigenvalue problems, i.e., when the eigenvector can be naturally represented as a multidimensional tensor. Tensor methods have proven to be an efficient tool for the approximation of solutions of high-dimensional eigenvalue problems, however, their performance deteriorates quickly when the number of eigenstates to be computed increases. We address this issue by designing a new algorithm motivated by the ideas of Riemannian optimization (optimization on smooth manifolds) for the approximation of multiple eigenstates in the tensor-train format, which is also known as matrix product state representation. The proposed algorithm is implemented in TensorFlow, which allows for both CPU and GPU parallelization.

NAOct 3, 2017
Desingularization of bounded-rank matrix sets

Valentin Khrulkov, Ivan Oseledets

Conventional ways to solve optimization problems on low-rank matrix sets which appear in great number of applications ignore its underlying structure of an algebraic variety and existence of singular points. This leads to appearance of inverses of singular values in algorithms and since they could be close to $0$ it causes certain problems. We tackle this problem by utilizing ideas from the algebraic geometry and show how to desingularize these sets. Our main result is algorithm which uses only bounded functions of singular values and hence does not suffer from the issue described above.

NAMar 27, 2017
Jacobi-Davidson method on low-rank matrix manifolds

Maxim Rakhuba, Ivan Oseledets

In this work we generalize the Jacobi-Davidson method to the case when eigenvector can be reshaped into a low-rank matrix. In this setting the proposed method inherits advantages of the original Jacobi-Davidson method, has lower complexity and requires less storage. We also introduce low-rank version of the Rayleigh quotient iteration which naturally arises in the Jacobi-Davidson method.

ROMar 14, 2023
Multiparticle Kalman filter for object localization in symmetric environments

Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

This study considers the object localization problem and proposes a novel multiparticle Kalman filter to solve it in complex and symmetric environments. Two well-known classes of filtering algorithms to solve the localization problem are Kalman filter-based methods and particle filter-based methods. We consider these classes, demonstrate their complementary properties, and propose a novel filtering algorithm that takes the best from two classes. We evaluate the multiparticle Kalman filter in symmetric and noisy environments. Such environments are especially challenging for both classes of classical methods. We compare the proposed approach with the particle filter since only this method is feasible if the initial state is unknown. In the considered challenging environments, our method outperforms the particle filter in terms of both localization error and runtime.

IRMay 9, 2022
Are Quantum Computers Practical Yet? A Case for Feature Selection in Recommender Systems using Tensor Networks

Artyom Nikitin, Andrei Chertkov, Rafael Ballester-Ripoll et al.

Collaborative filtering models generally perform better than content-based filtering models and do not require careful feature engineering. However, in the cold-start scenario collaborative information may be scarce or even unavailable, whereas the content information may be abundant, but also noisy and expensive to acquire. Thus, selection of particular features that improve cold-start recommendations becomes an important and non-trivial task. In the recent approach by Nembrini et al., the feature selection is driven by the correlational compatibility between collaborative and content-based models. The problem is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) which, due to its NP-hard complexity, is solved using Quantum Annealing on a quantum computer provided by D-Wave. Inspired by the reported results, we contend the idea that current quantum annealers are superior for this problem and instead focus on classical algorithms. In particular, we tackle QUBO via TTOpt, a recently proposed black-box optimizer based on tensor networks and multilinear algebra. We show the computational feasibility of this method for large problems with thousands of features, and empirically demonstrate that the solutions found are comparable to the ones obtained with D-Wave across all examined datasets.

AIJun 5, 2023
Efficient GPT Model Pre-training using Tensor Train Matrix Representation

Viktoriia Chekalina, Georgii Novikov, Julia Gusak et al.

Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the number of the parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix~(TTM) structure. Finally, we customize forward and backward operations through the TTM-based layer for simplicity and the stableness of further training. % The resulting GPT-2-based model stores up to 40% fewer parameters, showing the perplexity comparable to the original model. On the downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model. The proposed tensorized layers could be used to efficiently pre-training other Transformer models.

LGSep 29, 2022
A case study of spatiotemporal forecasting techniques for weather forecasting

Shakir Showkat Sofi, Ivan Oseledets

The majority of real-world processes are spatiotemporal, and the data generated by them exhibits both spatial and temporal evolution. Weather is one of the most essential processes in this domain, and weather forecasting has become a crucial part of our daily routine. Weather data analysis is considered the most complex and challenging task. Although numerical weather prediction models are currently state-of-the-art, they are resource-intensive and time-consuming. Numerous studies have proposed time series-based models as a viable alternative to numerical forecasts. Recent research in the area of time series analysis indicates significant advancements, particularly regarding the use of state-space-based models (white box) and, more recently, the integration of machine learning and deep neural network-based models (black box). The most famous examples of such models are RNNs and transformers. These models have demonstrated remarkable results in the field of time-series analysis and have demonstrated effectiveness in modelling temporal correlations. It is crucial to capture both temporal and spatial correlations for a spatiotemporal process, as the values at nearby locations and time affect the values of a spatiotemporal process at a specific point. This self-contained paper explores various regional data-driven weather forecasting methods, i.e., forecasting over multiple latitude-longitude points (matrix-shaped spatial grid) to capture spatiotemporal correlations. The results showed that spatiotemporal prediction models reduced computational costs while improving accuracy. In particular, the proposed tensor train dynamic mode decomposition-based forecasting model has comparable accuracy to the state-of-the-art models without the need for training. We provide convincing numerical experiments to show that the proposed approach is practical.

CLMar 20, 2023
Translate your gibberish: black-box adversarial attack on machine translation systems

Andrei Chertkov, Olga Tsymboi, Mikhail Pautov et al.

Neural networks are deployed widely in natural language processing tasks on the industrial scale, and perhaps the most often they are used as compounds of automatic machine translation systems. In this work, we present a simple approach to fool state-of-the-art machine translation tools in the task of translation from Russian to English and vice versa. Using a novel black-box gradient-free tensor-based optimizer, we show that many online translation tools, such as Google, DeepL, and Yandex, may both produce wrong or offensive translations for nonsensical adversarial input queries and refuse to translate seemingly benign input phrases. This vulnerability may interfere with understanding a new language and simply worsen the user's experience while using machine translation systems, and, hence, additional improvements of these tools are required to establish better translation.

LGDec 12, 2022
Tensor-based Sequential Learning via Hankel Matrix Representation for Next Item Recommendations

Evgeny Frolov, Ivan Oseledets

Self-attentive transformer models have recently been shown to solve the next item recommendation task very efficiently. The learned attention weights capture sequential dynamics in user behavior and generalize well. Motivated by the special structure of learned parameter space, we question if it is possible to mimic it with an alternative and more lightweight approach. We develop a new tensor factorization-based model that ingrains the structural knowledge about sequential data within the learning process. We demonstrate how certain properties of a self-attention network can be reproduced with our approach based on special Hankel matrix representation. The resulting model has a shallow linear architecture and compares competitively to its neural counterpart.

OCMar 9, 2015
Toward Fast Topological-Shape Optimization With Boundary Elements

Igor Ostanin, Denis Zorin, Ivan Oseledets

Wide variety of engineering design tasks can be formulated as constrained optimization problems where the shape and topology of the domain are optimized to reduce costs while satisfying certain constraints. Several mathematical approaches were developed to address the problem of finding optimal design of an engineered structure. Recent works have demonstrated the feasibility of boundary element method as a tool for topological-shape optimization. However, it was noted that the approach has certain drawbacks, and in particular high computational cost of the iterative optimization process. In this short note we suggest ways to address critical limitations of boundary element method as a tool for topological-shape optimization. We validate our approaches by supplementing the existing complex variables boundary element code for elastostatic problems with robust tools for fast topological-shape optimization. The efficiency of the approach is illustrated with a numerical example.

NAMar 31, 2018
Robust topology optimization using a posteriori error estimator for the finite element method

Vladislav Pimanov, Ivan Oseledets

In our work, we consider the classical density-based approach to the topology optimization. We propose to modify the discretized cost functional using a posteriori error estimator for the finite element method. It can be regarded as a new technique to prevent checkerboards. It also provides higher regularity of solutions and robustness of results.

NAApr 6, 2016
Convergence analysis of projected fixed-point iteration on a low-rank matrix manifold

Denis Kolesnikov, Ivan Oseledets

In this paper we analyse convergence of projected fixed-point iteration on a Riemannian manifold of matrices with fixed rank. As a retraction method we use `projector splitting scheme'. We prove that the projector splitting scheme converges at least with the same rate as standard fixed-point iteration without rank constraints. We also provide counter-example to the case when conditions of the theorem do not hold. Finally we support our theoretical results with numerical experiments.

OCSep 29, 2022
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa et al.

Classical machine learning models such as deep neural networks are usually trained by using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively in the case of the minimization of a quadratic function. This analysis allows us to come up with an optimal learning rate in terms of the convergence rate while ensuring the stability of NAG-GS. This is achieved by the careful analysis of the spectral radius of the iteration matrix and the covariance matrix at stationarity with respect to all hyperparameters of our method. Further, we show that NAG- GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models such as the logistic regression model, the residual networks models on standard computer vision datasets, Transformers in the frame of the GLUE benchmark and the recent Vision Transformers.

67.1CVApr 6Code
OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models

Ali Aliev, Kamil Garifullin, Nikolay Yudin et al.

In a rapidly growing field of model training there is a constant practical interest in parameter-efficient fine-tuning and various techniques that use a small amount of training data to adapt the model to a narrow task. However, there is an open question: how to combine several adapters tuned for different tasks into one which is able to yield adequate results on both tasks? Specifically, merging subject and style adapters for generative models remains unresolved. In this paper we seek to show that in the case of orthogonal fine-tuning (OFT), we can use structured orthogonal parametrization and its geometric properties to get the formulas for training-free adapter merging. In particular, we derive the structure of the manifold formed by the recently proposed Group-and-Shuffle ($\mathcal{GS}$) orthogonal matrices, and obtain efficient formulas for the geodesics approximation between two points. Additionally, we propose a $\text{spectra restoration}$ transform that restores spectral properties of the merged adapter for higher-quality fusion. We conduct experiments in subject-driven generation tasks showing that our technique to merge two $\mathcal{GS}$ orthogonal matrices is capable of uniting concept and style features of different adapters. To the best of our knowledge, this is the first training-free method for merging multiplicative orthogonal adapters. Code is available via the $\href{https://github.com/ControlGenAI/OrthoFuse}{link}$.

CVSep 16, 2024Code
Anatomical Positional Embeddings

Mikhail Goncharov, Valentin Samokhin, Eugenia Soboleva et al.

We propose a self-supervised model producing 3D anatomical positional embeddings (APE) of individual medical image voxels. APE encodes voxels' anatomical closeness, i.e., voxels of the same organ or nearby organs always have closer positional embeddings than the voxels of more distant body parts. In contrast to the existing models of anatomical positional embeddings, our method is able to efficiently produce a map of voxel-wise embeddings for a whole volumetric input image, which makes it an optimal choice for different downstream applications. We train our APE model on 8400 publicly available CT images of abdomen and chest regions. We demonstrate its superior performance compared with the existing models on anatomical landmark retrieval and weakly-supervised few-shot localization of 13 abdominal organs. As a practical application, we show how to cheaply train APE to crop raw CT images to different anatomical regions of interest with 0.99 recall, while reducing the image volume by 10-100 times. The code and the pre-trained APE model are available at https://github.com/mishgon/ape .

NAJun 1, 2019
How to optimize preconditioners for the conjugate gradient method: a stochastic approach

Alexandr Katrutsa, Mike Botchev, George Ovchinnikov et al.

The conjugate gradient method (CG) is typically used with a preconditioner which improves efficiency and robustness of the method. Many preconditioners include parameters and a proper choice of a preconditioner and its parameters is often not a trivial task. Although many convergence estimates exist which can be used for optimizing preconditioners, these estimates typically hold for all initial guess vectors, in other words, they reflect the worst convergence rate. To account for the mean convergence rate instead, in this paper, we follow a stochastic approach. It is based on trial runs with random initial guess vectors and leads to a functional which can be used to monitor convergence and to optimize preconditioner parameters in CG. Presented numerical experiments show that optimization of this new functional with respect to preconditioner parameters usually yields a better parameter value than optimization of the functional based on the spectral condition number.

NAFeb 5, 2018
Function approximation using gradient information with application to parametric and stochastic differential equations

Gleb Ryzhakov, Ivan Oseledets

In the paper we consider the problem of multivariate function approximation in polynomial basis. In order to solve this problem, we adjust the least squares method (LSM) by adding information about derivatives of the function. This modification allows reducing the number of evaluations of approximating function while keeping the accuracy at the appropriate level. We propose several techniques for time-efficient calculation of derivatives in various applications. Numerical examples are given for comparison between the standard LSM and the proposed approach.

COMP-PHMay 1, 2017
Time- and memory-efficient representation of complex mesoscale potentials

Grigory Drozdov, Igor Ostanin, Ivan Oseledets

We apply the modern technique of approximation of multivariate functions - tensor train cross approximation - to the problem of the description of physical interactions between complex-shaped bodies in a context of computational nanomechanics. In this note we showcase one particular example - van der Waals interactions between two cylindrical bodies - relevant to modeling of carbon nanotube systems. The potential is viewed as a tensor (multidimensional table) which is represented in compact form with the help of tensor train decomposition. The described approach offers a universal solution for the description of van der Waals interactions between complex-shaped nanostructures and can be used within the framework of such systems of mesoscale modeling as recently emerged mesoscopic distinct element method (MDEM).

38.0AIMay 28
Harnessing non-adversarial robustness in large language models

Qinghua Zhou, Ellina Aleshina, Andrey Lovyagin et al.

The work presents an approach for addressing the challenge of robustness in Large Language Models (LLMs) to alterations and potential errors caused by semantically similar but textually different prompts. Recent works have shown that these kinds of prompt variations can significantly impact the performance of LLMs on tasks. The central question is: can LLMs' robustness to semantically-neutral prompt alterations be acquired without expensive retraining of the entire model? We address this question both theoretically and through experiments. Our theoretical analysis reveals a crucial factor impacting model robustness - a systematic expected shift or perturbation-induced bias in neural network module outputs. Motivated by this analysis, we show that robustness can be achieved via a simple fine-tuning process: debiasing for robustness. We identify conditions when debiasing helps and when it does not, and demonstrate, through both theory and extensive experiments, that debiasing for robustness may indeed be a quick and efficient tool to enhance robustness and provide certification against random prompt perturbations.

IRMay 10, 2022
Tensor-based Collaborative Filtering With Smooth Ratings Scale

Nikita Marin, Elizaveta Makhneva, Maria Lysyuk et al.

Conventional collaborative filtering techniques don't take into consideration the effect of discrepancy in users' rating perception. Some users may rarely give 5 stars to items while others almost always assign 5 stars to the chosen item. Even if they had experience with the same items this systematic discrepancy in their evaluation style will lead to the systematic errors in the ability of recommender system to effectively extract right patterns from data. To mitigate this problem we introduce the ratings' similarity matrix which represents the dependency between different values of ratings on the population level. Hence, if on average the correlations between ratings exist, it is possible to improve the quality of proposed recommendations by off-setting the effect of either shifted down or shifted up users' rates.

NAApr 28, 2018
Simple non-extensive sparsification of the hierarchical matrices

Daria Sushnikova, Ivan Oseledets

In this paper, we consider the matrices approximated in H2 format. The direct solution, as well as the preconditioning, of systems with such matrices is a challenging problem. We propose a non-extensive sparse factorization of the H2 matrix that allows to substitute the direct H2 solution with the solution of the system with an equivalent sparse matrix of the same size. The sparse factorization is constructed out of parameters of the H2 matrix. In the numerical experiments, we show the consistency of this approach in comparison to the other approximate block low-rank hierarchical solvers, such as HODLR, H2Lib and IFMM.

NAApr 5, 2017
Vico-Greengard-Ferrando quadratures in the tensor solver for integral equations

Valentin Khrulkov, Maxim Rakhuba, Ivan Oseledets

Convolution with Green's function of a differential operator appears in a lot of applications e.g. Lippmann-Schwinger integral equation. Algorithms for computing such are usually non-trivial and require non-uniform mesh. However, recently Vico, Greengard and Ferrando developed method for computing convolution with smooth functions with compact support with spectral accuracy, requiring nothing more than Fast Fourier Transform (FFT). Their approach is very suitable for the low-rank tensor implementation which we develop using Quantized Tensor Train (QTT) decomposition.

IRFeb 5, 2023
Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction

Albert Sayapin, Gleb Balitskiy, Daniel Bershatsky et al.

In this study, we propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage. Although this problem can be represented as a classical collaborative filtering problem, it requires proper modification since the data are sequential, the user feedback is distributed among devices and the transmission of users' data to aggregate common patterns must be protected against leakage. According to such requirements, we modify the structure of the classical matrix factorization model and update the training procedure to sequential learning. Since the data about user experience are distributed among devices, the federated learning setup is used to train the proposed sequential matrix factorization model. One more ingredient of the proposed approach is a new privacy mechanism that guarantees the protection of the sent data from the users to the remote server. To demonstrate the efficiency of the proposed model we use publicly available mobile user behavior data. We compare our model with sequential rules and models based on the frequency of app launches. The comparison is conducted in static and dynamic environments. The static environment evaluates how our model processes sequential data compared to competitors. Therefore, the standard train-validation-test evaluation procedure is used. The dynamic environment emulates the real-world scenario, where users generate new data by running apps on devices, and evaluates our model in this case. Our experiments show that the proposed model provides comparable quality with other methods in the static environment. However, more importantly, our method achieves a better privacy-utility trade-off than competitors in the dynamic environment, which provides more accurate simulations of real-world usage.

FLU-DYNJan 12, 2023
Machine learning methods for prediction of breakthrough curves in reactive porous media

Daria Fokina, Pavel Toktaliev, Oleg Iliev et al.

Reactive flows in porous media play an important role in our life and are crucial for many industrial, environmental and biomedical applications. Very often the concentration of the species at the inlet is known, and the so-called breakthrough curves, measured at the outlet, are the quantities which could be measured or computed numerically. The measurements and the simulations could be time-consuming and expensive, and machine learning and Big Data approaches can help to predict breakthrough curves at lower costs. Machine learning (ML) methods, such as Gaussian processes and fully-connected neural networks, and a tensor method, cross approximation, are well suited for predicting breakthrough curves. In this paper, we demonstrate their performance in the case of pore scale reactive flow in catalytic filters.

AIJan 8, 2023
Mitigating Human and Computer Opinion Fraud via Contrastive Learning

Yuliya Tukmacheva, Ivan Oseledets, Evgeny Frolov

We introduce the novel approach towards fake text reviews detection in collaborative filtering recommender systems. The existing algorithms concentrate on detecting the fake reviews, generated by language models and ignore the texts, written by dishonest users, mostly for monetary gains. We propose the contrastive learning-based architecture, which utilizes the user demographic characteristics, along with the text reviews, as the additional evidence against fakes. This way, we are able to account for two different types of fake reviews spamming and make the recommendation system more robust to biased reviews.

NADec 25, 2022
FMM-Net: neural network architecture based on the Fast Multipole Method

Daria Sushnikova, Pavel Kharyuk, Ivan Oseledets

In this paper, we propose a new neural network architecture based on the H2 matrix. Even though networks with H2-inspired architecture already exist, and our approach is designed to reduce memory costs and improve performance by taking into account the sparsity template of the H2 matrix. In numerical comparison with alternative neural networks, including the known H2-based ones, our architecture showed itself as beneficial in terms of performance, memory, and scalability.

FLU-DYNApr 25, 2022
On the Performance of Machine Learning Methods for Breakthrough Curve Prediction

Daria Fokina, Oleg Iliev, Pavel Toktaliev et al.

Reactive flows are important part of numerous technical and environmental processes. Often monitoring the flow and species concentrations within the domain is not possible or is expensive, in contrast, outlet concentration is straightforward to measure. In connection with reactive flows in porous media, the term breakthrough curve is used to denote the time dependency of the outlet concentration with prescribed conditions at the inlet. In this work we apply several machine learning methods to predict breakthrough curves from the given set of parameters. In our case the parameters are the Damköhler and Peclet numbers. We perform a thorough analysis for the one-dimensional case and also provide the results for the three-dimensional case.

OCDec 13, 2016
What Lies Beneath the Surface: Topological-Shape Optimization With the Kernel-Independent Fast Multipole Method

Igor Ostanin, Ivan Tsybulin, Mikhail Litsarev et al.

The paper presents a new method for shape and topology optimization based on an efficient and scalable boundary integral formulation for elasticity. To optimize topology, our approach uses iterative extraction of isosurfaces of a topological derivative. The numerical solution of the elasticity boundary value problem at every iteration is performed with the boundary element formulation and the kernel-independent fast multipole method. Providing excellent single node performance, scalable parallelization and the best available asymptotic complexity, our method is among the fastest optimization tools available today. The performance of our approach is studied on few illustrative examples, including the optimization of engineered constructions for the minimum compliance and the optimization of the microstructure of a metamaterial for the desired macroscopic tensor of elasticity.

NAMar 5, 2019
Preconditioning Kaczmarz method by sketching

Alexandr Katrutsa, Ivan Oseledets

We propose a new method for preconditioning Kaczmarz method by sketching. Kaczmarz method is a stochastic method for solving overdetermined linear systems based on a sampling of matrix rows. The standard approach to speed up convergence of iterative methods is using preconditioner. As we show the best possible preconditioner for this method can be constructed from QR decomposition of the system matrix, but the complexity of this procedure is too high. Therefore, to reduce this complexity, we use random sketching and compare it with the Kaczmarz method without preconditioning. The developed method is applicable for different modifications of classical Kaczmarz method that were proposed recently. We provide numerical experiments to show the performance of the developed methods on solving both random and real overdetermined linear systems.

NAFeb 26, 2018
Regularization of topology optimization problem by the FEM a posteriori error estimator

Vladislav Pimanov, Ivan Oseledets

In our work, we consider the classical density-based approach to topology optimization. We propose the modification of the discretized cost/objective functional using a posteriori error estimator for the finite element method. It can be regarded as a new technique to prevent checkerboards. It also provides higher regularity of the solutions and robustness of the results.

CVAug 31, 2023
Unsupervised evaluation of GAN sample quality: Introducing the TTJac Score

Egor Sevriugov, Ivan Oseledets

Evaluation metrics are essential for assessing the performance of generative models in image synthesis. However, existing metrics often involve high memory and time consumption as they compute the distance between generated samples and real data points. In our study, the new evaluation metric called the "TTJac score" is proposed to measure the fidelity of individual synthesized images in a data-free manner. The study first establishes a theoretical approach to directly evaluate the generated sample density. Then, a method incorporating feature extractors and discrete function approximation through tensor train is introduced to effectively assess the quality of generated samples. Furthermore, the study demonstrates that this new metric can be used to improve the fidelity-variability trade-off when applying the truncation trick. The experimental results of applying the proposed metric to StyleGAN 2 and StyleGAN 2 ADA models on FFHQ, AFHQ-Wild, LSUN-Cars, and LSUN-Horse datasets are presented. The code used in this research will be made publicly available online for the research community to access and utilize.

CVAug 31, 2023
Robust GAN inversion

Egor Sevriugov, Ivan Oseledets

Recent advancements in real image editing have been attributed to the exploration of Generative Adversarial Networks (GANs) latent space. However, the main challenge of this procedure is GAN inversion, which aims to map the image to the latent space accurately. Existing methods that work on extended latent space $W+$ are unable to achieve low distortion and high editability simultaneously. To address this issue, we propose an approach which works in native latent space $W$ and tunes the generator network to restore missing image details. We introduce a novel regularization strategy with learnable coefficients obtained by training randomized StyleGAN 2 model - WRanGAN. This method outperforms traditional approaches in terms of reconstruction quality and computational efficiency, achieving the lowest distortion with 4 times fewer parameters. Furthermore, we observe a slight improvement in the quality of constructing hyperplanes corresponding to binary image attributes. We demonstrate the effectiveness of our approach on two complex datasets: Flickr-Faces-HQ and LSUN Church.

CVAug 17, 2023
General Lipschitz: Certified Robustness Against Resolvable Semantic Transformations via Transformation-Dependent Randomized Smoothing

Dmitrii Korzh, Mikhail Pautov, Olga Tsymboi et al.

Randomized smoothing is the state-of-the-art approach to construct image classifiers that are provably robust against additive adversarial perturbations of bounded magnitude. However, it is more complicated to construct reasonable certificates against semantic transformation (e.g., image blurring, translation, gamma correction) and their compositions. In this work, we propose \emph{General Lipschitz (GL),} a new framework to certify neural networks against composable resolvable semantic perturbations. Within the framework, we analyze transformation-dependent Lipschitz-continuity of smoothed classifiers w.r.t. transformation parameters and derive corresponding robustness certificates. Our method performs comparably to state-of-the-art approaches on the ImageNet dataset.

63.1LGMar 25Code
Marchuk: Efficient Global Weather Forecasting from Mid-Range to Sub-Seasonal Scales via Flow Matching

Arsen Kuzhamuratov, Mikhail Zhirnov, Andrey Kuznetsov et al.

Accurate subseasonal weather forecasting remains a major challenge due to the inherently chaotic nature of the atmosphere, which limits the predictive skill of conventional models beyond the mid-range horizon (approximately 15 days). In this work, we present \textit{Marchuk}, a generative latent flow-matching model for global weather forecasting spanning mid-range to subseasonal timescales, with prediction horizons of up to 30 days. Marchuk conditions on current-day weather maps and autoregressively predicts subsequent days' weather maps within the learned latent space. We replace rotary positional encodings (RoPE) with trainable positional embeddings and extend the temporal context window, which together enhance the model's ability to represent and propagate long-range temporal dependencies during latent forecasting. Marchuk offers two key advantages: high computational efficiency and strong predictive performance. Despite its compact architecture of only 276 million parameters, the model achieves performance comparable to LaDCast, a substantially larger model with 1.6 billion parameters, while operating at significantly higher inference speeds. We open-source our inference code and model at: https://v-gen-ai.github.io/Marchuk/

CLNov 10, 2023
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova et al.

In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. Which is then followed by a compression phase towards the end of training with dimensionality decrease, suggesting a refinement into more compact representations. Our results provide fresh insights to the understanding of encoders and decoders embedding properties.

IRSep 27, 2024
Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs

Gleb Mezentsev, Danil Gusak, Ivan Oseledets et al.

Scalability issue plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendations quality. Still, it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function in the sequential learning setup. It approximates the CE loss for datasets with large-size catalogs, enhancing both time efficiency and memory usage without compromising recommendations quality. Unlike traditional negative sampling methods, our approach utilizes a selective GPU-efficient computation strategy, focusing on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through the maximum inner product search. Experimental results on multiple datasets demonstrate the effectiveness of SCE in reducing peak memory usage by a factor of up to 100 compared to the alternatives, retaining or even exceeding their metrics values. The proposed approach also opens new perspectives for large-scale developments in different domains, such as large language models.

LGOct 5, 2024
Sinc Kolmogorov-Arnold Network and Its Applications on Physics-informed Neural Networks

Tianchi Yu, Jingwei Qiu, Jiang Yang et al.

In this paper, we propose to use Sinc interpolation in the context of Kolmogorov-Arnold Networks, neural networks with learnable activation functions, which recently gained attention as alternatives to multilayer perceptron. Many different function representations have already been tried, but we show that Sinc interpolation proposes a viable alternative, since it is known in numerical analysis to represent well both smooth functions and functions with singularities. This is important not only for function approximation but also for the solutions of partial differential equations with physics-informed neural networks. Through a series of experiments, we show that SincKANs provide better results in almost all of the examples we have considered.

83.8LGMay 22
TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models

Arseny Ivanov, Sergei Kholkin, Vladislav Gromadskii et al.

Log-likelihood is a standard metric for evaluating generative models. Unfortunately, in contrast to autoregressive models (ARMs), discrete diffusion models generally do not admit exact computation of this quantity. Existing evaluations, therefore, rely on the evidence lower bound (ELBO), leaving unclear how much higher the true value may be. We address this by introducing the Tangent Upper Bound on Evidence (TUBE), a variational upper bound on log-likelihood that admits an unbiased Monte Carlo estimator. Our TUBE extends across latent-variable models, including masked diffusion models (MDMs), any-order ARMs (AO-ARMs), and block variants of both. Applied to block MDMs and block AO-ARMs, TUBE reveals our key empirical finding that these models lie strictly below the exact ARM baseline, showing that ARMs still dominate in likelihood.