Tatyana Matveeva

h-index: 3
2 papers

IR · Feb 26
Cross-Representation Knowledge Transfer for Improved Sequential Recommendations

Artur Gimranov, Viacheslav Yusupov, Elfat Sabitov et al.

Transformer architectures, capable of capturing sequential dependencies in the history of user interactions, have become the dominant approach in sequential recommender systems. Despite their success, such models treat sequence elements in isolation and account for the complex relationships between them only implicitly. Graph neural networks, in contrast, model these relationships explicitly through higher-order interactions but often fail to capture how they evolve over time, which limits their use for predicting the next interaction. To fill this gap, we present a new framework that combines transformers and graph neural networks and aligns their different representations to solve the next-item prediction task. Our solution simultaneously encodes structural dependencies in the interaction graph and tracks how they change over time. Experimental results on several open datasets demonstrate that the proposed framework consistently outperforms both purely sequential and purely graph-based approaches in recommendation quality, as well as recent methods that combine both types of signals.
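The abstract does not say how the transformer and graph representations are aligned. One common choice, assumed here purely for illustration and not taken from the paper, is a symmetric InfoNCE contrastive loss: the sequential and graph embeddings of the same user form a positive pair, and the other users in the batch serve as negatives. The function names, the `temperature` value, and the use of in-batch negatives are all assumptions of this sketch.

```python
import numpy as np

# Illustrative sketch (assumption): a generic symmetric InfoNCE alignment loss,
# NOT the alignment method from the paper, which the abstract does not specify.


def _mean_diag_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def info_nce_alignment(seq_emb, graph_emb, temperature=0.1):
    """Align two views of the same batch of users.

    seq_emb:   (batch, dim) embeddings from a sequential encoder (e.g. a transformer)
    graph_emb: (batch, dim) embeddings from a graph encoder (e.g. a GNN)
    Row i of both matrices is a positive pair; every other row is a negative.
    """
    a = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    b = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # cosine similarities scaled by temperature
    # Average the sequence-to-graph and graph-to-sequence directions.
    return 0.5 * (_mean_diag_cross_entropy(logits) + _mean_diag_cross_entropy(logits.T))
```

The loss is small when each user's two embeddings point in the same direction and grows when a user's sequential embedding is closer to another user's graph embedding, which is the sense in which the two representations are pulled into agreement.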

LG · Aug 28, 2025
Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models

Tatyana Matveeva, Aleksandr Katrutsa, Evgeny Frolov

Adaptive gradient methods such as Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits their ability to capture correlations between parameters. Full-matrix adaptive methods, which approximate the exact Hessian, can model these correlations and may enable faster convergence. At the same time, their computational and memory costs are often prohibitive for large-scale models. To address this limitation, we propose AdaGram, an optimizer that enables efficient full-matrix adaptive gradient updates. To reduce memory and computational overhead, we use fast symmetric factorization to compute the preconditioned update direction at each iteration. Additionally, we maintain the low-rank structure of the preconditioner along the optimization trajectory using matrix integrator methods. Numerical experiments on standard machine learning tasks show that AdaGram converges faster than, or matches the performance of, diagonal adaptive optimizers when using approximations of rank five or smaller. This demonstrates AdaGram's potential as a scalable solution for adaptive optimization in large models.
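To make the contrast between diagonal and full-matrix preconditioning concrete, here is a minimal NumPy sketch of the generic Adagrad family. It is not AdaGram: the paper computes the preconditioned direction with fast symmetric factorization and keeps the preconditioner low-rank with matrix integrator methods, neither of which is reproduced here. The function names, the damping constant `delta`, and the thin-SVD truncation used to hold the accumulated matrix at a fixed rank are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumption): generic Adagrad variants, NOT the AdaGram
# algorithm. `delta` damps the preconditioner; names are hypothetical.


def diag_adagrad_step(w, g, acc, lr=0.1, delta=1e-4):
    """Diagonal Adagrad: each coordinate is scaled separately (no correlations)."""
    acc = acc + g * g
    return w - lr * g / np.sqrt(acc + delta), acc


def full_adagrad_step(w, g, G, lr=0.1, delta=1e-4):
    """Full-matrix Adagrad: step along (G + delta*I)^{-1/2} g with G = sum of g g^T.

    Needs O(d^2) memory and an O(d^3) eigendecomposition per step.
    """
    G = G + np.outer(g, g)
    vals, vecs = np.linalg.eigh(G + delta * np.eye(len(g)))
    P = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse square root
    return w - lr * P @ g, G


def lowrank_adagrad_step(w, g, U, s, rank, lr=0.1, delta=1e-4):
    """Keep G ~= U diag(s) U^T with at most `rank` orthonormal columns in U.

    U diag(s) U^T + g g^T equals M M^T for M = [U sqrt(s), g], so a thin SVD
    of the d x (k+1) matrix M refreshes and truncates the factors in
    O(d k^2) time without forming any d x d matrix.
    """
    M = np.column_stack([U * np.sqrt(s), g])
    Q, sv, _ = np.linalg.svd(M, full_matrices=False)
    U, s = Q[:, :rank], sv[:rank] ** 2  # truncation to the target rank
    # Exact (U diag(s) U^T + delta*I)^{-1/2} g, split into span(U) and its complement.
    coeff = U.T @ g
    step = U @ (coeff / np.sqrt(s + delta)) + (g - U @ coeff) / np.sqrt(delta)
    return w - lr * step, U, s
```

With a single gradient g = (1, 1), the diagonal step scales each coordinate by roughly 1, whereas the full-matrix step shrinks g by roughly 1/sqrt(2), because the full matrix registers that both coordinates move together. In the low-rank version, any part of g discarded by truncation falls into the damped complement and is scaled by 1/sqrt(delta); how to treat those directions is a design choice a practical method has to settle.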