IRMay 20
Faster and Memory-Efficient Training of Sequential Recommendation Models for Large CatalogsMaxim Zhelnin, Dmitry Redko, Daniil Volkov et al.
Sequential recommendations (SR) with transformer-based architectures are widely adopted in real-world applications, where SR models require frequent retraining to adapt to ever-changing user preferences. However, training transformer-based SR models often encounters a high computational cost associated with scoring extensive item catalogs, often exceeding thousands of items. This occurs mainly due to the use of cross-entropy loss, where peak memory scales proportionally to catalog size, batch size, and sequence length. Recognizing this, practitioners in the field of recommendation systems typically address memory consumption by integrating the cross-entropy (CE) loss with negative sampling, thereby reducing the explicit memory demands of the final layer. However, a small number of negative samples would degrade model performance, and as we demonstrate in our work, increasing the number of negative samples and the batch size further improves the model's performance, but rapidly starts to exceed industrial GPUs' size (~40Gb). In this work, we introduce the CCE- method, which offers a GPU-efficient implementation of the CE loss with negative sampling. Our method accelerates training by up to two times while reducing memory consumption by more than 10 times. Leveraging the memory savings afforded by using CCE- for model training, it becomes feasible to enhance its accuracy on datasets with a large item catalog compared to those trained with original PyTorch-implemented loss functions. Finally, we perform an analysis of key memory-related hyperparameters and highlight the necessity of a delicate balance among these factors. We demonstrate that scaling both the number of negative samples and batch size leads to better results rather than maximizing only one of them. To facilitate further adoption of CCE-, we release a Triton kernel that efficiently implements the proposed method.
PMAug 15, 2023
Portfolio Selection via Topological Data AnalysisPetr Sokerin, Kristian Kuznetsov, Elizaveta Makhneva et al.
Portfolio management is an essential part of investment decision-making. However, traditional methods often fail to deliver reasonable performance. This problem stems from the inability of these methods to account for the unique characteristics of multivariate time series data from stock markets. We present a two-stage method for constructing an investment portfolio of common stocks. The method involves the generation of time series representations followed by their subsequent clustering. Our approach utilizes features based on Topological Data Analysis (TDA) for the generation of representations, allowing us to elucidate the topological structure within the data. Experimental results show that our proposed system outperforms other methods. This superior performance is consistent over different time frames, suggesting the viability of TDA as a powerful tool for portfolio selection.
LGFeb 28, 2025
Concealed Adversarial attacks on neural networks for sequential dataPetr Sokerin, Dmitry Anikin, Sofia Krehova et al.
The emergence of deep learning led to the broad usage of neural networks in the time series domain for various applications, including finance and medicine. While powerful, these models are prone to adversarial attacks: a benign targeted perturbation of input data leads to significant changes in a classifier's output. However, formally small attacks in the time series domain become easily detected by the human eye or a simple detector model. We develop a concealed adversarial attack for different time-series models: it provides more realistic perturbations, being hard to detect by a human or model discriminator. To achieve this goal, the proposed adversarial attack maximizes an aggregation of a classifier and a trained discriminator loss. To make the attack stronger, we also propose a training procedure for a discriminator that provides broader coverage of possible attacks. Extensive benchmarking on six UCR time series datasets across four diverse architectures - including recurrent, convolutional, state-space, and transformer-based models - demonstrates the superiority of our attack for a concealability-efficiency trade-off. Our findings highlight the growing challenge of designing robust time series models, emphasizing the need for improved defenses against realistic and effective attacks.
LGFeb 14, 2025
Looking around you: external information enhances representations for event sequencesMaria Kovaleva, Petr Sokerin, Pavel Tikhomirov et al.
Representation learning produces models in different domains, such as store purchases, client transactions, and general people's behaviour. However, such models for event sequences usually process each sequence in isolation, ignoring context from ones that co-occur in time. This limitation is particularly problematic in domains with fast-evolving conditions, like finance and e-commerce, or when certain sequences lack recent events. We develop a method that aggregates information from multiple user representations, augmenting a specific user for a scenario of multiple co-occurring event sequences, achieving better quality than processing each sequence independently. Our study considers diverse aggregation approaches, ranging from simple pooling techniques to trainable attention-based Kernel attention aggregation, that can highlight more complex information flow from other users. The proposed methods operate on top of an existing encoder and support its efficient fine-tuning. Across six diverse event sequence datasets (finance, e-commerce, education, etc.) and downstream tasks, Kernel attention improves ROC-AUC scores, both with and without fine-tuning, while mean pooling yields a smaller but still significant gain.