Ivan Karpukhin

LG
h-index: 3
11 papers
48 citations
Novelty: 50%
AI Score: 44

11 Papers

LG · Sep 26, 2024 · Code
Multimodal Banking Dataset: Understanding Client Needs through Event Sequences

Dzhambulat Mollaev, Alexander Kostin, Maria Postnova et al.

Financial organizations collect large amounts of temporal (sequential) data about clients, typically gathered from multiple sources (modalities). Despite the urgent practical need, the development of deep learning techniques suited to such data is hindered by the absence of large, open-source, multi-source, real-world datasets of event sequences. To fill this gap, which exists mainly for security reasons, we present the first industrial-scale, publicly available multimodal banking dataset, MBD, which contains information on more than 2M corporate clients of a large bank. Clients are represented by several data sources: 950M bank transactions, 1B geo-position events, 5M embeddings of dialogues with technical support, and monthly aggregated purchases of four bank products. All entries are properly anonymized from real proprietary bank data, and the experiments confirm that the anonymization preserves all information significant for the introduced downstream tasks. Moreover, we introduce a novel multimodal benchmark covering several important practical tasks, such as future purchase prediction and modality matching. The benchmark incorporates MBD and two public financial datasets. We provide numerical results for state-of-the-art event sequence modeling techniques, including large language models, and demonstrate the superiority of fusion baselines over single-modal techniques for each task. Thus, MBD provides a valuable resource for future research on financial applications of multimodal event sequence analysis. HuggingFace Link: https://huggingface.co/datasets/ai-lab/MBD GitHub Link: https://github.com/Dzhambo/MBD
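
Modality matching asks which record in one modality belongs to the same client as a given record in another. A minimal sketch of how such matching could be scored, assuming each client already has one embedding per modality (for example, one from transactions and one from geo events); the dict layout, the function name `matching_recall_at_1`, and the cosine-similarity rule are illustrative assumptions, not the benchmark's official protocol.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def matching_recall_at_1(trx_emb, geo_emb):
    """Hypothetical helper: fraction of clients whose most similar geo embedding
    is their own. Both dicts map client_id -> embedding vector."""
    clients = list(trx_emb)
    hits = 0
    for cid in clients:
        # Retrieve the geo-side client most similar to this client's transaction embedding.
        best = max(clients, key=lambda other: cosine(trx_emb[cid], geo_emb[other]))
        if best == cid:
            hits += 1
    return hits / len(clients)

trx = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
geo = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
print(matching_recall_at_1(trx, geo))  # -> 1.0
```

Recall@1 is only one possible choice; ranking-based scores over all candidates would follow the same retrieval pattern.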

CV · May 23, 2022
Deep Image Retrieval is not Robust to Label Noise

Stanislav Dereka, Ivan Karpukhin, Sergey Kolesnikov

Large-scale datasets are essential for the success of deep learning in image retrieval. However, manual assessment errors and semi-supervised annotation techniques can lead to label noise even in popular datasets. As previous works primarily studied annotation quality in image classification tasks, it is still unclear how label noise affects deep learning approaches to image retrieval. In this work, we show that image retrieval methods are less robust to label noise than image classification ones. Furthermore, we, for the first time, investigate different types of label noise specific to image retrieval tasks and study their effect on model performance.
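
Studying the effect of label noise usually means corrupting labels in a controlled way. A minimal sketch of the simplest, generic variant, uniform (symmetric) noise, in which a chosen fraction of samples receives a different, randomly drawn class; the function name and parameters are hypothetical, and the retrieval-specific noise types investigated in the paper are not reproduced here.

```python
import random

def add_uniform_label_noise(labels, noise_rate, num_classes, seed=0):
    """Hypothetical helper: return a copy of `labels` in which
    round(noise_rate * len(labels)) randomly chosen entries are replaced
    by a different, uniformly drawn class in [0, num_classes)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_noisy):
        # Exclude the current class so every corrupted label actually changes.
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy

clean = [i % 5 for i in range(100)]
noisy = add_uniform_label_noise(clean, noise_rate=0.2, num_classes=5)
print(sum(a != b for a, b in zip(clean, noisy)))  # -> 20
```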

LG · May 19, 2022
EXACT: How to Train Your Accuracy

Ivan Karpukhin, Stanislav Dereka, Sergey Kolesnikov

Classification tasks are usually evaluated in terms of accuracy. However, accuracy is discontinuous and cannot be directly optimized using gradient ascent. Popular methods minimize cross-entropy, hinge loss, or other surrogate losses, which can lead to suboptimal results. In this paper, we propose a new optimization framework that introduces stochasticity into a model's output and optimizes the expected accuracy, i.e., the accuracy of the stochastic model. Extensive experiments on linear models and deep image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.
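
The core idea can be illustrated on a binary linear classifier. Assume, for this sketch only (the paper's framework is more general and covers multiclass models), that the score is perturbed with Gaussian noise, z ~ N(w·x, σ²), and the prediction is sign(z). For a label y ∈ {−1, +1}, the probability of a correct prediction is then Φ(y·w·x / σ), where Φ is the standard normal CDF. Unlike 0/1 accuracy, this is smooth in w, so its mean over the data can be maximized by plain gradient ascent. The function names, σ, learning rate, and step count are hypothetical.

```python
import math

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def expected_accuracy(w, data, sigma):
    """Mean over (x, y) of P(correct) = Phi(y * w.x / sigma), y in {-1, +1}."""
    total = 0.0
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        total += std_normal_cdf(margin / sigma)
    return total / len(data)

def train_expected_accuracy(data, dim, sigma=1.0, lr=0.5, steps=200):
    """Gradient ascent on the expected accuracy of the stochastic linear model."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # d/dw Phi(margin / sigma) = pdf(margin / sigma) * y * x / sigma
            coef = std_normal_pdf(margin / sigma) * y / sigma
            for j in range(dim):
                grad[j] += coef * x[j]
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w

data = [([1.0, 2.0], 1), ([2.0, 1.0], 1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
print(expected_accuracy([0.0, 0.0], data, 1.0))  # -> 0.5 (pure coin flip at w = 0)
w = train_expected_accuracy(data, dim=2)
print(expected_accuracy(w, data, 1.0))
```

The noise scale σ controls the trade-off: a large σ gives a smooth but loose objective, while σ → 0 recovers the discontinuous 0/1 accuracy.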

CL · Jul 3, 2024
ESQA: Event Sequences Question Answering

Irina Abdullaeva, Andrei Filatov, Mikhail Orlov et al.

Event sequences (ESs) arise in many practical domains, including finance, retail, social networks, and healthcare. In the context of machine learning, event sequences can be seen as a special type of tabular data with annotated timestamps. Despite the importance of ES modeling and analysis, little effort has been made to adapt large language models (LLMs) to the ES domain. In this paper, we highlight the common difficulties of ES processing and propose a novel solution capable of solving multiple downstream tasks with little or no finetuning. In particular, we address the problem of working with long sequences and improve the processing of time and numeric features. The resulting method, called ESQA, effectively utilizes the power of LLMs and, according to extensive experiments, achieves state-of-the-art results in the ES domain.

LG · Aug 23, 2024
Detecting the Future: All-at-Once Event Sequence Forecasting with Horizon Matching

Ivan Karpukhin, Andrey Savchenko

Long-horizon event forecasting is a crucial task across various domains, including retail, finance, healthcare, and social networks. Traditional models for event sequences often forecast over a horizon using an autoregressive (recursive) multi-step strategy, which has limited effectiveness because its outputs typically converge to constant or repetitive values. To address this limitation, we introduce DEF, a novel approach for simultaneously forecasting multiple future events on a horizon with high accuracy and diversity. Our method optimally aligns predictions with ground-truth events during training using a novel matching-based loss function. We establish a new state of the art in long-horizon event prediction, achieving up to a 50% relative improvement over existing temporal point processes and event prediction models. Furthermore, we achieve state-of-the-art performance in next-event prediction tasks while demonstrating high computational efficiency during inference.
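
Matching-based training can be sketched as follows: the model emits K candidate future events at once, each ground-truth event is paired with a distinct prediction by a minimum-cost assignment, and the loss is the total cost of the matched pairs, so the order in which predictions are emitted does not matter. In this sketch the pair cost (absolute time error plus a 0/1 label-mismatch penalty), the function names, and the brute-force search over permutations are all assumptions for illustration, not DEF's actual loss. Brute force is only practical for a handful of events; a real implementation would use the Hungarian algorithm and would also need a rule for unmatched predictions, which is not reproduced here.

```python
from itertools import permutations

def pair_cost(pred, true, label_weight=1.0):
    """Hypothetical cost: |time error| + label_weight * [labels differ]."""
    (t_pred, l_pred), (t_true, l_true) = pred, true
    return abs(t_pred - t_true) + label_weight * (l_pred != l_true)

def matching_loss(preds, targets, label_weight=1.0):
    """Minimum total cost of assigning each target to a distinct prediction.
    Returns (cost, assignment) where assignment[t] is the prediction index
    matched to target t. Brute force; needs len(preds) >= len(targets)."""
    if len(preds) < len(targets):
        raise ValueError("need at least as many predictions as targets")
    best_cost, best_assignment = float("inf"), None
    for chosen in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[p], targets[t], label_weight)
                   for t, p in enumerate(chosen))
        if cost < best_cost:
            best_cost, best_assignment = cost, chosen
    return best_cost, best_assignment

preds = [(3.0, "b"), (1.0, "a"), (5.0, "c")]   # (time, label) from one forward pass
targets = [(1.2, "a"), (2.9, "b")]
print(matching_loss(preds, targets))  # assignment (1, 0), cost about 0.3
```

A position-wise loss would compare `preds[0]` with `targets[0]` and penalize the model for a correct forecast emitted in the "wrong" slot; the matching step removes that penalty.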

26.5 · LG · May 8
When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Ivan Karpukhin, Andrey Savchenko

Modern deep models are often pretrained on large-scale data with missing labels using composite objectives, where the relative weights of multiple loss terms act as hyperparameters. Tuning these weights with random search or Bayesian optimization is computationally expensive, as it requires many independent training runs. To address this, we propose a gradient-based bilevel method that learns pretraining loss weights online by aligning the composite pretraining gradient with a downstream objective. By exploiting the structure of the loss, the method avoids the multiple backward passes typically required by truncated backpropagation through the full model, reducing the overhead of hyperparameter tuning to approximately 30% above a single training run. We evaluate the approach on event-sequence modeling and self-supervised computer vision, where it matches or improves upon carefully tuned baselines while substantially reducing the cost of hyperparameter tuning compared to random or Bayesian search.
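
The alignment idea can be illustrated with a first-order, one-step lookahead approximation. This is a standard simplification and not necessarily the paper's exact bilevel update, and it does not reproduce the trick for avoiding extra backward passes. After a step θ' = θ − η Σᵢ wᵢ gᵢ, the downstream loss L_d changes by approximately −η Σᵢ wᵢ (g_d · gᵢ), so ∂L_d(θ')/∂wᵢ ≈ −η (g_d · gᵢ). Weights of loss terms whose gradients gᵢ point the same way as the downstream gradient g_d therefore grow, and the others shrink. Gradients are plain lists here; the function name, step sizes, and the clip-and-renormalize step are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def update_loss_weights(weights, term_grads, downstream_grad, lr_model, lr_weights):
    """One online update of composite-loss weights (hypothetical helper).

    weights:          current weights w_i, one per pretraining loss term
    term_grads:       gradients g_i of each pretraining term w.r.t. model params
    downstream_grad:  gradient g_d of the downstream objective w.r.t. the same params
    """
    new = []
    for w, g in zip(weights, term_grads):
        # First-order hypergradient: dL_d/dw_i ~= -lr_model * <g_d, g_i>.
        hypergrad = -lr_model * dot(downstream_grad, g)
        # Descend on the downstream loss w.r.t. w_i and keep weights non-negative.
        new.append(max(0.0, w - lr_weights * hypergrad))
    total = sum(new)
    # Renormalize to sum to 1 (illustrative); fall back to uniform if all vanish.
    if total == 0.0:
        return [1.0 / len(new)] * len(new)
    return [w / total for w in new]

# Term 0 agrees with the downstream gradient, term 1 opposes it.
w = update_loss_weights([0.5, 0.5], [[1.0, 0.0], [-1.0, 0.0]], [1.0, 0.0],
                        lr_model=0.1, lr_weights=1.0)
print(w)  # term 0 now has the larger weight
```

Because the update needs only dot products of gradients that training already computes, it can run online alongside pretraining instead of requiring separate training runs per weight setting.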

LG · Jun 23, 2023
Catching Image Retrieval Generalization

Maksim Zhdanov, Ivan Karpukhin

The concepts of overfitting and generalization are vital for evaluating machine learning models. In this work, we show that the popular Recall@K metric depends on the number of classes in the dataset, which limits its ability to estimate generalization. To fix this issue, we propose a new metric, which measures retrieval performance, and, unlike Recall@K, estimates generalization. We apply the proposed metric to popular image retrieval methods and provide new insights about deep metric learning generalization.
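
For reference, Recall@K in deep metric learning is usually computed as the fraction of queries for which at least one of the K nearest neighbours (excluding the query itself) shares the query's class. A minimal sketch with Euclidean distance on small in-memory embeddings; it implements only the standard metric, not the new metric proposed in the paper, and the function name is hypothetical.

```python
import math

def recall_at_k(embeddings, labels, k):
    """Fraction of queries with at least one same-class item among their
    k nearest neighbours (Euclidean distance, query itself excluded)."""
    n = len(embeddings)
    hits = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(embeddings[i], embeddings[j]))
        if any(labels[j] == labels[i] for j in others[:k]):
            hits += 1
    return hits / n

emb = [[0.0], [0.1], [1.0], [1.1]]
print(recall_at_k(emb, [0, 0, 1, 1], k=1))  # -> 1.0
print(recall_at_k(emb, [0, 1, 0, 1], k=1))  # -> 0.0
```

Since a hit only requires one same-class neighbour, the score depends on how many same-class items each query has in the gallery, which is tied to how the dataset is split into classes.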

LG · Aug 2, 2025
HT-Transformer: Event Sequences Classification by Accumulating Prefix Information with History Tokens

Ivan Karpukhin, Andrey Savchenko

Deep learning has achieved remarkable success in modeling sequential data, including event sequences, temporal point processes, and irregular time series. Recently, transformers have largely replaced recurrent networks in these tasks. However, transformers often underperform RNNs in classification tasks where the objective is to predict future targets. The reason behind this performance gap remains largely unexplored. In this paper, we identify a key limitation of transformers: the absence of a single state vector that provides a compact and effective representation of the entire sequence. Additionally, we show that contrastive pretraining of embedding vectors fails to capture local context, which is crucial for accurate prediction. To address these challenges, we introduce history tokens, a novel concept that facilitates the accumulation of historical information during next-token prediction pretraining. Our approach significantly improves transformer-based models, achieving impressive results in finance, e-commerce, and healthcare tasks. The code is publicly available on GitHub.

LG · Jun 20, 2024
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?

Ivan Karpukhin, Foma Shipilov, Andrey Savchenko

Forecasting multiple future events within a given time horizon is essential for applications in finance, retail, social networks, and healthcare. Marked Temporal Point Processes (MTPP) provide a principled framework to model both the timing and labels of events. However, most existing research focuses on predicting only the next event, leaving long-horizon forecasting largely underexplored. To address this gap, we introduce HoTPP, the first benchmark specifically designed to rigorously evaluate long-horizon predictions. We identify shortcomings in widely used evaluation metrics, propose a theoretically grounded T-mAP metric, present strong statistical baselines, and offer efficient implementations of popular models. Our empirical results demonstrate that modern MTPP approaches often underperform simple statistical baselines. Furthermore, we analyze the diversity of predicted sequences and find that most methods exhibit mode collapse. Finally, we analyze the impact of autoregression and intensity-based losses on prediction quality, and outline promising directions for future research. The HoTPP source code, hyperparameters, and full evaluation results are available at GitHub.
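
To give a feel for horizon-level evaluation, the sketch below implements a simplified, greedy metric in the spirit of T-mAP. For each label, predictions inside the horizon are ranked by score. Each prediction is a true positive if it can claim a still-unmatched ground-truth event of the same label within a time tolerance δ. Average precision is then averaged over labels. The greedy matching, the function names, and the input layout are assumptions made here; the official T-mAP definition and its matching procedure are given in the HoTPP paper and code and may differ.

```python
def average_precision(ranked_hits, n_targets):
    """AP for predictions already sorted by descending score.
    ranked_hits: list of booleans (True = prediction matched a target)."""
    tp = 0
    precision_sum = 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precision_sum += tp / rank  # precision at this true-positive rank
    return precision_sum / n_targets

def horizon_map(preds, targets, delta):
    """Simplified, greedy T-mAP-style score (hypothetical helper).

    preds:   list of (time, label, score) predicted inside the horizon
    targets: list of (time, label) ground-truth events inside the horizon
    delta:   maximum allowed |time error| for a match
    """
    aps = []
    for label in sorted({l for _, l in targets}):
        label_targets = [t for t, l in targets if l == label]
        label_preds = sorted((p for p in preds if p[1] == label), key=lambda p: -p[2])
        used = [False] * len(label_targets)
        hits = []
        for t_pred, _, _ in label_preds:
            # Greedily claim the closest still-unmatched target within delta.
            candidates = [i for i, t in enumerate(label_targets)
                          if not used[i] and abs(t_pred - t) <= delta]
            if candidates:
                best = min(candidates, key=lambda i: abs(t_pred - label_targets[i]))
                used[best] = True
            hits.append(bool(candidates))
        aps.append(average_precision(hits, len(label_targets)))
    return sum(aps) / len(aps) if aps else 0.0

preds = [(1.0, "a", 0.9), (5.0, "a", 0.2), (2.0, "b", 0.8)]
targets = [(1.1, "a"), (2.2, "b")]
print(horizon_map(preds, targets, delta=0.5))  # -> 1.0
```

Unlike next-event metrics, this rewards a model for placing the right labels anywhere near the right times across the whole horizon, and a collapsed forecast that repeats one label scores poorly on every other label.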

CV · May 19, 2023
Diversifying Deep Ensembles: A Saliency Map Approach for Enhanced OOD Detection, Calibration, and Accuracy

Stanislav Dereka, Ivan Karpukhin, Maksim Zhdanov et al.

Deep ensembles are capable of achieving state-of-the-art results in classification and out-of-distribution (OOD) detection. However, their effectiveness is limited by the homogeneity of learned patterns within ensembles. To overcome this issue, our study introduces Saliency Diversified Deep Ensemble (SDDE), a novel approach that promotes diversity among ensemble members by leveraging saliency maps. By incorporating saliency map diversification, our method outperforms conventional ensemble techniques and improves calibration in multiple classification and OOD detection tasks. In particular, the proposed method achieves state-of-the-art OOD detection quality, calibration, and accuracy on multiple benchmarks, including CIFAR10/100 and large-scale ImageNet datasets.
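
The diversification idea can be sketched as a penalty on how similar the members' saliency maps are. Assume that each member's saliency map for the same input has already been computed and flattened into a vector. The sketch below returns the mean pairwise cosine similarity between these vectors. Adding it to the training loss with some weight would push members to rely on different input regions. How SDDE actually computes, compares, and weights saliency maps is not reproduced here, and the function names are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def saliency_diversity_penalty(saliency_maps):
    """Mean pairwise cosine similarity between flattened saliency maps of
    ensemble members (lower = more diverse). Needs at least two members."""
    pairs = list(combinations(saliency_maps, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

# Members attending to disjoint pixels get zero penalty.
print(saliency_diversity_penalty([[1.0, 0.0], [0.0, 1.0]]))  # -> 0.0
```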

CV · Feb 14, 2022
Probabilistic Embeddings Revisited

Ivan Karpukhin, Stanislav Dereka, Sergey Kolesnikov

In recent years, deep metric learning and its probabilistic extensions have claimed state-of-the-art results in the face verification task. Despite these improvements in face verification, probabilistic methods have received little attention in the research community and in practical applications. In this paper, we, for the first time, perform an in-depth analysis of known probabilistic methods in verification and retrieval tasks. We study different design choices and propose a simple extension, achieving new state-of-the-art results among probabilistic methods. Finally, we study confidence prediction and show that it correlates with data quality, but contains little information about prediction error probability. We thus provide a new confidence evaluation benchmark and establish a baseline for future confidence prediction research. PyTorch implementation is publicly released.
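
As background on the kind of method discussed here, one well-known probabilistic approach to face verification, Probabilistic Face Embeddings (PFE), represents each image as a diagonal Gaussian N(μ, diag(σ²)). It compares two images with the mutual likelihood score (MLS), −½ Σₗ [(μ₁ₗ − μ₂ₗ)² / (σ²₁ₗ + σ²₂ₗ) + log(σ²₁ₗ + σ²₂ₗ)] − (D/2) log 2π. The sketch below computes that score for a single pair. It illustrates an existing method of this family, not the extension proposed in this paper, and the function name is hypothetical.

```python
import math

def mutual_likelihood_score(mu1, var1, mu2, var2):
    """MLS between two diagonal Gaussian embeddings (as in PFE).
    Higher means the pair is more likely to share an identity."""
    score = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = v1 + v2  # combined per-dimension variance
        score += (m1 - m2) ** 2 / v + math.log(v)
    d = len(mu1)
    return -0.5 * score - 0.5 * d * math.log(2.0 * math.pi)

same = mutual_likelihood_score([0.0, 0.0], [0.1, 0.1], [0.1, 0.0], [0.1, 0.1])
diff = mutual_likelihood_score([0.0, 0.0], [0.1, 0.1], [2.0, 2.0], [0.1, 0.1])
print(same > diff)  # -> True
```

The variance terms act as per-image uncertainty: large σ² softens the penalty on mean distance but costs a larger log term, which is what makes such models usable for confidence prediction.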