MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
This work addresses a gap in self-supervised learning for event sequences in domains such as banking and healthcare, but its contribution is incremental, since it builds on existing methods by combining them.
The study compares contrastive and generative self-supervised methods for event sequences, finds that neither is superior on its own, and develops MLEM, a novel hybrid model that combines them and achieves better performance across multiple metrics.
This study explores the application of self-supervised learning techniques to event sequences, a key modality in applications such as banking, e-commerce, and healthcare. Research on self-supervised learning for event sequences is limited, and methods from other domains such as images, text, and speech may not transfer easily. To determine the most suitable approach, we conduct a detailed comparative analysis of previously identified best-performing methods. Our assessment covers event sequence classification, next-event prediction, and embedding quality. We find that neither the contrastive nor the generative method is superior across these tasks, which highlights the potential benefit of combining the two. Given the lack of research on hybrid models in this domain, we first adapt a baseline model from another domain. After observing that it underperforms, we develop a novel method called the Multimodal-Learning Event Model (MLEM). MLEM treats contrastive learning and generative modeling as distinct yet complementary modalities and aligns their embeddings. Our results demonstrate that combining contrastive and generative approaches into a single procedure with MLEM achieves superior performance across multiple metrics.
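To make the idea of aligning embeddings from two "modalities" concrete, the following is a minimal sketch, not the MLEM implementation. The abstract does not state which alignment objective MLEM uses; this example assumes a symmetric InfoNCE (CLIP-style) loss, and the names `alignment_loss`, `contrastive_emb`, `generative_emb`, and `temperature` are hypothetical. Each row is assumed to hold the embedding of one event sequence, so row i of the first matrix and row i of the second describe the same sequence.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(row_i)[i]: row i should match column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def alignment_loss(contrastive_emb, generative_emb, temperature=0.1):
    """Symmetric InfoNCE loss between two sets of sequence embeddings.

    ASSUMPTION: this objective is chosen for illustration only; the source
    does not specify how MLEM aligns the two embedding spaces.
    """
    a = [normalize(v) for v in contrastive_emb]
    b = [normalize(v) for v in generative_emb]
    # Cosine similarity between every pair of sequences, scaled by temperature.
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed))


# Toy data: 8 sequences with 16-dimensional embeddings.
random.seed(0)
base = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
near = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in base]
far = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

aligned = alignment_loss(base, near)    # matching pairs: low loss
unaligned = alignment_loss(base, far)   # unrelated pairs: higher loss
print(f"aligned={aligned:.4f} unaligned={unaligned:.4f}")
```

In this sketch the loss is small when the two encoders place the same sequence at nearby points and large when they do not, which is the property an alignment term needs. In a real training loop the embeddings would come from the contrastive and generative encoders and the loss would be minimized by gradient descent alongside any other objectives.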