CLAug 4, 2024Code
MedSyn: LLM-based Synthetic Medical Text Generation FrameworkGleb Kumichev, Pavel Blinov, Yulia Kuzkina et al.
Generating synthetic text addresses the challenge of data availability in privacy-sensitive domains such as healthcare. This study explores the applicability of synthetic data in real-world medical settings. We introduce MedSyn, a novel medical text generation framework that integrates large language models with a Medical Knowledge Graph (MKG). We use MKG to sample prior medical information for the prompt and generate synthetic clinical notes with GPT-4 and fine-tuned LLaMA models. We assess the benefit of synthetic data through application in the ICD code prediction task. Our research indicates that synthetic data can increase the classification accuracy of vital and challenging codes by up to 17.8% compared to settings without synthetic data. Furthermore, to provide new data for further research in the healthcare domain, we present the largest open-source synthetic dataset of clinical notes for the Russian language, comprising over 41k samples covering 219 ICD-10 codes.
LGSep 26, 2023Code
Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHRPetr Philonenko, Vladimir Kokh, Pavel Blinov
Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, making them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks population-wide cancer risks solely based on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns - often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, marking the study as one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective oncologist-supervised study over 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system demonstrates practical scalability: a city-wide population of 1 million patients can be processed in under three hours using standard hardware, enabling seamless clinical integration. This work proves that Can-SAVE achieves nationally significant cancer detection improvements while adhering to real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.
HCMar 26, 2025Code
3MDBench: Medical Multimodal Multi-agent Dialogue BenchmarkIvan Sviridov, Amina Miftakhova, Artemiy Tereshchenko et al.
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality via Assessor Agent. It includes 2996 cases across 34 diagnoses from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for widely used open and closed-source LVLMs. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional neural network into the LVLM's context boosts F1 by up to 20%. Source code is available at https://github.com/univanxx/3mdbench.
CVJul 29, 2024
CardioSyntax: end-to-end SYNTAX score prediction -- dataset, benchmark and methodAlexander Ponomarchuk, Ivan Kruzhilov, Galina Zubkova et al.
The SYNTAX score has become a widely used measure of coronary disease severity, crucial in selecting the optimal mode of the revascularization procedure. This paper introduces a new medical regression and classification problem - automatically estimating SYNTAX score from coronary angiography. Our study presents a comprehensive CardioSYNTAX dataset of 3,018 patients for the SYNTAX score estimation and coronary dominance classification. The dataset features a balanced distribution of individuals with zero and non-zero scores. This dataset includes a first-of-its-kind, complete coronary angiography samples captured through a multi-view X-ray video, allowing one to observe coronary arteries from multiple perspectives. Furthermore, we present a novel, fully automatic end-to-end method for estimating the SYNTAX. For such a difficult task, we have achieved a solid coefficient of determination R2 of 0.51 in score value prediction and 77.3% accuracy for zero score classification.
CVAug 25, 2025
Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers EstimationKonstantin Egorov, Stepan Botman, Pavel Blinov et al.
Progress in remote PhotoPlethysmoGraphy (rPPG) is limited by the critical issues of existing publicly available datasets: small size, privacy concerns with facial videos, and lack of diversity in conditions. The paper introduces a novel comprehensive large-scale multi-view video dataset for rPPG and health biomarkers estimation. Our dataset comprises 3600 synchronized video recordings from 600 subjects, captured under varied conditions (resting and post-exercise) using multiple consumer-grade cameras at different angles. To enable multimodal analysis of physiological states, each recording is paired with a 100 Hz PPG signal and extended health metrics, such as electrocardiogram, arterial blood pressure, biomarkers, temperature, oxygen saturation, respiratory rate, and stress level. Using this data, we train an efficient rPPG model and compare its quality with existing approaches in cross-dataset scenarios. The public release of our dataset and model should significantly speed up the progress in the development of AI medical assistants.
AIFeb 26, 2024
GigaPevt: Multimodal Medical AssistantPavel Blinov, Konstantin Egorov, Ivan Sviridov et al.
Building an intelligent and efficient medical assistant is still a challenging AI problem. The major limitation comes from the data modality scarceness, which reduces comprehensive patient perception. This demo paper presents the GigaPevt, the first multimodal medical assistant that combines the dialog capabilities of large language models with specialized medical models. Such an approach shows immediate advantages in dialog quality and metric performance, with a 1.18% accuracy improvement in the question-answering task.
CLJan 17, 2022
RuMedBench: A Russian Medical Language Understanding BenchmarkPavel Blinov, Arina Reshetnikova, Aleksandr Nesterov et al.
The paper describes the open Russian medical language understanding benchmark covering several task types (classification, question answering, natural language inference, named entity recognition) on a number of novel text sets. Given the sensitive nature of the data in healthcare, such a benchmark partially closes the problem of Russian medical dataset absence. We prepare the unified format labeling, data split, and evaluation metrics for new tasks. The remaining tasks are from existing datasets with a few modifications. A single-number metric expresses a model's ability to cope with the benchmark. Moreover, we implement several baseline models, from simple ones to neural networks with transformer architecture, and release the code. Expectedly, the more advanced models yield better performance, but even a simple model is enough for a decent result in some tasks. Furthermore, for all tasks, we provide a human evaluation. Interestingly the models outperform humans in the large-scale classification tasks. However, the advantage of natural intelligence remains in the tasks requiring more knowledge and reasoning.
CYJun 21, 2021
Medical Profile Model: Scientific and Practical Applications in HealthcarePavel Blinov, Vladimir Kokh
The paper researches the problem of representation learning for electronic health records. We present the patient histories as temporal sequences of diseases for which embeddings are learned in an unsupervised setup with a transformer-based neural network model. Additionally the embedding space includes demographic parameters which allow the creation of generalized patient profiles and successful transfer of medical knowledge to other domains. The training of such a medical profile model has been performed on a dataset of more than one million patients. Detailed model analysis and its comparison with the state-of-the-art method show its clear advantage in the diagnosis prediction task. Further, we show two applications based on the developed profile model. First, a novel Harbinger Disease Discovery method allowing to reveal disease associated hypotheses and potentially are beneficial in the design of epidemiological studies. Second, the patient embeddings extracted from the profile model applied to the insurance scoring task allow significant improvement in the performance metrics.
CLJul 15, 2020
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural NetworksPavel Blinov, Manvel Avetisian, Vladimir Kokh et al.
In this paper we study the problem of predicting clinical diagnoses from textual Electronic Health Records (EHR) data. We show the importance of this problem in medical community and present comprehensive historical review of the problem and proposed methods. As the main scientific contributions we present a modification of Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification that implements a novel way of Fully-Connected (FC) layer composition and a BERT model pretrained only on domain data. To empirically validate our model, we use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits. This is the largest such study for the Russian language and one of the largest globally. We performed a number of comparative experiments with other text representation models on the task of multiclass classification for 265 disease subset of ICD-10. The experiments demonstrate improved performance of our models compared to other baselines, including a fine-tuned Russian BERT (RuBERT) variant. We also show comparable performance of our model with a panel of experienced medical experts. This allows us to hope that implementation of this system will reduce misdiagnosis.