SPAug 25, 2023
Compressor-Based Classification for Atrial Fibrillation DetectionNikita Markov, Konstantin Ushenin, Yakov Bozhko et al.
Atrial fibrillation (AF) is one of the most common arrhythmias with challenging public health implications. Therefore, automatic detection of AF episodes on ECG is one of the essential tasks in biomedical engineering. In this paper, we applied the recently introduced method of compressor-based text classification with gzip algorithm for AF detection (binary classification between heart rhythms). We investigated the normalized compression distance applied to RR-interval and $Δ$RR-interval sequences ($Δ$RR-interval is the difference between subsequent RR-intervals). Here, the configuration of the k-nearest neighbour classifier, an optimal window length, and the choice of data types for compression were analyzed. We achieved good classification results while learning on the full MIT-BIH Atrial Fibrillation database, close to the best specialized AF detection algorithms (avg. sensitivity = 97.1\%, avg. specificity = 91.7\%, best sensitivity of 99.8\%, best specificity of 97.6\% with fivefold cross-validation). In addition, we evaluated the classification performance under the few-shot learning setting. Our results suggest that gzip compression-based classification, originally proposed for texts, is suitable for biomedical data and quantized continuous stochastic sequences in general.
CLJul 17, 2022
Natural language processing for clusterization of genes according to their functionsVladislav Dordiuk, Ekaterina Demicheva, Fernando Polanco Espino et al.
There are hundreds of methods for analysis of data obtained in mRNA-sequencing. The most of them are focused on small number of genes. In this study, we propose an approach that reduces the analysis of several thousand genes to analysis of several clusters. The list of genes is enriched with information from open databases. Then, the descriptions are encoded as vectors using the pretrained language model (BERT) and some text processing approaches. The encoded gene function pass through the dimensionality reduction and clusterization. Aiming to find the most efficient pipeline, 180 cases of pipeline with different methods in the major pipeline steps were analyzed. The performance was evaluated with clusterization indexes and expert review of the results.
APJul 17, 2022
Statistical model for describing heart rate variability in normal rhythm and atrial fibrillationNikita Markov, Ilya Kotov, Konstantin Ushenin et al.
Heart rate variability (HRV) indices describe properties of interbeat intervals in electrocardiogram (ECG). Usually HRV is measured exclusively in normal sinus rhythm (NSR) excluding any form of paroxysmal rhythm. Atrial fibrillation (AF) is the most widespread cardiac arrhythmia in human population. Usually such abnormal rhythm is not analyzed and assumed to be chaotic and unpredictable. Nonetheless, ranges of HRV indices differ between patients with AF, yet physiological characteristics which influence them are poorly understood. In this study, we propose a statistical model that describes relationship between HRV indices in NSR and AF. The model is based on Mahalanobis distance, the k-Nearest neighbour approach and multivariate normal distribution framework. Verification of the method was performed using 10 min intervals of NSR and AF that were extracted from long-term Holter ECGs. For validation we used Bhattacharyya distance and Kolmogorov-Smirnov 2-sample test in a k-fold procedure. The model is able to predict at least 7 HRV indices with high precision.
IVSep 29, 2023
Benefits of mirror weight symmetry for 3D mesh segmentation in biomedical applicationsVladislav Dordiuk, Maksim Dzhigil, Konstantin Ushenin
3D mesh segmentation is an important task with many biomedical applications. The human body has bilateral symmetry and some variations in organ positions. It allows us to expect a positive effect of rotation and inversion invariant layers in convolutional neural networks that perform biomedical segmentations. In this study, we show the impact of weight symmetry in neural networks that perform 3D mesh segmentation. We analyze the problem of 3D mesh segmentation for pathological vessel structures (aneurysms) and conventional anatomical structures (endocardium and epicardium of ventricles). Local geometrical features are encoded as sampling from the signed distance function, and the neural network performs prediction for each mesh node. We show that weight symmetry gains from 1 to 3% of additional accuracy and allows decreasing the number of trainable parameters up to 8 times without suffering the performance loss if neural networks have at least three convolutional layers. This also works for very small training sets.
CHEM-PHJun 20, 2024
$\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network PotentialsKuzma Khrabrov, Anton Ber, Artem Tsypin et al.
Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on the nablaDFT. It contains twice as much molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level ($ω$B97X-D/def2-SVP) for each conformation. Moreover, $\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.
CVApr 13, 2021
Anomaly Detection in Image Datasets Using Convolutional Neural Networks, Center Loss, and Mahalanobis DistanceGarnik Vareldzhan, Kirill Yurkov, Konstantin Ushenin
User activities generate a significant number of poor-quality or irrelevant images and data vectors that cannot be processed in the main data processing pipeline or included in the training dataset. Such samples can be found with manual analysis by an expert or with anomalous detection algorithms. There are several formal definitions for the anomaly samples. For neural networks, the anomalous is usually defined as out-of-distribution samples. This work proposes methods for supervised and semi-supervised detection of out-of-distribution samples in image datasets. Our approach extends a typical neural network that solves the image classification problem. Thus, one neural network after extension can solve image classification and anomalous detection problems simultaneously. Proposed methods are based on the center loss and its effect on a deep feature distribution in a last hidden layer of the neural network. This paper provides an analysis of the proposed methods for the LeNet and EfficientNet-B0 on the MNIST and ImageNet-30 datasets.
SPDec 10, 2019
Effects of lead position, cardiac rhythm variation and drug-induced QT prolongation on performance of machine learning methods for ECG processingMarat Bogdanov, Salim Baigildin, Aygul Fabarisova et al.
Machine learning shows great performance in various problems of electrocardiography (ECG) signal analysis. However, collecting a dataset for biomedical engineering is a very difficult task. Any dataset for ECG processing contains from 100 to 10,000 times fewer cases than datasets for image or text analysis. This issue is especially important because of physiological phenomena that can significantly change the morphology of heartbeats in ECG signals. In this preliminary study, we analyze the effects of lead choice from the standard ECG recordings, variation of ECG during 24-hours, and the effects of QT-prolongation agents on the performance of machine learning methods for ECG processing. We choose the problem of subject identification for analysis, because this problem may be solved for almost any available dataset of ECG data. In a discussion, we compare our findings with observations from other works that use machine learning for ECG processing with different problem statements. Our results show the importance of training dataset enrichment with ECG signals acquired in specific physiological conditions for obtaining good performance of ECG processing for real applications.
SPNov 21, 2019
Phase mapping for cardiac unipolar electrograms with neural network instead of phase transformationKonstantin Ushenin, Tatyana Nesterova, Dmitry Shmarko et al.
A phase mapping is an approach to processing signals of electrograms recorded from the surface of cardiac tissue. The main concept of phase mapping is the application of the phase transformation with the aim to obtain signals with useful properties. In our study, we propose to use a simple sawtooth signal instead of a phase signal for processing of electrogram data and building of the phase maps. We denote transformation that can provide this signal as a phase-like transformation (PLT). PLT defined via a convolutional neural network that is trained on a dataset from computer models of cardiac tissue electrophysiology. The proposed approaches were validated on data from the detailed personalized model of the human torso electrophysiology. This paper includes visualization of the phase map based on PLT and shows the robustness of the proposed approaches in the analysis of the complex non-stationary periodic activity of the excitable cardiac tissue.
IVSep 15, 2019
Comparison of UNet, ENet, and BoxENet for Segmentation of Mast Cells in Scans of Histological SlicesAlexander Karimov, Artem Razumov, Ruslana Manbatchurina et al.
Deep neural networks show high accuracy in theproblem of semantic and instance segmentation of biomedicaldata. However, this approach is computationally expensive. Thecomputational cost may be reduced with network simplificationafter training or choosing the proper architecture, which providessegmentation with less accuracy but does it much faster. In thepresent study, we analyzed the accuracy and performance ofUNet and ENet architectures for the problem of semantic imagesegmentation. In addition, we investigated the ENet architecture by replacing of some convolution layers with box-convolutionlayers. The analysis performed on the original dataset consisted of histology slices with mast cells. These cells provide a region forsegmentation with different types of borders, which vary fromclearly visible to ragged. ENet was less accurate than UNet byonly about 1-2%, but ENet performance was 8-15 times faster than UNet one.