LG Mar 10, 2022
Forecasting the abnormal events at well drilling with machine learning
Ekaterina Gurina, Nikita Klyuchnikov, Ksenia Antipova et al.
We present a data-driven and physics-informed algorithm for drilling accident forecasting. The core machine-learning algorithm uses drilling telemetry data in the form of time series. We have developed a Bag-of-features representation of the time series that enables the algorithm to predict the probabilities of six types of drilling accidents in real time. The machine-learning model is trained on 125 past drilling accidents from 100 different Russian oil and gas wells. Validation shows that the model can forecast 70% of drilling accidents at a false positive rate of 40%. The model enables partial prevention of drilling accidents during well construction.
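The abstract does not spell out how the Bag-of-features representation is built, so the following is only a minimal sketch of the general idea: cut the series into sliding windows, map each window to its nearest entry in a codebook of typical patterns, and describe the series by the resulting histogram. The function names, the window width, and the hand-written codebook are all assumptions for illustration; in practice the codebook would be learned from data (for example by clustering), and a telemetry series would have several channels.

```python
def sliding_windows(series, width, step=1):
    """Split a 1-D series into overlapping windows of a fixed width."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def nearest_codeword(window, codebook):
    """Index of the codebook entry closest to the window (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(window, codebook[k]))


def bag_of_features(series, codebook, width, step=1):
    """Normalised histogram of codeword assignments over all windows of the series."""
    counts = [0] * len(codebook)
    windows = sliding_windows(series, width, step)
    for w in windows:
        counts[nearest_codeword(w, codebook)] += 1
    total = len(windows)
    return [c / total for c in counts] if total else counts


# Hypothetical two-pattern codebook: a "low" and a "high" signal level.
codebook = [[0, 0, 0], [1, 1, 1]]
series = [0, 0, 0, 0, 1, 1, 1, 1]
histogram = bag_of_features(series, codebook, width=3)
```

The fixed-length histogram, rather than the raw series, is what a downstream classifier would consume, which is why series of different lengths become comparable.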
LG Sep 6, 2022
Making the black-box brighter: interpreting machine learning algorithm for forecasting drilling accidents
Ekaterina Gurina, Nikita Klyuchnikov, Ksenia Antipova et al.
We present an approach for interpreting a black-box alarming system for forecasting accidents and anomalies during the drilling of oil and gas wells. The interpretation methodology aims to explain the local behavior of the accident predictive model to drilling engineers. The explanatory model uses Shapley additive explanations analysis of features obtained through the Bag-of-features representation of telemetry logs used during the drilling accident forecasting phase. Validation shows that the explanatory model has 15% precision at 70% recall and outperforms both a random baseline and a multi-head attention neural network. These results indicate that the developed explanatory model is better aligned with the explanations of drilling engineers than the state-of-the-art method. The joint use of the explanatory and Bag-of-features models allows drilling engineers to understand the logic behind the system's decisions at a particular moment, pay attention to highlighted telemetry regions, and, correspondingly, increase their trust in the accident forecasting alarms.
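To make the idea of Shapley-based feature attribution concrete, here is a small sketch that computes exact Shapley values by enumerating feature subsets. It is not the authors' pipeline: practical Shapley additive explanations tooling relies on approximations, whereas this brute-force version only works for a handful of features. The value function `toy_value` and its three features are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are known."""
    v = 0.0
    if 0 in subset:
        v += 0.1
    if 1 in subset:
        v += 0.2
    if 0 in subset and 1 in subset:
        v += 0.4  # interaction term, split evenly between features 0 and 1
    return v


phi = shapley_values(toy_value, 3)
```

In this toy case the attributions sum to the output with all features present, and feature 2, which never changes the output, receives zero attribution.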
MTRL-SCI Jul 5, 2024
An autoencoder for compressing angle-resolved photoemission spectroscopy data
Steinn Ymir Agustsson, Mohammad Ahsanul Haque, Thi Tam Truong et al.
Angle-resolved photoemission spectroscopy (ARPES) is a powerful experimental technique to determine the electronic structure of solids. Advances in light sources for ARPES experiments are currently leading to a vast increase in data acquisition rates and data quantity. On the other hand, access time to the most advanced ARPES instruments remains strictly limited, calling for fast, effective, and on-the-fly data analysis tools to exploit this time. In response to this need, we introduce ARPESNet, a versatile autoencoder network that efficiently summarises and compresses ARPES datasets. We train ARPESNet on a large and varied dataset of 2-dimensional ARPES data extracted by cutting standard 3-dimensional ARPES datasets along random directions in $\mathbf{k}$. To test the data representation capacity of ARPESNet, we compare $k$-means clustering quality between data compressed by ARPESNet, data compressed by discrete cosine transform, and raw data, at different noise levels. ARPESNet data excels in clustering quality despite its high compression ratio.
CL Jul 23, 2021
A Differentiable Language Model Adversarial Attack on Text Classifiers
Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev et al.
Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve the robustness of these models is to explore an adversarial attack scenario: check whether a small perturbation of an input can fool a model. Due to the discrete nature of textual data, gradient-based adversarial methods, widely used in computer vision, are not applicable per se. The standard strategy to overcome this issue is to develop token-level transformations, which do not take the whole sentence into account. In this paper, we propose a new black-box sentence-level attack. Our method fine-tunes a pre-trained language model to generate adversarial examples. The proposed differentiable loss function depends on a substitute classifier score and an approximate edit distance computed via a deep learning model. We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation. Moreover, due to the use of the fine-tuned language model, the generated adversarial examples are hard to detect, and thus current models are not robust. Hence, it is difficult to defend against the proposed attack, which is not the case for other attacks.
LG Jun 15, 2020
Multi-fidelity Neural Architecture Search with Knowledge Distillation
Ilya Trofimov, Nikita Klyuchnikov, Mikhail Salnikov et al.
Neural architecture search (NAS) aims to find the optimal architecture of a neural network for a problem or a family of problems. Evaluations of neural architectures are very time-consuming. One possible way to mitigate this issue is to use low-fidelity evaluations, namely training on a part of a dataset, for fewer epochs, with fewer channels, etc. In this paper, we propose a Bayesian multi-fidelity method for neural architecture search: MF-KD. The method relies on a new approach to low-fidelity evaluations of neural architectures by training for a few epochs using knowledge distillation. Knowledge distillation adds to a loss function a term forcing a network to mimic some teacher network. We carry out experiments on CIFAR-10, CIFAR-100, and ImageNet-16-120. We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss. The proposed method outperforms several state-of-the-art baselines.
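The extra loss term described above can be sketched as follows. This is a common textbook form of knowledge distillation, not necessarily the exact loss used in MF-KD: the mixing weight `alpha`, the `temperature`, and the choice of a KL-divergence term between softened teacher and student outputs are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy against a hard integer class label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hard-label loss plus a term pulling the student towards the teacher."""
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # The temperature**2 factor is the usual rescaling of the softened term
    # (assumed here) so that it stays comparable to the hard-label term.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft
```

When the student already matches the teacher, the mimicry term vanishes and only the weighted hard-label loss remains; any disagreement with the teacher increases the loss.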
LG Jun 12, 2020
NAS-Bench-NLP: Neural Architecture Search Benchmark for Natural Language Processing
Nikita Klyuchnikov, Ilya Trofimov, Ekaterina Artemova et al.
Neural Architecture Search (NAS) is a promising and rapidly evolving research area. Training a large number of neural networks requires an exceptional amount of computational power, which puts NAS out of reach for researchers who have limited or no access to high-performance clusters and supercomputers. A few benchmarks with precomputed neural architecture performances have recently been introduced to overcome this problem and ensure more reproducible experiments. However, these benchmarks cover only the computer vision domain and are thus built from image datasets and convolution-derived architectures. In this work, we step outside the computer vision domain by leveraging the language modeling task, which is at the core of natural language processing (NLP). Our main contributions are as follows: we provide a search space of recurrent neural networks on text datasets and train 14k architectures within it; we conduct both intrinsic and extrinsic evaluation of the trained models using datasets for semantic relatedness and language understanding evaluation; finally, we test several NAS algorithms to demonstrate how the precomputed results can be utilized. We believe that our results have high potential for use by both the NAS and NLP communities.
LG Jun 6, 2019
Application of Machine Learning to accidents detection at directional drilling
Ekaterina Gurina, Nikita Klyuchnikov, Alexey Zaytsev et al.
We present a data-driven algorithm and mathematical model for anomaly alarming at directional drilling. The algorithm is based on machine learning. It compares the real-time drilling telemetry with that of past accidents and analyses the level of similarity. The model performs a time-series comparison using aggregated statistics and Gradient Boosting classification. It is trained on historical data containing the drilling telemetry of $80$ wells drilled within $19$ oilfields. The model can detect an anomaly and identify its type by comparing the real-time measurements while drilling with those from the database of past accidents. Validation tests show that our algorithm identifies half of the anomalies with about $0.53$ false alarms per day on average. The model performance ensures sufficient time and cost savings as it enables partial prevention of failures and accidents during well construction.
LG Mar 27, 2019
Real-time data-driven detection of the rock type alteration during a directional drilling
Evgenya Romanenkova, Alexey Zaytsev, Nikita Klyuchnikov et al.
During directional drilling, the bit may sometimes enter a nonproductive rock layer because of the gap of about 20 m between the bit and the high-fidelity rock type sensors. The only way to detect lithotype changes in time is to use Measurements While Drilling (MWD) data. However, there are no general mathematical modeling approaches that both reconstruct the rock type well from MWD data and account for the specifics of the oil and gas industry. In this article, we present a data-driven procedure that utilizes MWD data for quick detection of changes in rock type. We propose an approach that combines traditional machine learning, based on solving the rock type classification problem, with change detection procedures rarely used before in the Oil & Gas industry. The data come from a newly developed oilfield in the north of western Siberia. The results suggest that we can detect a significant part of the changes in rock type, reducing the change detection delay from $20$ to $1.8$ meters and the number of false-positive alarms from $43$ to $6$ per well.
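The abstract does not name the change detection procedure, so the sketch below uses one-sided CUSUM purely as a familiar stand-in to show how classifier outputs can feed a change detector. The input `scores` (per-step probabilities that the bit is in nonproductive rock), the `reference` level, and the `threshold` are all hypothetical.

```python
def cusum_alarm(scores, reference=0.5, threshold=1.0):
    """Return the index of the first alarm, or None if no change is flagged.

    scores: per-step classifier probabilities of the nonproductive rock type.
    reference: level above which a score counts as evidence of a change.
    threshold: accumulated evidence required before raising an alarm.
    """
    stat = 0.0
    for i, s in enumerate(scores):
        # Accumulate evidence above the reference level, never dropping below zero.
        stat = max(0.0, stat + (s - reference))
        if stat > threshold:
            return i
    return None


# Synthetic stream: the rock type changes at step 10.
stream = [0.1] * 10 + [0.9] * 10
alarm_index = cusum_alarm(stream, reference=0.5, threshold=1.0)
```

Accumulating evidence over several steps, instead of alarming on a single high score, trades a short detection delay for fewer false alarms, which matches the two quantities reported in the abstract.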
LG Sep 13, 2018
Gaussian Process Classification for Variable Fidelity Data
Nikita Klyuchnikov, Evgeny Burnaev
In this paper we address a classification problem where two sources of labels with different levels of fidelity are available. Our approach is to combine data from both sources by applying a co-kriging schema on latent functions, which allows the model to account for item-dependent labeling discrepancy. We provide an extension of Laplace inference for Gaussian process classification that takes into account multi-fidelity data. We evaluate the proposed method on real and synthetic datasets and show that it is more resistant to different levels of discrepancy between sources than other approaches for data fusion. Our method can provide an accuracy/cost trade-off for a number of practical tasks such as crowd-sourced data annotation and feasibility region construction in engineering design.
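For orientation, a co-kriging schema on latent functions is commonly written in the autoregressive form below. The symbols ($f_l$, $f_h$, $\rho$, $\delta$, $\sigma$) are assumed notation for this sketch and may differ from the parameterization used in the paper.

$$
f_h(x) = \rho\, f_l(x) + \delta(x), \qquad f_l \sim \mathcal{GP}(0, k_l), \quad \delta \sim \mathcal{GP}(0, k_\delta), \quad f_l \perp \delta,
$$

$$
p(y_s = 1 \mid x) = \sigma\big(f_s(x)\big), \qquad s \in \{l, h\}.
$$

Here $f_l$ and $f_h$ are the latent functions behind the low- and high-fidelity labels, $\rho$ scales how much the two sources agree, and $\delta(x)$ captures the input-dependent discrepancy between them; $\sigma$ is a link function such as the logistic sigmoid.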
LG Jun 8, 2018
Data-driven model for the identification of the rock type at a drilling bit
Nikita Klyuchnikov, Alexey Zaytsev, Arseniy Gruzdev et al.
Directional oil well drilling requires high precision of the wellbore positioning inside the productive area. However, due to the specifics of engineering design, sensors that explicitly determine the type of the drilled rock are located more than 15 m from the drilling bit. As a result, departures from the target area can be detected only after this distance, which, in turn, leads to a loss in well productivity and the risk of needing an expensive re-boring operation. We present a novel approach for identifying rock type at the drilling bit based on machine learning classification methods and data mining on sensor readings. We compare various machine-learning algorithms, examine extra features coming from mathematical modeling of drilling mechanics, and show that the real-time rock type classification error can be reduced from 13.5% to 9%. The approach is applicable for precise directional drilling in relatively thin target intervals of complex shapes and generalizes appropriately to new wells that are different from the ones used for training the machine learning model.