Yingjun Shen

CV
h-index: 8
Papers: 3
Citations: 8
Novelty: 48%
AI Score: 30

3 Papers

CV · Oct 15, 2024
DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM

Yingjun Shen, Haizhao Dai, Qihe Chen et al.

Foundation models in computer vision have demonstrated exceptional performance in zero-shot and few-shot tasks by extracting multi-purpose features from large-scale datasets through self-supervised pre-training. However, these models often overlook the severe corruption of cryogenic electron microscopy (cryo-EM) images by high levels of noise. We introduce DRACO, a Denoising-Reconstruction Autoencoder for CryO-EM, inspired by the Noise2Noise (N2N) approach. By processing cryo-EM movies into odd and even images and treating them as independent noisy observations, we apply a denoising-reconstruction hybrid training scheme. We mask both images to create denoising and reconstruction tasks. Because dataset quality is essential for DRACO's pre-training, we build a high-quality, diverse dataset from an uncurated public database, including over 270,000 movies or micrographs. After pre-training, DRACO naturally serves as a generalizable cryo-EM image denoiser and a foundation model for various cryo-EM downstream tasks. DRACO demonstrates the best performance in denoising, micrograph curation, and particle picking tasks compared to state-of-the-art baselines.
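The odd/even split and masking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 0-based frame indexing, the plain summation of frames (real pipelines would motion-correct first), the 0.75 mask ratio, and the helper names `split_odd_even` and `random_patch_mask` are all assumptions made for illustration.

```python
import random


def split_odd_even(frames):
    """Sum odd- and even-indexed frames of a movie into two images.

    `frames` is a list of 2D images (lists of rows of floats).
    The two sums see the same signal but different noise, which is
    the property Noise2Noise-style training relies on.
    """
    def sum_frames(subset):
        height, width = len(subset[0]), len(subset[0][0])
        return [
            [sum(frame[r][c] for frame in subset) for c in range(width)]
            for r in range(height)
        ]

    even = sum_frames(frames[0::2])  # frames 0, 2, 4, ...
    odd = sum_frames(frames[1::2])   # frames 1, 3, 5, ...
    return odd, even


def random_patch_mask(n_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a masked patch."""
    n_masked = int(n_patches * mask_ratio)
    masked = set(rng.sample(range(n_patches), n_masked))
    return [i in masked for i in range(n_patches)]


# Toy movie: 4 frames of 2x2 pixels, frame k filled with the value k.
frames = [[[float(k)] * 2 for _ in range(2)] for k in range(4)]
odd, even = split_odd_even(frames)

# Hypothetical mask ratio; the abstract does not state one.
mask = random_patch_mask(n_patches=16, mask_ratio=0.75, rng=random.Random(0))
```

In this sketch, masked patches would be the reconstruction targets and visible patches the denoising targets, with one image of the pair serving as the noisy target for the other.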

SY · Mar 30, 2022
Prognosis of Rotor Parts Fly-off Based on Cascade Classification and Online Prediction Ability Index

Yingjun Shen, Zhe Song, Andrew Kusiak

Large rotating machines, e.g., compressors, steam turbines, and gas turbines, are critical equipment in many process industries such as energy, chemical, and power generation. Due to the high rotating speed and tremendous momentum of the rotor, the centrifugal force may cause rotor parts to fly apart, which poses a great threat to operational safety. Early detection and prediction of potential failures could prevent catastrophic plant downtime and economic loss. In this paper, we divide the operational states of a rotating machine into normal, risky, and high-risk ones based on the time to the moment of failure. Then a cascade classification algorithm is proposed to predict the states in two steps: first, we judge whether the machine is in a normal or abnormal condition; then, for time periods predicted as abnormal, we further classify them into risky or high-risk states. Moreover, traditional classification model evaluation metrics, such as the confusion matrix and true-false accuracy, are static and neglect the online prediction dynamics and uneven misclassification costs. An Online Prediction Ability Index (OPAI) is proposed to select prediction models with consistent online predictions and smaller close-to-downtime prediction errors. Real-world data sets and computational experiments are used to verify the effectiveness of the proposed methods.
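The state labeling and two-step cascade described above can be sketched as follows. This is only an illustration under stated assumptions: the 72-hour and 24-hour horizons, the single vibration feature, and the threshold rules standing in for trained classifiers are hypothetical and do not come from the paper.

```python
def label_state(hours_to_failure, risky_horizon=72.0, high_risk_horizon=24.0):
    """Label a time period by its distance to the moment of failure.

    Both horizons are hypothetical values chosen for illustration.
    """
    if hours_to_failure <= high_risk_horizon:
        return "high-risk"
    if hours_to_failure <= risky_horizon:
        return "risky"
    return "normal"


def cascade_predict(x, is_abnormal, is_high_risk):
    """Two-step cascade: normal vs. abnormal, then risky vs. high-risk.

    `is_abnormal` and `is_high_risk` are any callables returning a bool,
    e.g. the predict step of two separately trained binary classifiers.
    """
    if not is_abnormal(x):
        return "normal"
    return "high-risk" if is_high_risk(x) else "risky"


# Stand-in classifiers: simple thresholds on a hypothetical vibration level.
def stage_one(vibration):
    return vibration > 1.0


def stage_two(vibration):
    return vibration > 2.0


predictions = [cascade_predict(v, stage_one, stage_two) for v in (0.5, 1.5, 3.0)]
```

The second classifier is only consulted for periods the first one flags as abnormal, so it can be trained on abnormal data alone.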

LG · May 4, 2021
Enhancing Generalizability of Predictive Models with Synergy of Data and Physics

Yingjun Shen, Zhe Song, Andrew Kusiak

Wind farms need prediction models for predictive maintenance. There is a need to predict values of non-observable parameters beyond the ranges reflected in available data. A prediction model developed for one machine may not perform well on another similar machine. This is usually due to the lack of generalizability of data-driven models. To increase the generalizability of predictive models, this research integrates data mining with first-principle knowledge. Physics-based principles are combined with machine learning algorithms through feature engineering, strong rules, and divide-and-conquer. The proposed synergy concept is illustrated with wind turbine blade icing prediction and achieves significant prediction accuracy across different turbines. The proposed process is widely accepted by wind energy predictive maintenance practitioners because of its simplicity and efficiency. Furthermore, this paper demonstrates the importance of embedding physical principles within the machine learning process, and also highlights that the need for more complex machine learning algorithms in industrial big data mining is often much less than in other applications, making it essential to incorporate physics and follow a "Less is More" philosophy.
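As a rough illustration of combining a physics-based feature with a strong rule, the sketch below uses the standard wind power relation P = 0.5 · ρ · A · Cp · v³ to normalize measured power before a learned model sees it. The abstract does not list the actual features or rules, so the 5 °C cutoff, the power coefficient of 0.4, and the names `available_wind_power` and `predict_icing` are assumptions, not the paper's method.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air density


def available_wind_power(wind_speed, rotor_radius, power_coefficient=0.4):
    """Expected power in watts from P = 0.5 * rho * A * Cp * v^3.

    The power coefficient of 0.4 is a hypothetical value.
    """
    swept_area = math.pi * rotor_radius ** 2
    return 0.5 * AIR_DENSITY * swept_area * power_coefficient * wind_speed ** 3


def predict_icing(temp_c, wind_speed, measured_power, rotor_radius, learned_model):
    """Apply a strong physical rule first, then a learned model.

    `learned_model` is any callable mapping the power ratio to a bool.
    """
    # Strong rule (hypothetical cutoff): no icing well above freezing.
    if temp_c > 5.0:
        return False
    expected = available_wind_power(wind_speed, rotor_radius)
    # Physics-based feature: measured power relative to expected power.
    ratio = measured_power / expected if expected > 0 else 1.0
    return learned_model(ratio)


# Stand-in for a trained model: flag icing when power falls below 70% of expected.
def toy_model(ratio):
    return ratio < 0.7


expected_power = available_wind_power(wind_speed=10.0, rotor_radius=40.0)
warm_day = predict_icing(10.0, 10.0, 0.5 * expected_power, 40.0, toy_model)
cold_day = predict_icing(-2.0, 10.0, 0.5 * expected_power, 40.0, toy_model)
```

Because the ratio is dimensionless, the same learned model can in principle be applied to turbines with different rotor sizes, which is the kind of cross-machine generalizability the abstract describes.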