LGDec 15, 2022
Spatially-resolved Thermometry from Line-of-Sight Emission Spectroscopy via Machine LearningRuiyuan Kang, Dimitrios C. Kyritsis, Panos Liatsis
A methodology is proposed, which addresses the caveat that line-of-sight emission spectroscopy presents in that it cannot provide spatially resolved temperature measurements in nonhomogeneous temperature fields. The aim of this research is to explore the use of data-driven models in measuring temperature distributions in a spatially resolved manner using emission spectroscopy data. Two categories of data-driven methods are analyzed: (i) Feature engineering and classical machine learning algorithms, and (ii) end-to-end convolutional neural networks (CNN). In total, combinations of fifteen feature groups and fifteen classical machine learning models, and eleven CNN models are considered and their performances explored. The results indicate that the combination of feature engineering and machine learning provides better performance than the direct use of CNN. Notably, feature engineering which is comprised of physics-guided transformation, signal representation-based feature extraction and Principal Component Analysis is found to be the most effective. Moreover, it is shown that when using the extracted features, the ensemble-based, light blender learning model offers the best performance with RMSE, RE, RRMSE and R values of 64.3, 0.017, 0.025 and 0.994, respectively. The proposed method, based on feature engineering and the light blender model, is capable of measuring nonuniform temperature distributions from low-resolution spectra, even when the species concentration distribution in the gas mixtures is unknown.
CVOct 14, 2022
Flame-state monitoring based on very low number of visible or infrared images via few-shot learningRuiyuan Kang, Panos Liatsis, Dimitrios C. Kyritsis
The current success of machine learning on image-based combustion monitoring is based on massive data, which is costly even impossible for industrial applications. To address this conflict, we introduce few-shot learning in order to achieve combustion monitoring and classification for the first time. Two algorithms, Siamese Network coupled with k Nearest Neighbors (SN-kNN) and Prototypical Network (PN), were tested. Rather than utilizing solely visible images as discussed in previous studies, we also used Infrared (IR) images. We analyzed the training process, test performance and inference speed of two algorithms on both image formats, and also used t-SNE to visualize learned features. The results demonstrated that both SN-kNN and PN were capable to distinguish flame states from learning with merely 20 images per flame state. The worst performance, which was realized by PN on IR images, still possessed precision, accuracy, recall, and F1-score above 0.95. We showed that visible images demonstrated more substantial differences between classes and presented more consistent patterns inside the class, which made the training speed and model performance better compared to IR images. In contrast, the relatively low quality of IR images made it difficult for PN to extract distinguishable prototypes, which caused relatively weak performance. With the entrire training set supporting classification, SN-kNN performed well with IR images. On the other hand, benefitting from the architecture design, PN has a much faster speed in training and inference than SN-kNN. The presented work analyzed the characteristics of both algorithms and image formats for the first time, thus providing guidance for their future utilization in combustion monitoring tasks.
LGSep 25, 2023
Physics-Driven ML-Based Modelling for Correcting Inverse EstimationRuiyuan Kang, Tingting Mu, Panos Liatsis et al.
When deploying machine learning estimators in science and engineering (SAE) domains, it is critical to avoid failed estimations that can have disastrous consequences, e.g., in aero engine design. This work focuses on detecting and correcting failed state estimations before adopting them in SAE inverse problems, by utilizing simulations and performance metrics guided by physical laws. We suggest to flag a machine learning estimation when its physical model error exceeds a feasible threshold, and propose a novel approach, GEESE, to correct it through optimization, aiming at delivering both low error and high efficiency. The key designs of GEESE include (1) a hybrid surrogate error model to provide fast error estimations to reduce simulation cost and to enable gradient based backpropagation of error feedback, and (2) two generative models to approximate the probability distributions of the candidate states for simulating the exploitation and exploration behaviours. All three models are constructed as neural networks. GEESE is tested on three real-world SAE inverse problems and compared to a number of state-of-the-art optimization/search approaches. Results show that it fails the least number of times in terms of finding a feasible state correction, and requires physical evaluations less frequently in general.
CVDec 22, 2024Code
A Conditional Diffusion Model for Electrical Impedance Tomography Image ReconstructionShuaikai Shi, Ruiyuan Kang, Panos Liatsis
Electrical impedance tomography (EIT) is a non-invasive imaging technique, capable of reconstructing images of the electrical conductivity of tissues and materials. It is popular in diverse application areas, from medical imaging to industrial process monitoring and tactile sensing, due to its low cost, real-time capabilities and non-ionizing nature. EIT visualizes the conductivity distribution within a body by measuring the boundary voltages, given a current injection. However, EIT image reconstruction is ill-posed due to the mismatch between the under-sampled voltage data and the high-resolution conductivity image. A variety of approaches, both conventional and deep learning-based, have been proposed, capitalizing on the use of spatial regularizers, and the paradigm of image regression. In this research, a novel method based on the conditional diffusion model for EIT reconstruction is proposed, termed CDEIT. Specifically, CDEIT consists of the forward diffusion process, which first gradually adds Gaussian noise to the clean conductivity images, and a reverse denoising process, which learns to predict the original conductivity image from its noisy version, conditioned on the boundary voltages. Following model training, CDEIT applies the conditional reverse process on test voltage data to generate the desired conductivities. Moreover, we provide the details of a normalization procedure, which demonstrates how EIT image reconstruction models trained on simulated datasets can be applied on real datasets with varying sizes, excitation currents and background conductivities. Experiments conducted on a synthetic dataset and two real datasets demonstrate that the proposed model outperforms state-of-the-art methods. The CDEIT software is available as open-source (https://github.com/shuaikaishi/CDEIT) for reproducibility purposes.
CVJul 31, 2025
Optimizing Vision-Language Consistency via Cross-Layer Regional Attention AlignmentYifan Wang, Hongfeng Ai, Quangao Liu et al.
Vision Language Models (VLMs) face challenges in effectively coordinating diverse attention mechanisms for cross-modal embedding learning, leading to mismatched attention and suboptimal performance. We propose Consistent Cross-layer Regional Alignment (CCRA), which introduces Layer-Patch-wise Cross Attention (LPWCA) to capture fine-grained regional-semantic correlations by jointly weighting patch and layer-wise embedding, and Progressive Attention Integration (PAI) that systematically coordinates LPWCA, layer-wise, and patch-wise attention mechanisms in sequence. This progressive design ensures consistency from semantic to regional levels while preventing attention drift and maximizing individual attention benefits. Experimental results on ten diverse vision-language benchmarks demonstrate that our CCRA-enhanced LLaVA-v1.5-7B model achieves state-of-the-art performance, outperforming all baseline methods with only 3.55M additional parameters, while providing enhanced interpretability through more regionally focused and semantically aligned attention patterns.