IV LG · Feb 18, 2025

Synthetic generation of 2D data records based on Autoencoders

Darius Couchard, Oscar Olarte, Rob Haelterman

arXiv:2502.13183v1 · 5.1 · MeMeA

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in machine learning for analytical techniques such as GC-IMS. It is incremental in that it applies existing deep learning concepts to a specific domain.

The study tackles the problem of limited labelled datasets for two-dimensional spectra, such as GC-IMS data, by introducing a method for generating synthetic 2D spectra with Autoencoders. Adding the synthesized records to a labelled dataset significantly improved classification performance.

Gas Chromatography coupled with Ion Mobility Spectrometry (GC-IMS) is a dual-separation analytical technique widely used for identifying components in gaseous samples by separating their constituent species and analysing their arrival times. Data generated by GC-IMS is typically represented as two-dimensional spectra, which provide rich information but pose challenges for data-driven analysis because labelled datasets are limited. This study introduces a novel method for generating synthetic 2D spectra using a deep learning framework based on Autoencoders. Although applied here to GC-IMS data, the approach is broadly applicable to any two-dimensional spectral measurements where labelled data are scarce. In a component classification task over a labelled dataset of GC-IMS records, adding the synthesized records significantly improved classification performance, demonstrating the method's potential for overcoming dataset limitations in machine learning frameworks.
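The abstract does not specify how the Autoencoder produces new records, so the following is only a hedged illustration of one common pattern, not the authors' method. It assumes that an autoencoder is fitted to the records of a single class and that new latent codes are obtained by interpolating between pairs of real codes, with a little added noise, before decoding. To keep the example small and runnable, it swaps the deep Autoencoder for a linear autoencoder: under mean-squared error, the optimal linear autoencoder spans the principal subspace, so its weights can be obtained in closed form with an SVD. The array layout (records × drift-time bins × retention-time bins), the function names and the default parameters are all hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(spectra, latent_dim):
    """Fit a linear autoencoder to a stack of 2D spectra (hypothetical helper).

    spectra: array of shape (n_records, n_drift, n_retention).
    Encoding is (x - mean) @ components.T and decoding is z @ components + mean.
    The MSE-optimal linear autoencoder spans the principal subspace, so an SVD
    of the centred data gives the weights directly.
    """
    n, h, w = spectra.shape
    flat = spectra.reshape(n, h * w)
    mean = flat.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    components = vt[:latent_dim]
    return mean, components, (h, w)

def encode(spectra, mean, components):
    """Map each 2D record to a latent vector."""
    flat = spectra.reshape(spectra.shape[0], -1)
    return (flat - mean) @ components.T

def decode(latents, mean, components, shape):
    """Map latent vectors back to 2D records of the given shape."""
    flat = latents @ components + mean
    return flat.reshape(latents.shape[0], *shape)

def synthesize(spectra, n_new, latent_dim=4, noise=0.05, seed=0):
    """Generate n_new synthetic records from records of ONE class (assumed scheme).

    New latent codes are random convex combinations of two real codes,
    perturbed by Gaussian noise scaled to each latent dimension's spread.
    """
    rng = np.random.default_rng(seed)
    mean, comps, shape = fit_linear_autoencoder(spectra, latent_dim)
    z = encode(spectra, mean, comps)
    i = rng.integers(0, len(z), size=n_new)
    j = rng.integers(0, len(z), size=n_new)
    t = rng.uniform(0.0, 1.0, size=(n_new, 1))
    z_new = (1.0 - t) * z[i] + t * z[j]
    z_new = z_new + noise * z.std(axis=0) * rng.standard_normal(z_new.shape)
    return decode(z_new, mean, comps, shape)

# Usage on toy data standing in for 10 labelled records of one class
rng = np.random.default_rng(1)
real = rng.random((10, 16, 32))
synthetic = synthesize(real, n_new=20)
print(synthetic.shape)  # (20, 16, 32)
```

Fitting one model per class keeps every synthetic record labelled by construction, so the generated records can be appended to the labelled set before training a classifier. A deep (for example convolutional) Autoencoder would replace the linear encoder and decoder without changing the sampling step.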

