Dynamic Spectrum Matching with One-shot Learning
This work addresses a practical limitation of spectroscopy in real-world applications, where data are scarce and updates are frequent, though it is an incremental improvement over existing Siamese network methods.
The paper tackles the classification of substances from vibrational spectroscopy when only a few training samples per class are available and retraining for new classes is computationally intensive. It reformulates the task as binary classification using a Siamese CNN trained with a novel sampling strategy, reporting better accuracy than other practical systems and enabling one-shot learning for unseen classes.
Convolutional neural networks (CNNs) have been shown to provide a good solution for classification problems that use data obtained from vibrational spectroscopy. Moreover, CNNs can identify substances from noisy spectra without additional preprocessing. However, their application in practical spectroscopy is limited by two shortcomings. First, the effectiveness of CNN classification drops rapidly when only a small number of spectra per substance are available for training, which is the typical situation in real applications. Second, to accommodate new, previously unseen substance classes, the network must be retrained, which is computationally intensive. Here we address these issues by reformulating a multi-class classification problem with a large number of classes, but a small number of samples per class, as a binary classification problem with sufficient data available for representation learning. Namely, we define the learning task as identifying pairs of inputs as belonging to the same or different classes. We achieve this using a Siamese convolutional neural network. A novel sampling strategy is proposed to address the imbalance problem in training the Siamese network. The trained network can effectively classify samples of unseen substance classes using just a single reference sample (termed one-shot learning in the machine learning community). Our results demonstrate better accuracy than other practical systems to date, while allowing effortless updates of the system's database with novel substance classes.
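The pair-based reformulation and the one-shot lookup described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the balancing step is plain random subsampling of different-class pairs (a generic baseline, not the novel sampling strategy proposed here), the Euclidean distance is a stand-in for the similarity a trained Siamese network would produce, and the function names `make_balanced_pairs`, `one_shot_classify`, and `euclidean` are hypothetical.

```python
import random
from itertools import combinations


def make_balanced_pairs(labels, rng):
    """Turn class labels into (i, j, same) index pairs for binary training.

    Assumption: balance is obtained by random subsampling, which is a
    generic baseline and not the sampling strategy proposed in the paper.
    """
    positives, negatives = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            positives.append((i, j))
        else:
            negatives.append((i, j))
    # With many classes and few samples per class, different-class pairs
    # vastly outnumber same-class pairs, so both sets are cut to equal size.
    n = min(len(positives), len(negatives))
    pairs = [(i, j, 1) for i, j in rng.sample(positives, n)]
    pairs += [(i, j, 0) for i, j in rng.sample(negatives, n)]
    rng.shuffle(pairs)
    return pairs


def euclidean(a, b):
    """Stand-in distance; a trained Siamese network would compare embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def one_shot_classify(query, references, distance):
    """Return the label whose single reference spectrum is closest to the query."""
    return min(references, key=lambda label: distance(query, references[label]))


if __name__ == "__main__":
    # Hypothetical labels: three substances with two spectra each.
    labels = ["A", "A", "B", "B", "C", "C"]
    pairs = make_balanced_pairs(labels, random.Random(0))
    print(len(pairs), sum(same for _, _, same in pairs))

    # One reference spectrum per class; adding a class is a dictionary insert.
    references = {"A": [0.0, 1.0, 0.0], "B": [1.0, 0.0, 0.0]}
    print(one_shot_classify([0.1, 0.9, 0.0], references, euclidean))
```

In this sketch, supporting a new substance only requires adding one entry to `references`; no retraining step is involved, which mirrors the database-update property claimed in the abstract.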