Simulating realistic short tandem repeat capillary electrophoretic signal using a generative adversarial network
This work addresses a training-data bottleneck in forensic DNA analysis by simulating prelabelled data for ANNs that classify electrophoretic signal. The contribution is incremental, as it adapts the existing pix2pix GAN rather than introducing a new method.
The paper tackles the problem of generating realistic, prelabelled training data for an artificial neural network (ANN) that classifies DNA profile electrophoretic signal; creating such data manually is time-consuming and expensive. The authors developed a generative adversarial network (GAN), modified from the pix2pix GAN and trained on 1078 DNA profiles, to simulate DNA profile information. They then use the GAN's generator as a 'realism filter' that applies the noise and artefact elements seen in typical electrophoretic signal.
DNA profiles are made up of multiple series of electrophoretic signal, each measuring fluorescence over time. Typically, human DNA analysts 'read' DNA profiles, using their experience to distinguish instrument noise, artefactual signal, and signal corresponding to DNA fragments of interest. Recent work has developed an artificial neural network (ANN) to classify the fluorescence in DNA profile electrophoretic signal into these categories. However, creating the necessarily large amount of labelled training data for the ANN is time-consuming and expensive, and it limits the ability to train the ANN robustly. If realistic, prelabelled training data could be simulated, this would remove the barrier to training an ANN with high efficacy. Here we develop a generative adversarial network (GAN), modified from the pix2pix GAN, to achieve this task. We train the GAN on 1078 DNA profiles and achieve the ability to simulate DNA profile information, and then use the generator from the GAN as a 'realism filter' that applies the noise and artefact elements exhibited in typical electrophoretic signal.
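To illustrate what 'prelabelled' simulated data could look like before any realism filter is applied, the following is a minimal sketch, not the authors' implementation. It builds a noise-free trace from a list of peaks and records a label for every scan point. The Gaussian peak shape, the two label codes, the function name `simulate_clean_trace` and all parameter values are assumptions made for illustration; none of them is specified in the text above.

```python
import math

# Hypothetical label codes; the categories used by the ANN are not given in the source.
BASELINE, ALLELE = 0, 1

def simulate_clean_trace(peaks, n_scans=500, sigma=3.0, label_halfwidth=2.0):
    """Build a noise-free trace and per-scan labels from (centre, height) peaks."""
    signal = [0.0] * n_scans
    labels = [BASELINE] * n_scans
    for centre, height in peaks:
        for t in range(n_scans):
            # Gaussian peak shape is an illustrative assumption.
            signal[t] += height * math.exp(-0.5 * ((t - centre) / sigma) ** 2)
            # Scans within label_halfwidth * sigma of the centre are labelled allelic.
            if abs(t - centre) <= label_halfwidth * sigma:
                labels[t] = ALLELE
    return signal, labels

# Two hypothetical peaks at scan points 120 and 300.
signal, labels = simulate_clean_trace([(120, 800.0), (300, 450.0)])
```

In a pipeline of the kind described, a clean trace like `signal` would presumably be the input to the trained generator, which adds noise and artefacts, while `labels` would supply the ground truth for training the classifier.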