Identification of repeats in DNA sequences using nucleotide distribution uniformity
This work addresses the need for precise identification of repetitive elements in genomes, which is important for understanding genomic structure and function. It appears incremental, however, because it builds on existing methods for detecting repeats.
The authors tackle the problem of identifying repetitive elements in DNA sequences by developing an ab initio method based on nucleotide distribution uniformity. The method detects periodicities, consensus repeat patterns, copy numbers, and perfect levels, and its running time is linear in the length of the analyzed sequence.
Repetitive elements are important in genomic structure, function, and regulation, yet effective methods for precisely identifying repetitive elements in DNA sequences are not fully accessible, and the relationship between repetitive elements and the periodicities of genomes is not clearly understood. We present an $\textit{ab initio}$ method to quantitatively detect repetitive elements and infer their consensus repeat patterns. The method measures the distribution uniformity of nucleotides at periodic positions in DNA sequences or genomes. It can identify the periodicities, consensus repeat patterns, copy numbers, and perfect levels of repetitive elements. Results from applying the method to different DNA sequences and genomes demonstrate its efficacy and accuracy in identifying repeat patterns and periodicities. The complexity of the method is linear with respect to the lengths of the analyzed sequences.
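
To make the idea of measuring nucleotide distribution at periodic positions concrete, the following is a minimal sketch, not the authors' published measure. It assumes a hypothetical score, `nonuniformity`, defined as the mean squared deviation of base frequencies from a uniform 1/4 at each position modulo a candidate period. The function names, the scoring formula, and the majority-vote consensus are all illustrative assumptions. Each call makes a single pass over the sequence, so the cost per candidate period is linear in the sequence length.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def position_counts(seq, period):
    """Count bases at each position modulo `period` in one pass over `seq`."""
    counts = [Counter() for _ in range(period)]
    for i, base in enumerate(seq):
        counts[i % period][base] += 1
    return counts


def nonuniformity(seq, period):
    """Hypothetical score (not the paper's formula).

    For each position modulo `period`, sum the squared deviation of the
    base frequencies from 1/4, then average over positions. The score is
    0.0 when every position holds all four bases equally often and 0.75
    when every position holds a single base (a perfect repeat).
    """
    total = 0.0
    for c in position_counts(seq, period):
        n = sum(c[b] for b in ALPHABET)
        if n == 0:
            continue  # position never observed (period longer than seq)
        total += sum((c[b] / n - 1 / len(ALPHABET)) ** 2 for b in ALPHABET)
    return total / period


def consensus(seq, period):
    """Most frequent base at each position modulo `period`.

    Assumes period <= len(seq), so every position has at least one base.
    """
    return "".join(c.most_common(1)[0][0] for c in position_counts(seq, period))


if __name__ == "__main__":
    seq = "ACG" * 4
    for p in (2, 3):
        print(p, round(nonuniformity(seq, p), 4), consensus(seq, p))
```

In this sketch, multiples of the true period (and any period equal to the sequence length) also reach the maximum score, so a practical detector would need an additional rule, such as preferring the smallest high-scoring period. How the paper resolves this, and how it defines copy number and perfect level, is not specified in the abstract.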