WTKO-CNN: Deep Learning Reveals Sequence Motifs Distinguishing Wild-Type and Knockout ATAC-seq Peaks
For researchers studying chromatin regulation, this work provides a method to identify functional sequence motifs differentiating wild-type and knockout conditions, though it is an incremental application of existing deep learning and interpretability techniques.
The authors developed WTKO-CNN, a convolutional neural network with attention, to classify DNA sequences as wild-type or knockout, achieving high predictive performance. Saliency maps and k-mer clustering enabled de novo discovery of motifs that distinguish the two conditions, validated against known transcription factor binding sites.
Chromatin regulators can alter transcriptional programs by modifying the accessibility of regulatory DNA elements. Understanding how regulatory sequences differ between wild-type (WT) and knockout (KO) conditions is crucial for deciphering transcriptional control. Here, we applied \textbf{WTKO-CNN}, a convolutional neural network with an attention mechanism, to classify DNA sequences as WT or KO, achieving high predictive performance. To interpret the model, we generated saliency maps to identify the nucleotide positions most influential for the classification decision. From these high-saliency regions, we extracted and clustered k-mers, enabling de novo motif discovery. Sequence logos and consensus motifs derived from the CNN filters revealed biologically meaningful patterns, which we further validated against known transcription factor binding sites using MEME, TOMTOM, and HOMER. Our analysis identified motifs associated with transcription factor families that discriminate WT from KO sequences, demonstrating that CNN-guided saliency mapping is a powerful approach for uncovering functional sequence features.
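The k-mer extraction step described above can be illustrated with a minimal sketch. It assumes a per-nucleotide saliency vector has already been computed for a sequence; the function name `top_saliency_kmers`, the window length `k`, and the non-overlap rule are hypothetical choices made for illustration and are not taken from the WTKO-CNN implementation.

```python
import numpy as np

def top_saliency_kmers(seq, saliency, k=8, top_n=3):
    """Return up to top_n non-overlapping k-mers ranked by summed |saliency|.

    Hypothetical helper: the window scoring and overlap rule are
    illustrative assumptions, not the authors' method.
    """
    scores = np.abs(np.asarray(saliency, dtype=float))
    if scores.ndim != 1 or scores.shape[0] != len(seq):
        raise ValueError("saliency must be 1-D with one value per nucleotide")
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and the sequence length")

    # window[i] is the summed absolute saliency over seq[i:i+k]
    csum = np.concatenate(([0.0], np.cumsum(scores)))
    window = csum[k:] - csum[:-k]

    # Greedily keep the highest-scoring windows that do not overlap
    chosen = []
    for i in np.argsort(-window, kind="stable"):
        if all(abs(int(i) - j) >= k for j in chosen):
            chosen.append(int(i))
        if len(chosen) == top_n:
            break
    return [(seq[i:i + k], i, float(window[i])) for i in chosen]

# Toy input: saliency is concentrated on positions 8-14 of the sequence
seq = "ACGTACGTTGACTCAACGT"
saliency = np.zeros(len(seq))
saliency[8:15] = 1.0
hits = top_saliency_kmers(seq, saliency, k=7, top_n=1)
```

In this toy input the single returned window covers the high-saliency stretch `TGACTCA`. K-mers collected this way across many sequences could then be clustered and passed to a motif comparison tool.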