Signed iterative random forests to identify enhancer-associated transcription factor binding
The work addresses the need for interpretable machine learning methods that predict functional genomic regulation, although it appears incremental because it builds on existing random forest approaches.
The authors tackled the problem of identifying transcription factor binding at enhancers that is functional rather than merely biochemically reproducible, and developed signed iterative random forests (siRF) to infer regulatory interactions and binding signatures in Drosophila melanogaster.
Standard ChIP-seq peak calling pipelines seek to differentiate biochemically reproducible signals of individual genomic elements from background noise. However, reproducibility alone does not imply functional regulation (e.g., enhancer activation, alternative splicing). Here we present a general-purpose, interpretable machine learning method: signed iterative random forests (siRF), which we use to infer regulatory interactions among transcription factors and functional binding signatures surrounding enhancer elements in Drosophila melanogaster.
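The idea of a *signed* interaction among transcription factors can be illustrated with a small toy sketch. It assumes, as a simplification, that each split in a tree contributes its feature with sign `+` when a sample goes to the high side (`x > threshold`) and `-` otherwise. Each root-to-leaf path then yields a set of signed features, and a candidate interaction can be scored by how often it appears on paths to leaves predicting class 1 (e.g., an active enhancer). This is plain Python and not the authors' siRF implementation: the `Node` type, the unweighted `prevalence` score, and the features `TF_A`/`TF_B` are hypothetical. The published method also includes iterative fitting and stability analysis, which are omitted here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, List, Optional, Tuple

# Hypothetical representation: a signed feature is (feature name, "+" or "-").
SignedFeature = Tuple[str, str]


@dataclass
class Node:
    feature: Optional[str] = None   # split feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # samples with x[feature] <= threshold -> sign "-"
    right: Optional["Node"] = None  # samples with x[feature] > threshold -> sign "+"
    label: Optional[int] = None     # predicted class at a leaf


def signed_leaf_paths(
    node: Node, path: FrozenSet[SignedFeature] = frozenset()
) -> Iterator[Tuple[FrozenSet[SignedFeature], Optional[int]]]:
    """Yield (signed feature set, leaf label) for every root-to-leaf path."""
    if node.feature is None:
        yield path, node.label
        return
    yield from signed_leaf_paths(node.left, path | {(node.feature, "-")})
    yield from signed_leaf_paths(node.right, path | {(node.feature, "+")})


def prevalence(
    trees: List[Node], interaction: FrozenSet[SignedFeature], target: int = 1
) -> float:
    """Unweighted fraction of `target` leaves whose path contains `interaction`.

    Simplifying assumption: every leaf counts equally (no sample weighting).
    """
    hits = total = 0
    for tree in trees:
        for path, label in signed_leaf_paths(tree):
            if label == target:
                total += 1
                if interaction <= path:  # subset test on signed features
                    hits += 1
    return hits / total if total else 0.0


# Toy forest with two hand-built trees over hypothetical features TF_A, TF_B.
tree1 = Node("TF_A", 0.5,
             left=Node(label=0),
             right=Node("TF_B", 0.5, left=Node(label=0), right=Node(label=1)))
tree2 = Node("TF_B", 0.5, left=Node(label=0), right=Node(label=1))
forest = [tree1, tree2]

both_high = frozenset({("TF_A", "+"), ("TF_B", "+")})
b_high = frozenset({("TF_B", "+")})
print(prevalence(forest, both_high))  # 0.5
print(prevalence(forest, b_high))     # 1.0
```

Because `+` and `-` are kept apart, "TF_B high" and "TF_B low" count as different features. An unsigned feature set would merge them.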