Deep Learning in Mining Biological Data
The article addresses the challenge of recognizing patterns in large, complex biological datasets for life scientists. Its contribution is incremental, since it synthesizes existing knowledge rather than presenting new methods.
This article reviews the application of deep learning (DL) to mining biological data such as sequences, images, and signals. It surveys available tools and data sources, compares the tools, and outlines open challenges and future directions.
Recent technological advances in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Broadly categorized into three types (sequences, images, and signals), these data are vast in volume and complex in nature. Mining such an enormous amount of data for pattern recognition is a major challenge and requires sophisticated, data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and their deep architectures, known as deep learning (DL), have lately been applied successfully to many complex pattern recognition problems. Highlighting the role of DL in recognizing patterns in biological data, this article provides: applications of DL to biological sequence, image, and signal data; an overview of open access sources of these data; a description of open source DL tools applicable to these data; and a comparison of these tools from qualitative and quantitative perspectives. Finally, it outlines some open research challenges in mining biological data and puts forward a number of possible future perspectives.
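As a minimal sketch of how a deep architecture recognizes patterns in sequence data, the Python snippet below one-hot encodes a DNA string and scans it with a single convolutional filter, which is the basic operation a 1D convolutional layer performs. The filter weights are hand-set to reward the motif "TATA" purely for illustration (an assumption: in a trained network they would be learned from data), and the helper names `one_hot` and `conv1d` are hypothetical, not taken from any DL library.

```python
# Minimal sketch: one-hot encode a DNA sequence and scan it with one
# convolutional filter, the core operation of a 1D CNN layer.
# The motif "TATA", the hand-set weights, and the bias are hypothetical,
# chosen only for illustration; a real network learns its filters.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot vectors (A, C, G, T).
    Bases outside ACGT (e.g. 'N') become all-zero vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d(x, kernel, bias=0.0):
    """'Valid' 1D cross-correlation (what DL libraries call convolution) of a
    one-hot sequence x (length L x 4) with a kernel (width k x 4)."""
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = bias
        for j in range(k):
            for c in range(4):
                s += x[i + j][c] * kernel[j][c]
        scores.append(s)
    return scores

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

# Hand-set filter: +1 for the expected base at each motif position, -1 otherwise.
motif = "TATA"
kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in motif]

seq = "GCTATAAGC"
# With bias -3.0, only a window matching all 4 positions yields a positive score.
activation = relu(conv1d(one_hot(seq), kernel, bias=-3.0))
best = max(range(len(activation)), key=lambda i: activation[i])

print(activation)  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print("strongest match at position", best, seq[best:best + len(motif)])
```

DL frameworks stack many such filters, add nonlinearities, and learn the weights by gradient descent; this pure-Python loop only makes the underlying arithmetic visible.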