A survey on Big Data and Machine Learning for Chemistry
This is an incremental survey paper that synthesizes existing research for chemists and data scientists interested in applying ML to chemical problems.
This survey reviews how big data and machine learning (ML) are applied in chemistry to accelerate the solution of intricate problems and to solve problems that were previously intractable, with a focus on materials discovery and chemical sensing within the Internet of Things (IoT). It outlines a roadmap for future developments and discusses conceptual and practical limitations, including common pitfalls and cases of success and failure.
Herein we review aspects of leading-edge research and innovation in chemistry that exploit big data and machine learning (ML), two computer science fields that combine to yield machine intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that would otherwise be intractable. But the potential benefits of ML come at the cost of big data production: in order to learn, the algorithms demand large volumes of data of various natures and from different sources, from materials properties to sensor data. In this survey, we propose a roadmap for future developments, with emphasis on materials discovery and chemical sensing within the context of the Internet of Things (IoT), both prominent research fields for ML in the context of big data. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to chemistry, outlining processes, discussing pitfalls, and reviewing cases of success and failure.