LGCHEM-PHMLMar 7, 2019

Transfer Learning Using Ensemble Neural Networks for Organic Solar Cell Screening

arXiv:1903.03178v422 citations
Originality Incremental advance
AI Analysis

This work addresses the time-consuming screening of candidate compounds for organic solar cells, offering a more efficient method for researchers in materials science, though it is incremental as it builds on existing machine learning approaches.

The authors tackled the problem of predicting HOMO values for organic solar cell donor molecules by developing an ensemble neural network (SINet) that uses both SMILES and InChI representations with transfer learning from a large DFT dataset, achieving significant performance improvements on smaller datasets.

Organic Solar Cells are a promising technology for solving the clean energy crisis in the world. However, generating candidate chemical compounds for solar cells is a time-consuming process requiring thousands of hours of laboratory analysis. For a solar cell, the most important property is the power conversion efficiency which is dependent on the highest occupied molecular orbitals (HOMO) values of the donor molecules. Recently, machine learning techniques have proved to be very useful in building predictive models for HOMO values of donor structures of Organic Photovoltaic Cells (OPVs). Since experimental datasets are limited in size, current machine learning models are trained on data derived from calculations based on density functional theory (DFT). Molecular line notations such as SMILES or InChI are popular input representations for describing the molecular structure of donor molecules. The two types of line representations encode different information, such as SMILES defines the bond types while InChi defines protonation. In this work, we present an ensemble deep neural network architecture, called SINet, which harnesses both the SMILES and InChI molecular representations to predict HOMO values and leverage the potential of transfer learning from a sizeable DFT-computed dataset- Harvard CEP to build more robust predictive models for relatively smaller HOPV datasets. Harvard CEP dataset contains molecular structures and properties for 2.3 million candidate donor structures for OPV while HOPV contains DFT-computed and experimental values of 350 and 243 molecules respectively. Our results demonstrate significant performance improvement from the use of transfer learning and leveraging both molecular representations.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes