Powerful, transferable representations for molecules through intelligent task selection in deep multitask networks
This work addresses challenges in drug discovery and materials innovation by making molecular representations cheaper to generate and less prone to inherited bias. The contribution is incremental, since it builds on existing multi-task and transfer learning methods.
The paper tackles three limitations of deep-learning chemical representations (cost, inherited bias, and large data requirements) by combining multi-task learning with transfer learning and selecting tasks programmatically from pairwise task affinity. It demonstrates the approach in low-data settings and shows that the deep representation captures more expressive task-based information than a common cheminformatics fingerprint.
Chemical representations derived from deep learning are emerging as a powerful tool in areas such as drug discovery and materials innovation. Currently, this methodology has three major limitations: the cost of representation generation, the risk of inherited bias, and the requirement for large amounts of data. We propose the use of multi-task learning in tandem with transfer learning to address these limitations directly. To avoid introducing unknown bias into multi-task learning through the task selection itself, we calculate task similarity through pairwise task affinity and use this measure to select tasks programmatically. We test this methodology on several real-world data sets to demonstrate its potential in complex, low-data environments. Finally, we use the task similarity to further probe the expressiveness of the learned representation through a comparison to a commonly used cheminformatics fingerprint, and show that the deep representation is able to capture more expressive task-based information.
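The abstract does not spell out how pairwise task affinity is computed or how it drives task selection, so the following is only a minimal sketch under stated assumptions. It assumes a "lookahead" definition of affinity: take one gradient step on the shared parameters using task i's loss, then measure the relative drop in task j's loss, Z[i, j] = 1 - L_j(after step on i) / L_j(before). A positive value means training on i helps j. A toy linear model with shared weights stands in for the shared layers of a deep multitask network, and auxiliary tasks for a target are chosen greedily as the top-k tasks with positive affinity. All names (`task_affinity`, `select_auxiliary_tasks`, the task labels) are hypothetical, and the paper's actual affinity measure and selection rule may differ.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Dataset = Tuple[List[Vector], List[float]]  # (feature vectors, regression targets)


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def mse(w: Vector, data: Dataset) -> float:
    """Mean squared error of the shared linear model w on one task."""
    X, y = data
    return sum((dot(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def mse_grad(w: Vector, data: Dataset) -> Vector:
    """Gradient of the task's MSE with respect to the shared weights."""
    X, y = data
    n = len(y)
    g = [0.0] * len(w)
    for x, t in zip(X, y):
        r = dot(w, x) - t
        for k, xk in enumerate(x):
            g[k] += 2.0 * r * xk / n
    return g


def task_affinity(w: Vector, tasks: Dict[str, Dataset], lr: float) -> Dict[Tuple[str, str], float]:
    """Assumed lookahead affinity: Z[(i, j)] = 1 - L_j(w after a step on i) / L_j(w)."""
    Z: Dict[Tuple[str, str], float] = {}
    for i, data_i in tasks.items():
        g = mse_grad(w, data_i)
        w_step = [wk - lr * gk for wk, gk in zip(w, g)]  # one SGD step using task i only
        for j, data_j in tasks.items():
            before = mse(w, data_j)
            after = mse(w_step, data_j)
            Z[(i, j)] = 0.0 if before == 0 else 1.0 - after / before
    return Z


def select_auxiliary_tasks(Z: Dict[Tuple[str, str], float], target: str, k: int) -> List[str]:
    """Greedy selection: the k tasks whose step most reduces the target's loss (positive affinity only)."""
    candidates = [(score, i) for (i, j), score in Z.items()
                  if j == target and i != target and score > 0]
    candidates.sort(reverse=True)
    return [name for _, name in candidates[:k]]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    tasks = {
        "target": (X, [1.0, 1.0]),
        "helpful": (X, [1.0, 1.0]),   # same targets: a step on it should help
        "harmful": (X, [-1.0, -1.0]),  # opposite targets: a step on it should hurt
    }
    Z = task_affinity([0.0, 0.0], tasks, lr=0.1)
    print(select_auxiliary_tasks(Z, "target", k=2))
```

Filtering on positive affinity keeps only tasks that pull the shared parameters in a direction that also benefits the target. That makes the choice of auxiliary tasks a measured quantity rather than a manual judgement, which is the bias-avoidance motivation the abstract gives.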