Look the Other Way: Designing 'Positive' Molecules with Negative Data via Task Arithmetic
This work addresses a data-scarcity bottleneck in de novo molecule design for drug discovery and materials science, offering a data-efficient transfer learning strategy.
The paper tackles the scarcity of desirable molecules in generative design by proposing molecular task arithmetic: a model is trained on abundant negative examples to learn property directions, then moved in the opposite directions to generate positive molecules. Across 20 zero-shot design experiments, this produced more diverse and successful designs than models trained on positive molecules.
The scarcity of molecules with desirable properties (i.e., 'positive' molecules) is an inherent bottleneck for generative molecule design. To sidestep this obstacle, we propose molecular task arithmetic: training a model on diverse and abundant negative examples to learn 'property directions', without accessing any positively labeled data, and then moving models in the opposite property directions to generate positive molecules. Across 20 zero-shot design experiments, molecular task arithmetic generated more diverse and successful designs than models trained on positive molecules. We also applied molecular task arithmetic to dual-objective and few-shot design tasks, and found that it consistently increases the diversity of designs while maintaining desirable design properties. With its simplicity, data efficiency, and performance, molecular task arithmetic has the potential to become the $\textit{de facto}$ transfer learning strategy for de novo molecule design.
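The weight-space step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the common task-arithmetic formulation in which a property direction is the difference between fine-tuned and pretrained weights. The names `task_vector`, `move_opposite`, and `scale`, and the toy weight values, are hypothetical and only serve the illustration.

```python
from typing import Dict, List

# Assumption: a model is represented as parameter name -> flattened weight values.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Property direction, assumed here to be fine-tuned minus pretrained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def move_opposite(pretrained: Weights, direction: Weights, scale: float = 1.0) -> Weights:
    """Step away from the learned direction, starting at the pretrained weights.

    `scale` is a hypothetical step-size hyperparameter, not a value from the paper.
    """
    return {
        name: [p - scale * d for p, d in zip(pretrained[name], direction[name])]
        for name in pretrained
    }


# Toy weights standing in for a pretrained generative model and a copy
# fine-tuned on negative molecules only (values are made up).
pretrained = {"layer.weight": [0.5, -1.0, 2.0]}
negative_finetuned = {"layer.weight": [1.0, -1.5, 2.0]}

direction = task_vector(pretrained, negative_finetuned)
edited = move_opposite(pretrained, direction, scale=0.5)
```

In this sketch only negative examples influence `direction`; the edited weights would then be used for sampling molecules, with `scale` chosen on a validation criterion.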