CE AI LG
Apr 27, 2023

Molecular Design Based on Integer Programming and Splitting Data Sets by Hyperplanes

Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Hiroshi Nagamochi, Tatsuya Akutsu

arXiv:2305.00801v12.31 citations
h-index: 54

Originality: Incremental advance

AI Analysis

This work addresses the challenge of constructing accurate prediction functions for molecular design. It is an incremental advance: it builds on an existing framework by introducing a data-splitting technique.

The authors tackled the problem of designing molecular structures with desired chemical properties by proposing a method that splits a data set with a hyperplane to improve prediction functions. Their computational experiments suggest improved learning performance for several chemical properties for which good prediction functions had been difficult to construct.

A novel framework for designing the molecular structure of chemical compounds with a desired chemical property has recently been proposed. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of a feature function defined by a two-layered model on chemical graphs and a prediction function constructed by a machine learning method. To improve the learning performance of prediction functions in the framework, we design a method that splits a given data set $\mathcal{C}$ into two subsets $\mathcal{C}^{(i)}, i=1,2$ by a hyperplane in a chemical space so that most compounds in the first (resp., second) subset have observed values lower (resp., higher) than a threshold $\theta$. We construct a prediction function $\psi$ for the data set $\mathcal{C}$ by combining prediction functions $\psi_i, i=1,2$, each of which is constructed on $\mathcal{C}^{(i)}$ independently. The results of our computational experiments suggest that the proposed method improved the learning performance for several chemical properties for which a good prediction function has been difficult to construct.
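The abstract does not say how the hyperplane is found or which learners produce $\psi_1$ and $\psi_2$, so the Python sketch below only illustrates the split-then-combine idea under assumptions of our own. A perceptron stands in for the hyperplane search, ridge-regularized linear least squares stands in for each $\psi_i$, and the combined $\psi$ routes a feature vector to the model on its side of the hyperplane. All names (`fit_hyperplane`, `fit_ridge`, `fit_split_predictor`) are hypothetical, and feature vectors are plain lists of floats rather than the paper's two-layered-model descriptors.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Predictor = Callable[[Sequence[float]], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def fit_hyperplane(X: List[Vector], y: List[float], theta: float,
                   epochs: int = 1000) -> Tuple[Vector, float]:
    """Perceptron stand-in (an assumption): seek (w, b) with w.x + b > 0 iff y >= theta."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            label = 1.0 if t >= theta else -1.0
            if label * (dot(w, x) + b) <= 0:  # wrong side or on the plane
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:  # every compound lies on its threshold-consistent side
            break
    return w, b


def solve(M: List[Vector], v: Vector) -> Vector:
    """Gauss-Jordan elimination with partial pivoting for M z = v."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def fit_ridge(X: List[Vector], y: List[float], lam: float = 1e-6) -> Predictor:
    """Linear least squares with a small ridge penalty, solved via normal equations."""
    rows = [list(x) + [1.0] for x in X]  # append a constant bias feature
    d = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    v = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    beta = solve(M, v)
    return lambda x: dot(beta, list(x) + [1.0])


def fit_split_predictor(X: List[Vector], y: List[float], theta: float) -> Predictor:
    """Split by one hyperplane, fit psi_1 and psi_2 independently, route by side."""
    w, b = fit_hyperplane(X, y, theta)

    def side(x: Sequence[float]) -> int:
        return 2 if dot(w, x) + b > 0 else 1

    parts: Dict[int, Tuple[List[Vector], List[float]]] = {1: ([], []), 2: ([], [])}
    for x, t in zip(X, y):
        parts[side(x)][0].append(x)
        parts[side(x)][1].append(t)
    if not parts[1][0] or not parts[2][0]:
        return fit_ridge(X, y)  # degenerate split: fall back to a single model
    psi = {i: fit_ridge(*parts[i]) for i in (1, 2)}
    return lambda x: psi[side(x)](x)


if __name__ == "__main__":
    # Toy data: two different linear laws on either side of x0 = 0.
    X = [[float(a), float(c)] for a in (-3, -2, -1, 1, 2, 3) for c in (0, 1, 2)]
    y = [2 * a + c if a < 0 else 10 + 3 * a - c for a, c in X]
    psi = fit_split_predictor(X, y, theta=5.0)
    print(round(psi([-2.0, 1.0]), 3), round(psi([2.0, 1.0]), 3))
```

On the toy data the target follows one linear law for $x_0 < 0$ and another for $x_0 > 0$, so no single linear model fits both pieces, while the split predictor recovers each one. If the two classes are not linearly separable, this perceptron simply stops after `epochs` passes, so the resulting split may leave more compounds on the wrong side than a dedicated optimization would.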
