NE, Aug 12, 2012

An Efficient Genetic Programming System with Geometric Semantic Operators and its Application to Human Oral Bioavailability Prediction

arXiv:1208.2437v1, 6 citations
AI Analysis

This work addresses a practical bottleneck in genetic programming for researchers in computational biology and drug discovery, offering an incremental improvement with strong domain-specific impact.

The paper tackles the exponential growth limitation of geometric semantic operators in genetic programming by introducing an efficient implementation, and demonstrates its effectiveness by achieving superior performance in predicting human oral bioavailability compared to standard genetic programming and other machine learning methods.

Very recently, new genetic operators, called geometric semantic operators, have been defined for genetic programming. Contrary to standard genetic operators, which are based solely on the syntax of the individuals, these new operators are based on their semantics, meaning the set of input-output pairs on training data. Furthermore, these operators have the interesting property of inducing a unimodal fitness landscape for every problem that consists of finding a match between given input and output data (for instance, regression and classification). Nevertheless, the current definition of these operators has a serious limitation: they impose an exponential growth in the size of the individuals in the population, so their use is impossible in practice. This paper is intended to overcome this limitation, presenting a new genetic programming system that implements geometric semantic operators in an extremely efficient way. To demonstrate the power of the proposed system, we use it to solve a complex real-life application in the field of pharmacokinetics: the prediction of the human oral bioavailability of potential new drugs. Besides the excellent performance on training data, which was expected because the fitness landscape is unimodal, we also report an excellent generalization ability of the proposed system, at least for the studied application. In fact, it outperforms standard genetic programming and a wide set of other well-known machine learning methods.
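The abstract does not spell out the operators or the implementation, so the following is only an illustrative sketch under stated assumptions. It uses the commonly cited regression forms of geometric semantic crossover, T_R * A + (1 - T_R) * B with T_R taking values in [0, 1], and mutation, P + step * (T_R1 - T_R2). It applies them directly to semantics vectors (the outputs of an individual on the training cases) instead of building ever-larger expression trees. The function names, the random vectors standing in for random trees, and the elitist loop are all hypothetical choices made for illustration, not the authors' system.

```python
import random


def gs_crossover(sem_a, sem_b, sem_r):
    """Semantics of T_R * A + (1 - T_R) * B; sem_r values assumed in [0, 1]."""
    return [r * a + (1.0 - r) * b for a, b, r in zip(sem_a, sem_b, sem_r)]


def gs_mutation(sem_p, sem_r1, sem_r2, step):
    """Semantics of P + step * (T_R1 - T_R2)."""
    return [p + step * (r1 - r2) for p, r1, r2 in zip(sem_p, sem_r1, sem_r2)]


def rmse(sem, target):
    """Root mean squared error between a semantics vector and the targets."""
    return (sum((s - t) ** 2 for s, t in zip(sem, target)) / len(target)) ** 0.5


def evolve(target, pop_size=20, generations=50, step=0.1, seed=0):
    """Toy run that stores one semantics vector per individual.

    Assumption: uniformly random vectors stand in for the outputs of
    random trees. Memory per individual stays at len(target) values,
    no matter how large the implied expression would be.
    """
    rng = random.Random(seed)
    n = len(target)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    first_best = min(rmse(s, target) for s in pop)
    for _ in range(generations):
        # Keep the best individual unchanged so the best error never worsens.
        children = [min(pop, key=lambda s: rmse(s, target))]
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            child = gs_crossover(a, b, [rng.random() for _ in range(n)])
            r1 = [rng.random() for _ in range(n)]
            r2 = [rng.random() for _ in range(n)]
            children.append(gs_mutation(child, r1, r2, step))
        pop = children
    return first_best, min(rmse(s, target) for s in pop)
```

Because each offspring's semantics is a pointwise combination of its parents' semantics, one generation costs time proportional to the population size times the number of training cases. This sketch keeps only the semantics, so it cannot print the evolved expression; a full system would also need to record which parents and random trees produced each individual.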
