Improving Generalization Ability of Genetic Programming: Comparative Study
This work addresses generalization issues in GP for empirical modeling; its contribution is incremental, since it builds on existing methods.
The paper tackles bloat and over-fitting in Genetic Programming (GP) by surveying existing techniques and testing four bloat control methods on six problems. It finds that combining the double tournament and Tarpeian methods improves results compared with using double tournament alone.
In empirical modeling using Genetic Programming (GP), it is important to evolve solutions with good generalization ability. The generalization ability of GP solutions is affected by two issues: bloat and over-fitting. Bloat, the uncontrolled growth of code without any corresponding gain in fitness, is a major problem in GP. We surveyed and classified the existing literature on the techniques the GP research community uses to deal with bloat, and we discuss classifications of bloat control approaches and measures of bloat. Next, we tested four bloat control methods on six different problems: Tarpeian, double tournament, lexicographic parsimony pressure with direct bucketing, and lexicographic parsimony pressure with ratio bucketing. For each problem, we identified where each method performs well. Based on this analysis, we combined two methods to avoid bloated solutions: double tournament (a selection method) and the Tarpeian method (which acts before evaluation). We then compared the combination with double tournament alone and found that the combination improved the results.
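The combination of the Tarpeian method and double tournament can be sketched in Python. This is a minimal sketch, not the authors' implementation. It assumes the commonly cited formulations of the two methods:

- In the Tarpeian method, an individual larger than the population's average size is given the worst possible fitness with probability p, so its evaluation is skipped.
- In double tournament, a fitness tournament is run whose entrants are the winners of two-individual size tournaments, where the smaller contestant wins with probability d/2.

The Individual class, the error-to-minimize fitness, and the names tarpeian, size_tournament, double_tournament, p, d and fitness_size are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Individual:
    # Hypothetical stand-in for a GP tree: only the genome's length (size) matters here.
    genome: list
    # Error to minimize (lower is better); None means "not evaluated yet".
    fitness: Optional[float] = None

    @property
    def size(self) -> int:
        return len(self.genome)


def tarpeian(population: List[Individual],
             evaluate: Callable[[Individual], float],
             p: float = 0.3, rng=random) -> None:
    """Tarpeian step, run in place of normal evaluation (assumed formulation):
    an individual larger than the average size is, with probability p,
    given the worst fitness and never evaluated."""
    avg_size = sum(ind.size for ind in population) / len(population)
    for ind in population:
        if ind.size > avg_size and rng.random() < p:
            ind.fitness = float("inf")  # worst possible error; evaluation skipped
        else:
            ind.fitness = evaluate(ind)


def size_tournament(population: List[Individual], d: float = 1.4,
                    rng=random) -> Individual:
    """Two random individuals compete on size; the smaller wins with
    probability d / 2, where 1 <= d <= 2 (d = 1 means no size pressure)."""
    a, b = rng.sample(population, 2)
    smaller, larger = (a, b) if a.size <= b.size else (b, a)
    return smaller if rng.random() < d / 2 else larger


def double_tournament(population: List[Individual], fitness_size: int = 7,
                      d: float = 1.4, rng=random) -> Individual:
    """Fitness tournament whose entrants are winners of size tournaments."""
    entrants = [size_tournament(population, d, rng) for _ in range(fitness_size)]
    return min(entrants, key=lambda ind: ind.fitness)


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [Individual(genome=[0] * rng.randint(1, 50)) for _ in range(100)]
    # Hypothetical error function: pretend the ideal program has size 10.
    tarpeian(pop, lambda ind: abs(ind.size - 10), p=0.3, rng=rng)
    parent = double_tournament(pop, rng=rng)
    print("selected parent size:", parent.size, "error:", parent.fitness)
```

In a generational loop, tarpeian would take the place of the ordinary evaluation step, and double_tournament would be called once per parent. Because the two methods act at different points, they play different roles. The Tarpeian step avoids spending evaluations on some oversized individuals, while double tournament biases parent selection toward smaller ones. The parameter values shown are placeholders, not the settings used in the study.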