13.5NEMay 20
Convergence Analysis of Evolution Strategies for Mixed-Integer OptimizationRyoki Hamano, Kento Uchida, Shinichi Shirakawa
Mixed-integer extensions of evolution strategies (ES) that discretize selected coordinates of sampled continuous vectors often impose a lower bound on the standard deviation of integer variables to prevent premature convergence. While these methods show promising empirical results, this handling can slow the convergence of continuous variables, and its impact has lacked a clear theoretical account. In this paper, we provide a convergence analysis of evolution strategies for mixed-integer optimization, inspired by the drift analysis of the (1+1)-ES in the continuous domain. Specifically, we consider two (1+1)-ES variants for mixed-integer domains: (1+1)-LB-ES, which introduces a lower bound on the standard deviation for integer variables, and (1+1)-LUB-ES, which combines both lower and upper bounds to enhance the convergence of the continuous variables. Focusing on the optimization phase after the integer variables have been optimized, we rigorously analyze their convergence behavior on a benchmark function designed for mixed-integer domains. Our results show that (1+1)-LB-ES can suffer from premature convergence when the number of integer variables is large, while (1+1)-LUB-ES achieves linear convergence under suitable parameter settings. These findings provide theoretical insights into the impact of integer handling on convergence performance and guidance for the design of mixed-integer ES.
42.1NEMay 18
Adaptive Stochastic Natural Gradient Method for Safe Optimization on Binary SpaceKento Uchida, Ryoki Hamano, Masahiro Nomura et al.
Optimization problems in real-world applications across the medical and engineering domains often involve potential risks when evaluating candidate solutions. Safe optimization aims to perform optimization while suppressing unsafe solution evaluations in such situations. For continuous search spaces, there exist safe optimization methods based on evolutionary computation. However, the algorithm development of safe optimization methods for binary search spaces has not been adequately addressed. In this study, we incorporate additional mechanisms for safe optimization into a binary optimization method, the adaptive stochastic natural gradient method (ASNG) with a family of Bernoulli distributions. For safety functions that must be kept non-negative during optimization, the proposed method, safe ASNG, estimates the Lipschitz constants with respect to the Hamming distance by constructing surrogate models of safety functions based on discrete Walsh functions. Then, safe ASNG computes a safe region that consists of safe solutions around the previously evaluated safe solutions. By projecting newly generated solutions to their nearest neighbors within the safe region, safe ASNG suppresses unsafe solution evaluations. Experimental results on benchmark problems on binary domains confirm that, while the comparative methods fail to suppress unsafe solution evaluations, safe ASNG achieves efficient optimization while effectively suppressing unsafe solution evaluations.
5.5LGApr 19
On the Generalization Bounds of Symbolic Regression with Genetic ProgrammingMasahiro Nomura, Ryoki Hamano, Isao Ono
Symbolic regression (SR) with genetic programming (GP) aims to discover interpretable mathematical expressions directly from data. Despite its strong empirical success, the theoretical understanding of why GP-based SR generalizes beyond the training data remains limited. In this work, we provide a learning-theoretic analysis of SR models represented as expression trees. We derive a generalization bound for GP-style SR under constraints on tree size, depth, and learnable constants. Our result decomposes the generalization gap into two interpretable components: a structure-selection term, reflecting the combinatorial complexity of choosing an expression-tree structure, and a constant-fitting term, capturing the complexity of optimizing numerical constants within a fixed structure. This decomposition provides a theoretical perspective on several widely used practices in GP, including parsimony pressure, depth limits, numerically stable operators, and interval arithmetic. In particular, our analysis shows how structural restrictions reduce hypothesis-class growth while stability mechanisms control the sensitivity of predictions to parameter perturbations. By linking these practical design choices to explicit complexity terms in the generalization bound, our work offers a principled explanation for commonly observed empirical behaviors in GP-based SR and contributes towards a more rigorous understanding of its generalization properties.
21.7NEMay 15
Diversified Residual Symbolic RegressionKoki Ikeda, Masahiro Nomura, Ryoki Hamano
Symbolic regression (SR) aims to discover explicit mathematical expressions that explain observed data and is widely used in domains where interpretability is essential. Because interpretability requires expressions to reflect meaningful regularities, SR is sensitive to observations that deviate from the dominant relationship. Such irregular observations, or outliers, are common in real-world data and can hinder SR from identifying underlying regularities. Robust regression mitigates this by downweighting observations with large residuals. However, deciding which observations should be treated as outliers is often ambiguous and depends on user interpretation and domain knowledge, a perspective largely overlooked in existing SR studies. This motivates approaches that present multiple candidate expressions, allowing users to examine different residual patterns and choose expressions consistent with their expertise. We propose diversified residual symbolic regression (DRSR), which achieves high predictive accuracy while promoting diversity with respect to residual patterns based on the Quality-Diversity paradigm. DRSR collects multiple expressions that fit the data well but differ in how residuals are distributed, enabling post-search selection aligned with domain knowledge. On a synthetic mixture dataset, DRSR produces more diverse expressions than conventional SR while capturing multiple underlying relationships. On a real-world astronomical dataset, DRSR discovers multiple expressions consistent with known physical relationships.