NEApr 11, 2022
Effective Mutation Rate Adaptation through Group Elite SelectionAkarsh Kumar, Bo Liu, Risto Miikkulainen et al.
Evolutionary algorithms are sensitive to the mutation rate (MR); no single value of this parameter works well across domains. Self-adaptive MR approaches have been proposed but they tend to be brittle: Sometimes they decay the MR to zero, thus halting evolution. To make self-adaptive MR robust, this paper introduces the Group Elite Selection of Mutation Rates (GESMR) algorithm. GESMR co-evolves a population of solutions and a population of MRs, such that each MR is assigned to a group of solutions. The resulting best mutational change in the group, instead of average mutational change, is used for MR selection during evolution, thus avoiding the vanishing MR problem. With the same number of function evaluations and with almost no overhead, GESMR converges faster and to better solutions than previous approaches on a wide range of continuous test optimization problems. GESMR also scales well to high-dimensional neuroevolution for supervised image-classification tasks and for reinforcement learning control tasks. Remarkably, GESMR produces MRs that are optimal in the long-term, as demonstrated through a comprehensive look-ahead grid search. Thus, GESMR and its theoretical and empirical analysis demonstrate how self-adaptation can be harnessed to improve performance in several applications of evolutionary computation.
LGJan 13, 2023
Efficient Activation Function Optimization through Surrogate ModelingGarrett Bingham, Risto Miikkulainen
Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in several real-world tasks, with a surprising finding: a sigmoidal design that outperformed all other activation functions was discovered, challenging the status quo of always using rectifier nonlinearities in deep learning. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization.
AIApr 21, 2022
EVOTER: Evolution of Transparent Explainable Rule-setsHormoz Shahrzad, Babak Hodjat, Risto Miikkulainen
Most AI systems are black boxes generating reasonable outputs for given inputs. Some domains, however, have explainability and trustworthiness requirements that cannot be directly met by these approaches. Various methods have therefore been developed to interpret black-box models after training. This paper advocates an alternative approach where the models are transparent and explainable to begin with. This approach, EVOTER, evolves rule-sets based on simple logical expressions. The approach is evaluated in several prediction/classification and prescription/policy search domains with and without a surrogate. It is shown to discover meaningful rule sets that perform similarly to black-box models. The rules can provide insight into the domain, and make biases hidden in the data explicit. It may also be possible to edit them directly to remove biases and add constraints. EVOTER thus forms a promising foundation for building trustworthy AI systems for real-world applications in the future.
NEOct 25, 2022
Shortest Edit Path Crossover: A Theory-driven Solution to the Permutation Problem in Evolutionary Neural Architecture SearchXin Qiu, Risto Miikkulainen
Population-based search has recently emerged as a possible alternative to Reinforcement Learning (RL) for black-box neural architecture search (NAS). It performs well in practice even though it is not theoretically well understood. In particular, whereas traditional population-based search methods such as evolutionary algorithms (EAs) draw much power from crossover operations, it is difficult to take advantage of them in NAS. The main obstacle is believed to be the permutation problem: The mapping between genotype and phenotype in traditional graph representations is many-to-one, leading to a disruptive effect of standard crossover. This paper presents the first theoretical analysis of the behaviors of mutation, crossover and RL in black-box NAS, and proposes a new crossover operator based on the shortest edit path (SEP) in graph space. The SEP crossover is shown theoretically to overcome the permutation problem, and as a result, have a better expected improvement compared to mutation, standard crossover and RL. Further, it empirically outperform these other methods on state-of-the-art NAS benchmarks. The SEP crossover therefore allows taking full advantage of population-based search in NAS, and the underlying theory can serve as a foundation for deeper understanding of black-box NAS methods in general.
LGMay 28
Overcoming Forgetting in LLM Fine-Tuning with Evolution StrategiesKajetan Schweighofer, Conor F. Hayes, Roberto Dailey et al.
Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only training. However, recent work suggests that ES fine-tuning on new tasks may induce forgetting of prior tasks. First, this paper shows that prior task forgetting (1) is better characterized as performance drift rather than irreversible forgetting, with prior-task performance often recovering during ES training; and (2) is not a specific failure mode of ES, but can also arise for fine-tuning with RL methods. Second, it analyzes when and why such drift arises, highlighting its dependence on ES training dynamics, particularly random walk behavior in weakly constrained directions of the weight space. Third, based on these insights, it introduces Anchored Weight Decay (AWD) as a parameter-space regularization technique that constrains optimization toward the initial model parameters. AWD effectively stabilizes prior-task performance while preserving target-task performance, achieving benefits comparable to large ES population sizes at much lower computational cost. Thus, contrary to previous beliefs, the paper shows that prior-task forgetting under ES is largely avoidable, positioning ES as a promising approach for continual learning in LLMs.
NEAug 8, 2023
Asynchronous Evolution of Deep Neural Network ArchitecturesJason Liang, Hormoz Shahrzad, Risto Miikkulainen
Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e.,\ compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M<<K$ individuals have been evaluated. A suitable value for $M$ is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated in eight-line sorting network design (a single-population optimization task with limited evaluation-time variability), achieving an over two-fold speedup. Next, it was evaluated in 11-bit multiplexer design (a single-population discovery task with extended variability), where a 14-fold speedup was observed. It was then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task), resulting in an over two-fold speedup. In all problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.
NEJun 18, 2023
Evolving Strategies for Competitive Multi-Agent SearchErkin Bahceci, Riitta Katila, Risto Miikkulainen
While evolutionary computation is well suited for automatic discovery in engineering, it can also be used to gain insight into how humans and organizations could perform more effectively. Using a real-world problem of innovation search in organizations as the motivating example, this article first formalizes human creative problem solving as competitive multi-agent search (CMAS). CMAS is different from existing single-agent and team search problems in that the agents interact through knowledge of other agents' searches and through the dynamic changes in the search landscape that result from these searches. The main hypothesis is that evolutionary computation can be used to discover effective strategies for CMAS; this hypothesis is verified in a series of experiments on the NK model, i.e.\ partially correlated and tunably rugged fitness landscapes. Different specialized strategies are evolved for each different competitive environment, and also general strategies that perform well across environments. These strategies are more effective and more complex than hand-designed strategies and a strategy based on traditional tree search. Using a novel spherical visualization of such landscapes, insight is gained about how successful strategies work, e.g.\ by tracking positive changes in the landscape. The article thus provides a possible framework for studying various human creative activities as competitive multi-agent search in the future.
LGMay 27
Efficient Pre-Training of LLMs through Truncated SVD LayersKaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad et al.
The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead, most existing methods rely on static rank selection and do not enforce weight orthonormality due to high computational cost. This paper introduces TSVD, a framework that maintains low rank and strict orthonormality throughout the training process. It utilizes a spectral energy-based heuristic for adaptive rank selection, and a caching mechanisms to maintain orthonormality. Theoretical analysis justifies the advantage of the approach in pretraining dynamics and experiments across various model scales demonstrate that it is effective empirically. TSVD matches or exceeds the performance of full-parameter baselines while significantly reducing compute requirements. The approach thus offers a well-founded, practical, and scalable path toward efficient high-performance LLM pretraining.
NEMar 14, 2022
DIAS: A Domain-Independent Alife-Based Problem-Solving SystemBabak Hodjat, Hormoz Shahrzad, Risto Miikkulainen
A domain-independent problem-solving system based on principles of Artificial Life is introduced. In this system, DIAS, the input and output dimensions of the domain are laid out in a spatial medium. A population of actors, each seeing only part of this medium, solves problems collectively in it. The process is independent of the domain and can be implemented through different kinds of actors. Through a set of experiments on various problem domains, DIAS is shown able to solve problems with different dimensionality and complexity, to require no hyperparameter tuning for new problems, and to exhibit lifelong learning, i.e. adapt rapidly to run-time changes in the problem domain, and do it better than a standard non-collective approach. DIAS therefore demonstrates a role for Alife in building scalable, general, and adaptive problem-solving systems.
LGJan 30
The Blessing of Dimensionality in LLM Fine-tuning: A Variance-Curvature PerspectiveQiyao Liang, Jinyeop Song, Yizhou Liu et al.
Weight-perturbation evolution strategies (ES) can fine-tune billion-parameter language models with surprisingly small populations (e.g., $N\!\approx\!30$), contradicting classical zeroth-order curse-of-dimensionality intuition. We also observe a second seemingly separate phenomenon: under fixed hyperparameters, the stochastic fine-tuning reward often rises, peaks, and then degrades in both ES and GRPO. We argue that both effects reflect a shared geometric property of fine-tuning landscapes: they are low-dimensional in curvature. A small set of high-curvature dimensions dominates improvement, producing (i) heterogeneous time scales that yield rise-then-decay under fixed stochasticity, as captured by a minimal quadratic stochastic-ascent model, and (ii) degenerate improving updates, where many random perturbations share similar components along these directions. Using ES as a geometric probe on fine-tuning reward landscapes of GSM8K, ARC-C, and WinoGrande across Qwen2.5-Instruct models (0.5B--7B), we show that reward-improving perturbations remain empirically accessible with small populations across scales. Together, these results reconcile ES scalability with non-monotonic training dynamics and suggest that high-dimensional fine-tuning may admit a broader class of viable optimization methods than worst-case theory implies.
LGFeb 3Code
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision CostYinggan Xu, Risto Miikkulainen, Xin Qiu
Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients. Thus they cannot be used on quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it utilizes a stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning for quantized models possible. It therefore opens up the possibility for scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .
AIMay 7
Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident HallucinationQiyao Liang, Risto Miikkulainen, Ila Fiete
Language models draw on two knowledge sources: facts baked into weights (parametric memory, PM) and information in context (working memory, WM). We study two mechanistically distinct failure modes--conflict, when PM and WM disagree and interfere; and hallucination, when the queried fact was never learned. Both produce confident output regardless, making output-based monitoring blind by design. We show both failures share a unified geometric account. In the hidden-state space of autoregressive generation, learned facts form attractor basins. Conflict is basin competition: WM disrupts convergence to the correct basin without raising output entropy. Hallucination is basin absence: the hidden state drifts freely when no memorized basin exists. The frozen LM head, designed for next-token prediction, cannot distinguish these cases and fires confidently either way. We verify this account in a controlled synthetic task--entity identifiers mapped to unique codes with PM installed via LoRA adapters--where ground truth is exact and component roles can be causally isolated through targeted adapter placement. Geometric margin--the hidden state's distance to the nearest memorized basin--reads this geometry directly and separates correct recall from hallucination far more cleanly than output entropy, with zero false refusals where entropy-based detection cannot avoid rejecting the vast majority of correct outputs. The separation holds on natural-language factual queries from the pretrained model with no adaptation, confirming attractor geometry is structural rather than a fine-tuning artifact. The fraction of confident hallucinations follows a scaling law $C = \exp(-c/\barΔ)$, growing with scale even as overall error rates fall. Hidden states reliably encode epistemic state; the frozen output head systematically erases it--and this erasure worsens with scale.
NENov 21, 2023
Discovering Effective Policies for Land-Use Planning with NeuroevolutionDaniel Young, Olivier Francon, Elliot Meyerson et al.
How areas of land are allocated for different uses, such as forests, urban areas, and agriculture, has a large effect on the terrestrial carbon balance, and therefore climate change. Based on available historical data on land-use changes and a simulation of the associated carbon emissions and removals, a surrogate model can be learned that makes it possible to evaluate the different options available to decision-makers efficiently. An evolutionary search process can then be used to discover effective land-use policies for specific locations. Such a system was built on the Project Resilience platform and evaluated with the Land-Use Harmonization dataset LUH2 and the bookkeeping model BLUE. It generates Pareto fronts that trade off carbon impact and amount of land-use change customized to different locations, thus providing a proof-of-concept tool that is potentially useful for land-use planning.
IRFeb 24
Caesar: Deep Agentic Web Exploration for Creative Answer SynthesisJason Liang, Elliot Meyerson, Risto Miikkulainen
To advance from passive retrieval to creative discovery of new ideas, autonomous agents must be capable of deep, associative synthesis. However, current agentic frameworks prioritize convergent search, often resulting in derivative summaries that lack creativity. Caesar is an agentic LLM architecture designed to bridge the gap between information gathering and synthesis of new insights. Unlike existing agents that treat the web as a flat sequence of disconnected documents, Caesar leverages an extensive knowledge graph to foster associative reasoning, thus enabling the discovery of non-obvious connections between disparate concepts. It consists of two components: (1) exploration driven by a dynamic context-aware policy, and (2) synthesis controlled by an adversarial draft refinement loop that actively seeks novel perspectives rather than confirming established priors. Caesar demonstrates the ability to generate artifacts and answers characterized by high novelty and structural coherence, significantly outperforming state-of-the-art LLM research agents in tasks requiring creativity.
AINov 12, 2025
Solving a Million-Step LLM Task with Zero ErrorsElliot Meyerson, Giuseppe Paolo, Roberto Dailey et al.
LLMs have achieved remarkable breakthroughs in reasoning, insights, and tool use, but chaining these abilities into extended processes at the scale of those routinely executed by humans, organizations, and societies has remained out of reach. The models have a persistent error rate that prevents scale-up: for instance, recent experiments in the Towers of Hanoi benchmark domain showed that the process inevitably becomes derailed after at most a few hundred steps. Thus, although LLM research is often still benchmarked on tasks with relatively few dependent logical steps, there is increasing attention on the ability (or inability) of LLMs to perform long range tasks. This paper describes MAKER, the first system that successfully solves a task with over one million LLM steps with zero errors, and, in principle, scales far beyond this level. The approach relies on an extreme decomposition of a task into subtasks, each of which can be tackled by focused microagents. The high level of modularity resulting from the decomposition allows error correction to be applied at each step through an efficient multi-agent voting scheme. This combination of extreme decomposition and error correction makes scaling possible. Thus, the results suggest that instead of relying on continual improvement of current LLMs, massively decomposed agentic processes (MDAPs) may provide a way to efficiently solve problems at the level of organizations and societies.
MAMar 6
TerraLingua: Emergence and Analysis of Open-endedness in LLM EcologiesGiuseppe Paolo, Jamieson Warner, Hormoz Shahrzad et al.
As autonomous agents increasingly operate in real-world digital ecosystems, understanding how they coordinate, form institutions, and accumulate shared culture becomes both a scientific and practical priority. This paper introduces TerraLingua, a persistent multi-agent ecology designed to study open-ended dynamics in such systems. Unlike prior large language model simulations with static or consequence-free environments, TerraLingua imposes resource constraints and limited lifespans for the agents. As a result, agents create artifacts that persist beyond individuals, shaping future interactions and selection pressures. To characterize the dynamics, an AI Anthropologist systematically analyzes agent behavior, group structure, and artifact evolution. Across experimental conditions, the results reveal the emergence of cooperative norms, division of labor, governance attempts, and branching artifact lineages consistent with cumulative cultural processes. Divergent outcomes across experimental runs can be traced back to specific innovations and organizational structures. TerraLingua thus provides a platform for characterizing the mechanisms of cumulative culture and social organization in artificial populations, and can serve as a foundation for guiding real-world agentic populations to socially beneficial outcomes.
NEMar 30
Evolution With Purpose: Hierarchy-Informed Optimization of Whole-Brain ModelsHormoz Shahrzad, Niharika Gajawelli, Kaitlin Maile et al.
Evolutionary search is well suited for large-scale biophysical brain modeling, where many parameters with nonlinear interactions and no tractable gradients need to be optimized. Standard evolutionary approaches achieve an excellent fit to MRI data; however, among many possible such solutions, it finds ones that overfit to individual subjects and provide limited predictive power. This paper investigates whether guiding evolution with biological knowledge can help. Focusing on whole-brain Dynamic Mean Field (DMF) models, a baseline where 20 parameters were shared across the brain was compared against a heterogeneous formulation where different sets of 20 parameters were used for the seven canonical brain regions. The heterogeneous model was optimized using four strategies: optimizing all parameters at once, a curricular approach following the hierarchy of brain networks (HICO), a reversed curricular approach, and a randomly shuffled curricular approach. While all heterogeneous strategies fit the data well, only curricular approaches generalized to new subjects. Most importantly, only HICO made it possible to use the parameter sets to predict the subjects' behavioral abilities as well. Thus, by guiding evolution with biological knowledge about the hierarchy of brain regions, HICO demonstrated how domain knowledge can be harnessed to serve the purpose of optimization in real-world domains.
LGSep 29, 2025Code
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement LearningXin Qiu, Yulu Gan, Conor F. Hayes et al. · pku
Fine-tuning pre-trained large language models (LLMs) for down-stream tasks is a critical step in the AI deployment pipeline. Reinforcement learning (RL) is arguably the most prominent fine-tuning method, contributing to the birth of many state-of-the-art LLMs. In contrast, evolution strategies (ES), which once showed comparable performance to RL on models with a few million parameters, was neglected due to the pessimistic perception of its scalability to larger models. In this work, we report the first successful attempt to scale up ES for fine-tuning the full parameters of LLMs, showing the surprising fact that ES can search efficiently over billions of parameters and outperform existing RL fine-tuning methods in multiple respects, including sample efficiency, tolerance to long-horizon rewards, robustness to different base LLMs, less tendency to reward hacking, and more stable performance across runs. It therefore serves as a basis to unlock a new direction in LLM fine-tuning beyond what current RL techniques provide. The source codes are provided at: https://github.com/VsonicV/es-fine-tuning-paper.
NEApr 11
Optimizing Chlorination in Water Distribution Systems via Surrogate-assisted NeuroevolutionRivaaj Monsia, Daniel Young, Olivier Francon et al.
Ensuring the microbiological safety of large, heterogeneous water distribution systems (WDS) typically requires managing appropriate levels of disinfectant residuals including chlorine. WDS include complex fluid interactions that are nonlinear and noisy, making such maintenance a challenging problem for traditional control algorithms. This paper proposes an evolutionary framework to this problem based on neuroevolution, multi-objective optimization, and surrogate modeling. Neural networks were evolved with NEAT to inject chlorine at strategic locations in the distribution network at select times. NSGA-II was employed to optimize four objectives: minimizing the total amount of chlorine injected, keeping chlorine concentrations homogeneous across the network, ensuring that maximum concentrations did not exceed safe bounds, and distributing the injections regularly over time. Each network was evaluated against a surrogate model, i.e.\ a neural network trained to emulate EPANET, an industry-level hydraulic WDS simulator that is accurate but infeasible in terms of computational cost to support machine learning. The evolved controllers produced a diverse range of Pareto-optimal policies that could be implemented in practice, outperforming PPO, a standard reinforcement learning method. The results thus suggest a pathway toward improving urban water systems, and highlight the potential of using evolution with surrogate modeling to optimize complex real-world systems.
LGSep 18, 2021Code
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural NetworksGarrett Bingham, Risto Miikkulainen
Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. It is difficult to derive such weight initialization strategies, and modern architectures therefore often use these same initialization schemes even though their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm that automatically adapts to different neural network architectures. By analytically tracking the mean and variance of signals as they propagate through the network, AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals. Experiments demonstrate that AutoInit improves performance of convolutional, residual, and transformer networks across a range of activation function, dropout, weight decay, learning rate, and normalizer settings, and does so more reliably than data-dependent initialization methods. This flexibility allows AutoInit to initialize models for everything from small tabular tasks to large datasets such as ImageNet. Such generality turns out particularly useful in neural architecture search and in activation function discovery. In these settings, AutoInit initializes each candidate appropriately, making performance evaluations more accurate. AutoInit thus serves as an automatic configuration tool that makes design of new neural network architectures more robust. The AutoInit package provides a wrapper around TensorFlow models and is available at https://github.com/cognizant-ai-labs/autoinit.
CLMay 22, 2024
Semantic Density: Uncertainty Quantification for Large Language Models through Confidence Measurement in Semantic SpaceXin Qiu, Risto Miikkulainen
With the widespread application of Large Language Models (LLMs) to various domains, concerns regarding the trustworthiness of LLMs in safety-critical scenarios have been raised, due to their unpredictable tendency to hallucinate and generate misinformation. Existing LLMs do not have an inherent functionality to provide the users with an uncertainty/confidence metric for each response it generates, making it difficult to evaluate trustworthiness. Although several studies aim to develop uncertainty quantification methods for LLMs, they have fundamental limitations, such as being restricted to classification tasks, requiring additional training and data, considering only lexical instead of semantic information, and being prompt-wise but not response-wise. A new framework is proposed in this paper to address these issues. Semantic density extracts uncertainty/confidence information for each response from a probability distribution perspective in semantic space. It has no restriction on task types and is "off-the-shelf" for new models and tasks. Experiments on seven state-of-the-art LLMs, including the latest Llama 3 and Mixtral-8x22B models, on four free-form question-answering benchmarks demonstrate the superior performance and robustness of semantic density compared to prior approaches.
NEJun 18, 2025
Neural Cellular Automata for ARC-AGIKevin Xu, Risto Miikkulainen
Cellular automata and their differentiable counterparts, Neural Cellular Automata (NCA), are highly expressive and capable of surprisingly complex behaviors. This paper explores how NCAs perform when applied to tasks requiring precise transformations and few-shot generalization, using the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) as a domain that challenges their capabilities in ways not previously explored. Specifically, this paper uses gradient-based training to learn iterative update rules that transform input grids into their outputs from the training examples and apply them to the test inputs. Results suggest that gradient-trained NCA models are a promising and efficient approach to a range of abstract grid-based tasks from ARC. Along with discussing the impacts of various design modifications and training constraints, this work examines the behavior and properties of NCAs applied to ARC to give insights for broader applications of self-organizing systems.
AIFeb 8, 2025
The Odyssey of the Fittest: Can Agents Survive and Still Be Good?Dylan Waldner, Risto Miikkulainen
As AI models grow in power and generality, understanding how agents learn and make decisions in complex environments is critical to promoting ethical behavior. This study introduces the Odyssey, a lightweight, adaptive text based adventure game, providing a scalable framework for exploring AI ethics and safety. The Odyssey examines the ethical implications of implementing biological drives, specifically, self preservation, into three different agents. A Bayesian agent optimized with NEAT, a Bayesian agent optimized with stochastic variational inference, and a GPT 4o agent. The agents select actions at each scenario to survive, adapting to increasingly challenging scenarios. Post simulation analysis evaluates the ethical scores of the agent decisions, uncovering the tradeoffs it navigates to survive. Specifically, analysis finds that when danger increases, agents ethical behavior becomes unpredictable. Surprisingly, the GPT 4o agent outperformed the Bayesian models in both survival and ethical consistency, challenging assumptions about traditional probabilistic methods and raising a new challenge to understand the mechanisms of LLMs' probabilistic reasoning.
AIOct 31, 2024
Unlocking the Potential of Global Human ExpertiseElliot Meyerson, Olivier Francon, Darren Sargent et al.
Solving societal problems on a global scale requires the collection and processing of ideas and methods from diverse sets of international experts. As the number and diversity of human experts increase, so does the likelihood that elements in this collective knowledge can be combined and refined to discover novel and better solutions. However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. An evolutionary AI framework, termed RHEA, fills this role by distilling knowledge from diverse models created by human experts into equivalent neural networks, which are then recombined and refined in a population-based search. The framework was implemented in a formal synthetic domain, demonstrating that it is transparent and systematic. It was then applied to the results of the XPRIZE Pandemic Response Challenge, in which over 100 teams of experts across 23 countries submitted models based on diverse methodologies to predict COVID-19 cases and suggest non-pharmaceutical intervention policies for 235 nations, states, and regions across the globe. Building upon this expert knowledge, by recombining and refining the 169 resulting policy suggestion models, RHEA discovered a broader and more effective set of policies than either AI or human experts alone, as evaluated based on real-world data. The results thus suggest that AI can play a crucial role in realizing the potential of human expertise in global problem-solving.
QMFeb 10, 2024
Optimizing the Design of an Artificial Pancreas to Improve Diabetes ManagementAshok Khanna, Olivier Francon, Risto Miikkulainen
Diabetes, a chronic condition that impairs how the body turns food into energy, i.e. blood glucose, affects 38 million people in the US alone. The standard treatment is to supplement carbohydrate intake with an artificial pancreas, i.e. a continuous insulin pump (basal shots), as well as occasional insulin injections (bolus shots). The goal of the treatment is to keep blood glucose at the center of an acceptable range, as measured through a continuous glucose meter. A secondary goal is to minimize injections, which are unpleasant and difficult for some patients to implement. In this study, neuroevolution was used to discover an optimal strategy for the treatment. Based on a dataset of 30 days of treatment and measurements of a single patient, a random forest was first trained to predict future glucose levels. A neural network was then evolved to prescribe carbohydrates, basal pumping levels, and bolus injections. Evolution discovered a Pareto front that reduced deviation from the target and number of injections compared to the original data, thus improving patients' quality of life. To make the system easier to adopt, a language interface was developed with a large language model. Thus, these technologies not only improve patient care but also adoption in a broader population.
LGMar 13
Optimize Wider, Not Deeper: Consensus Aggregation for Policy OptimizationZelal Su, Mustafaoglu, Sungyoung Lee et al.
Proximal policy optimization (PPO) approximates the trust region update using multiple epochs of clipped SGD. Each epoch may drift further from the natural gradient direction, creating path-dependent noise. To understand this drift, we can use Fisher information geometry to decompose policy updates into signal (the natural gradient projection) and waste (the Fisher-orthogonal residual that consumes trust region budget without first-order surrogate improvement). Empirically, signal saturates but waste grows with additional epochs, creating an optimization-depth dilemma. We propose Consensus Aggregation for Policy Optimization (CAPO), which redirects compute from depth to width: $K$ PPO replicates are optimized on the same batch, differing only in minibatch shuffling order, and then aggregated into a consensus. We study aggregation in two spaces: Euclidean parameter space, and the natural parameter space of the policy distribution via the logarithmic opinion pool. In natural parameter space, the consensus provably achieves higher KL-penalized surrogate and tighter trust region compliance than the mean expert; parameter averaging inherits these guarantees approximately. On continuous control tasks, CAPO outperforms PPO and compute-matched deeper baselines under fixed sample budgets by up to 8.6x. CAPO demonstrates that policy optimization can be improved by optimizing wider, rather than deeper, without additional environment interactions.
NEFeb 2
Fine-Tuning Language Models to Know What They KnowSangjun Park, Elliot Meyerson, Xin Qiu et al.
Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability $d_{\rm{type2}}'$ using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating a enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
NEAug 26, 2025
Leveraging Evolutionary Surrogate-Assisted Prescription in Multi-Objective Chlorination Control SystemsRivaaj Monsia, Olivier Francon, Daniel Young et al.
This short, written report introduces the idea of Evolutionary Surrogate-Assisted Prescription (ESP) and presents preliminary results on its potential use in training real-world agents as a part of the 1st AI for Drinking Water Chlorination Challenge at IJCAI-2025. This work was done by a team from Project Resilience, an organization interested in bridging AI to real-world problems.
LGApr 17, 2025
Evolutionary Policy OptimizationZelal Su "Lain" Mustafaoglu, Keshav Pingali, Risto Miikkulainen
A key challenge in reinforcement learning (RL) is managing the exploration-exploitation trade-off without sacrificing sample efficiency. Policy gradient (PG) methods excel in exploitation through fine-grained, gradient-based optimization but often struggle with exploration due to their focus on local search. In contrast, evolutionary computation (EC) methods excel in global exploration, but lack mechanisms for exploitation. To address these limitations, this paper proposes Evolutionary Policy Optimization (EPO), a hybrid algorithm that integrates neuroevolution with policy gradient methods for policy optimization. EPO leverages the exploration capabilities of EC and the exploitation strengths of PG, offering an efficient solution to the exploration-exploitation dilemma in RL. EPO is evaluated on the Atari Pong and Breakout benchmarks. Experimental results show that EPO improves both policy quality and sample efficiency compared to standard PG and EC methods, making it effective for tasks that require both exploration and local optimization.
CPJan 29, 2025
Transformer Based Time-Series Forecasting for StockShuozhe Li, Zachery B Schulwol, Risto Miikkulainen
To the naked eye, stock prices are considered chaotic, dynamic, and unpredictable. Indeed, it is one of the most difficult forecasting tasks that hundreds of millions of retail traders and professional traders around the world try to do every second even before the market opens. With recent advances in the development of machine learning and the amount of data the market generated over years, applying machine learning techniques such as deep learning neural networks is unavoidable. In this work, we modeled the task as a multivariate forecasting problem, instead of a naive autoregression problem. The multivariate analysis is done using the attention mechanism via applying a mutated version of the Transformer, "Stockformer", which we created.
NEFeb 19, 2022
Simple Genetic Operators are Universal Approximators of Probability Distributions (and other Advantages of Expressive Encodings)Elliot Meyerson, Xin Qiu, Risto Miikkulainen
This paper characterizes the inherent power of evolutionary algorithms. This power depends on the computational properties of the genetic encoding. With some encodings, two parents recombined with a simple crossover operator can sample from an arbitrary distribution of child phenotypes. Such encodings are termed \emph{expressive encodings} in this paper. Universal function approximators, including popular evolutionary substrates of genetic programming and neural networks, can be used to construct expressive encodings. Remarkably, this approach need not be applied only to domains where the phenotype is a function: Expressivity can be achieved even when optimizing static structures, such as binary vectors. Such simpler settings make it possible to characterize expressive encodings theoretically: Across a variety of test problems, expressive encodings are shown to achieve up to super-exponential convergence speed-ups over the standard direct encoding. The conclusion is that, across evolutionary computation areas as diverse as genetic programming, neuroevolution, genetic algorithms, and theory, expressive encodings can be a key to understanding and realizing the full power of evolution.
AIOct 26, 2021
NeuroBack: Improving CDCL SAT Solving using Graph Neural NetworksWenxi Wang, Yang Hu, Mohit Tiwari et al.
Propositional satisfiability (SAT) is an NP-complete problem that impacts many research fields, such as planning, verification, and security. Mainstream modern SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Recent work aimed to enhance CDCL SAT solvers using Graph Neural Networks (GNNs). However, so far this approach either has not made solving more effective, or required substantial GPU resources for frequent online model inferences. Aiming to make GNN improvements practical, this paper proposes an approach called NeuroBack, which builds on two insights: (1) predicting phases (i.e., values) of variables appearing in the majority (or even all) of the satisfying assignments are essential for CDCL SAT solving, and (2) it is sufficient to query the neural model only once for the predictions before the SAT solving starts. Once trained, the offline model inference allows NeuroBack to execute exclusively on the CPU, removing its reliance on GPU resources. To train NeuroBack, a new dataset called DataBack containing 120,286 data samples is created. NeuroBack is implemented as an enhancement to a state-of-the-art SAT solver called Kissat. As a result, it allowed Kissat to solve up to 5.2% and 7.4% more problems on two recent SAT competition problem sets, SATCOMP-2022 and SATCOMP-2023, respectively. NeuroBack therefore shows how machine learning can be harnessed to improve SAT solving in an effective and practical manner.
NEFeb 17, 2021
Evolving GAN Formulations for Higher Quality Image SynthesisSantiago Gonzalez, Mohak Kant, Risto Miikkulainen
Generative Adversarial Networks (GANs) have extended deep learning to complex generation and translation tasks across different data modalities. However, GANs are notoriously difficult to train: Mode collapse and other instabilities in the training process often degrade the quality of the generated results, such as images. This paper presents a new technique called TaylorGAN for improving GANs by discovering customized loss functions for each of its two networks. The loss functions are parameterized as Taylor expansions and optimized through multiobjective evolution. On an image-to-image translation benchmark task, this approach qualitatively improves generated image quality and quantitatively improves two independent GAN performance metrics. It therefore forms a promising approach for applying GANs to more challenging tasks in the future.
NEOct 5, 2020
The Traveling Observer Model: Multi-task Learning Through Spatial Variable EmbeddingsElliot Meyerson, Risto Miikkulainen
This paper frames a general prediction system as an observer traveling around a continuous space, measuring values at some locations, and predicting them at others. The observer is completely agnostic about any particular task being solved; it cares only about measurement locations and their values. This perspective leads to a machine learning framework in which seemingly unrelated tasks can be solved by a single model, by embedding their input and output variables into a shared space. An implementation of the framework is developed in which these variable embeddings are learned jointly with internal model parameters. In experiments, the approach is shown to (1) recover intuitive locations of variables in space and time, (2) exploit regularities across related datasets with completely disjoint input and output spaces, and (3) exploit regularities across seemingly unrelated tasks, outperforming task-specific single-task models and multi-task learning alternatives. The results suggest that even seemingly unrelated tasks may originate from similar underlying processes, a fact that the traveling observer model can use to make better predictions.
LGOct 5, 2020
Detecting Misclassification Errors in Neural Networks with a Gaussian Process ModelXin Qiu, Risto Miikkulainen
As neural network classifiers are deployed in real-world applications, it is crucial that their failures can be detected reliably. One practical solution is to assign confidence scores to each prediction, then use these scores to filter out possible misclassifications. However, existing confidence metrics are not yet sufficiently reliable for this role. This paper presents a new framework that produces a quantitative metric for detecting misclassification errors. This framework, RED, builds an error detector on top of the base classifier and estimates uncertainty of the detection scores using Gaussian Processes. Experimental comparisons with other error detection methods on 125 UCI datasets demonstrate that this approach is effective. Further implementations on two probabilistic base classifiers and two large deep learning architecture in vision tasks further confirm that the method is robust and scalable. Third, an empirical analysis of RED with out-of-distribution and adversarial samples shows that the method can be used not only to detect errors but also to understand where they come from. RED can thereby be used to improve trustworthiness of neural network classifiers more broadly in the future.
LGOct 2, 2020
Effective Regularization Through Loss-Function MetalearningSantiago Gonzalez, Xin Qiu, Risto Miikkulainen
Evolutionary computation can be used to optimize several different aspects of neural network architectures. For instance, the TaylorGLO method discovers novel, customized loss functions, resulting in improved performance, faster training, and improved data utilization. A likely reason is that such functions discourage overfitting, leading to effective regularization. This paper demonstrates theoretically that this is indeed the case for TaylorGLO. Learning rule decomposition reveals that evolved loss functions balance two factors: the pull toward zero error, and a push away from it to avoid overfitting. This is a general principle that may be used to understand other regularization techniques as well (as demonstrated in this paper for label smoothing). The theoretical analysis leads to a constraint that can be utilized to find more effective loss functions in practice; the mechanism also results in networks that are more robust (as demonstrated in this paper with adversarial inputs). The analysis in this paper thus constitutes a first step towards understanding regularization, and demonstrates the power of evolutionary neural architecture search in general.
NEAug 4, 2020
Creative AI Through Evolutionary Computation: Principles and ExamplesRisto Miikkulainen
The main power of artificial intelligence is not in modeling what we already know, but in creating solutions that are new. Such solutions exist in extremely large, high-dimensional, and complex search spaces. Population-based search techniques, i.e. variants of evolutionary computation, are well suited to finding them. These techniques make it possible to find creative solutions to practical problems in the real world, making creative AI through evolutionary computation the likely "next deep learning."
NEJun 18, 2020
Generalization of Agent Behavior through Explicit Representation of ContextCem C Tutum, Suhaib Abdulquddos, Risto Miikkulainen
In order to deploy autonomous agents in digital interactive environments, they must be able to act robustly in unseen situations. The standard machine learning approach is to include as much variation as possible into training these agents. The agents can then interpolate within their training, but they cannot extrapolate much beyond it. This paper proposes a principled approach where a context module is coevolved with a skill module in the game. The context module recognizes the temporal variation in the game and modulates the outputs of the skill module so that the action decisions can be made robustly even in previously unseen situations. The approach is evaluated in the Flappy Bird and LunarLander video games, as well as in the CARLA autonomous driving simulation. The Context+Skill approach leads to significantly more robust behavior in environments that require extrapolation beyond training. Such a principled generalization ability is essential in deploying autonomous agents in real-world tasks, and can serve as a foundation for continual adaptation as well.
LGJun 5, 2020
Discovering Parametric Activation FunctionsGarrett Bingham, Risto Miikkulainen
Recent studies have shown that the choice of activation function can significantly affect the performance of deep learning networks. However, the benefits of novel activation functions have been inconsistent and task dependent, and therefore the rectified linear unit (ReLU) is still the most commonly used. This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Evolutionary search is used to discover the general form of the function, and gradient descent to optimize its parameters for different parts of the network and over the learning process. Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective. It discovers both general activation functions and specialized functions for different architectures, consistently improving accuracy over ReLU and other activation functions by significant margins. The approach can therefore be used as an automated optimization step in applying deep learning to new tasks.
NEMay 28, 2020
From Prediction to Prescription: Evolutionary Optimization of Non-Pharmaceutical Interventions in the COVID-19 PandemicRisto Miikkulainen, Olivier Francon, Elliot Meyerson et al.
Several models have been developed to predict how the COVID-19 pandemic spreads, and how it could be contained with non-pharmaceutical interventions (NPIs) such as social distancing restrictions and school and business closures. This paper demonstrates how evolutionary AI could be used to facilitate the next step, i.e. determining most effective intervention strategies automatically. Through evolutionary surrogate-assisted prescription (ESP), it is possible to generate a large number of candidate strategies and evaluate them with predictive models. In principle, strategies can be customized for different countries and locales, and balance the need to contain the pandemic and the need to minimize their economic impact. While still limited by available data, early experiments suggest that workplace and school restrictions are the most important and need to be designed carefully. It also demonstrates that results of lifting restrictions can be unreliable, and suggests creative ways in which restrictions can be implemented softly, e.g. by alternating them over time. As more data becomes available, the approach can be increasingly useful in dealing with COVID-19 as well as possible future pandemics.
LGFeb 17, 2020
Evolutionary Optimization of Deep Learning Activation FunctionsGarrett Bingham, William Macke, Risto Miikkulainen
The choice of activation function can have a large effect on the performance of a neural network. While there have been some attempts to hand-engineer novel activation functions, the Rectified Linear Unit (ReLU) remains the most commonly-used in practice. This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU. A tree-based search space of candidate activation functions is defined and explored with mutation, crossover, and exhaustive search. Experiments on training wide residual networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach is effective. Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. Optimal performance is achieved when evolution is allowed to customize activation functions to a particular task; however, these novel activation functions are shown to generalize, achieving high performance across tasks. Evolutionary optimization of activation functions is therefore a promising new dimension of metalearning in neural networks.
NEFeb 13, 2020
Adapting to Unseen Environments through Explicit Representation of ContextCem C. Tutum, Risto Miikkulainen
In order to deploy autonomous agents to domains such as autonomous driving, infrastructure management, health care, and finance, they must be able to adapt safely to unseen situations. The current approach in constructing such agents is to try to include as much variation into training as possible, and then generalize within the possible variations. This paper proposes a principled approach where a context module is coevolved with a skill module. The context module recognizes the variation and modulates the skill module so that the entire system performs well in unseen situations. The approach is evaluated in a challenging version of the Flappy Bird game where the effects of the actions vary over time. The Context+Skill approach leads to significantly more robust behavior in environments with previously unseen effects. Such a principled generalization ability is essential in deploying autonomous agents in real world tasks, and can serve as a foundation for continual learning as well.
NEFeb 13, 2020
Effective Reinforcement Learning through Evolutionary Surrogate-Assisted PrescriptionOlivier Francon, Santiago Gonzalez, Babak Hodjat et al.
There is now significant historical data available on decision making in organizations, consisting of the decision problem, what decisions were made, and how desirable the outcomes were. Using this data, it is possible to learn a surrogate model, and with that model, evolve a decision strategy that optimizes the outcomes. This paper introduces a general such approach, called Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for example, a random forest or a neural network trained with gradient descent, and the strategy is a neural network that is evolved to maximize the predictions of the surrogate model. ESP is further extended in this paper to sequential decision-making tasks, which makes it possible to evaluate the framework in reinforcement learning (RL) benchmarks. Because the majority of evaluations are done on the surrogate, ESP is more sample efficient, has lower variance, and lower regret than standard RL approaches. Surprisingly, its solutions are also better because both the surrogate and the strategy network regularize the decision-making behavior. ESP thus forms a promising foundation to decision optimization in real-world problems.
NEFeb 11, 2020
Regularized Evolutionary Population-Based TrainingJason Liang, Santiago Gonzalez, Hormoz Shahrzad et al.
Metalearning of deep neural network (DNN) architectures and hyperparameters has become an increasingly important area of research. At the same time, network regularization has been recognized as a crucial dimension to effective training of DNNs. However, the role of metalearning in establishing effective regularization has not yet been fully explored. There is recent evidence that loss-function optimization could play this role, however it is computationally impractical as an outer loop to full training. This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN's weights with the metalearning of loss functions. They are parameterized using multivariate Taylor expansions that EPBT can directly optimize. Such simultaneous adaptation of weights and loss functions can be deceptive, and therefore EPBT uses a quality-diversity heuristic called Novelty Pulsation as well as knowledge distillation to prevent overfitting during training. On the CIFAR-10 and SVHN image classification benchmarks, EPBT results in faster, more accurate learning. The discovered hyperparameters adapt to the training process and serve to regularize the learning task by discouraging overfitting to the labels. EPBT thus demonstrates a practical instantiation of regularization metalearning based on simultaneous training.
LGFeb 9, 2020
Improving Neural Network Learning Through Dual Variable Learning RatesElizabeth Liner, Risto Miikkulainen
This paper introduces and evaluates a novel training method for neural networks: Dual Variable Learning Rates (DVLR). Building on insights from behavioral psychology, the dual learning rates are used to emphasize correct and incorrect responses differently, thereby making the feedback to the network more specific. Further, the learning rates are varied as a function of the network's performance, thereby making it more efficient. DVLR was implemented on three types of networks: feedforward, convolutional, and residual, and two domains: MNIST and CIFAR-10. The results suggest a consistently improved accuracy, demonstrating that DVLR is a promising, psychologically motivated technique for training neural network models.
CRFeb 9, 2020
MDEA: Malware Detection with Evolutionary Adversarial LearningXiruo Wang, Risto Miikkulainen
Malware detection have used machine learning to detect malware in programs. These applications take in raw or processed binary data to neural network models to classify as benign or malicious files. Even though this approach has proven effective against dynamic changes, such as encrypting, obfuscating and packing techniques, it is vulnerable to specific evasion attacks where that small changes in the input data cause misclassification at test time. This paper proposes a new approach: MDEA, an Adversarial Malware Detection model uses evolutionary optimization to create attack samples to make the network robust against evasion attacks. By retraining the model with the evolved malware samples, its performance improves a significant margin.
LGJan 31, 2020
Optimizing Loss Functions Through Multivariate Taylor Polynomial ParameterizationSantiago Gonzalez, Risto Miikkulainen
Metalearning of deep neural network (DNN) architectures and hyperparameters has become an increasingly important area of research. Loss functions are a type of metaknowledge that is crucial to effective training of DNNs, however, their potential role in metalearning has not yet been fully explored. Whereas early work focused on genetic programming (GP) on tree representations, this paper proposes continuous CMA-ES optimization of multivariate Taylor polynomial parameterizations. This approach, TaylorGLO, makes it possible to represent and search useful loss functions more effectively. In MNIST, CIFAR-10, and SVHN benchmark tasks, TaylorGLO finds new loss functions that outperform functions previously discovered through GP, as well as the standard cross-entropy loss, in fewer generations. These functions serve to regularize the learning task by discouraging overfitting to the labels, which is particularly useful in tasks where limited training data is available. The results thus demonstrate that loss function optimization is a productive new avenue for metalearning.
NEJun 7, 2019
Enhanced Optimization with Composite Objectives and Novelty PulsationHormoz Shahrzad, Babak Hodjat, Camille Dollé et al.
An important benefit of multi-objective search is that it maintains a diverse population of candidates, which helps in deceptive problems in particular. Not all diversity is useful, however: candidates that optimize only one objective while ignoring others are rarely helpful. A recent solution is to replace the original objectives by their linear combinations, thus focusing the search on the most useful trade-offs between objectives. To compensate for the loss of diversity, this transformation is accompanied by a selection mechanism that favors novelty. This paper improves this approach further by introducing novelty pulsation, i.e. a systematic method to alternate between novelty selection and local optimization. In the highly deceptive problem of discovering minimal sorting networks, it finds state-of-the-art solutions significantly faster than before. In fact, our method so far has established a new world record for the 20-lines sorting network with 91 comparators. In the real-world problem of stock trading, it discovers solutions that generalize significantly better on unseen data. Composite Novelty Pulsation is therefore a promising approach to solving deceptive real-world problems through multi-objective optimization.
LGJun 3, 2019
Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O KernelXin Qiu, Elliot Meyerson, Risto Miikkulainen
Neural Networks (NNs) have been extensively used for a wide spectrum of real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, the point prediction is not enough: the uncertainty (i.e. risk or confidence) of that prediction must also be estimated. Standard NNs, which are most often used in such tasks, do not provide uncertainty information. Existing approaches address this issue by combining Bayesian models with NNs, but these models are hard to implement, more expensive to train, and usually do not predict as accurately as standard NNs. In this paper, a new framework (RIO) is developed that makes it possible to estimate uncertainty in any pretrained standard NN. The behavior of the NN is captured by modeling its prediction residuals with a Gaussian Process, whose kernel includes both the NN's input and its output. The framework is evaluated in twelve real-world datasets, where it is found to (1) provide reliable estimates of uncertainty, (2) reduce the error of the point predictions, and (3) scale well to large datasets. Given that RIO can be applied to any standard NN without modifications to model architecture or training pipeline, it provides an important ingredient for building real-world NN applications.
LGMay 31, 2019
Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse DomainsElliot Meyerson, Risto Miikkulainen
As deep learning applications continue to become more diverse, an interesting question arises: Can general problem solving arise from jointly learning several such diverse tasks? To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures. The idea is that any set of (architecture,task) pairs can be decomposed into a set of potentially related subproblems, whose sharing is optimized by an efficient stochastic algorithm. The approach is first validated in a classic synthetic multi-task learning benchmark, and then applied to sharing across disparate architectures for vision, NLP, and genomics tasks. It discovers regularities across these domains, encodes them into sharable modules, and combines these modules systematically to improve performance in the individual tasks. The results confirm that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing a key ingredient for general problem solving in the future.