Koroush Shirvan

NE
h-index28
6papers
81citations
Novelty44%
AI Score25

6 Papers

NEDec 1, 2021Code
NEORL: NeuroEvolution Optimization with Reinforcement Learning

Majdi I. Radaideh, Katelin Du, Paul Seurin et al.

We present an open-source Python framework for NeuroEvolution Optimization with Reinforcement Learning (NEORL) developed at the Massachusetts Institute of Technology. NEORL offers a global optimization interface of state-of-the-art algorithms in the field of evolutionary computation, neural networks through reinforcement learning, and hybrid neuroevolution algorithms. NEORL features diverse set of algorithms, user-friendly interface, parallel computing support, automatic hyperparameter tuning, detailed documentation, and demonstration of applications in mathematical and real-world engineering optimization. NEORL encompasses various optimization problems from combinatorial, continuous, mixed discrete/continuous, to high-dimensional, expensive, and constrained engineering optimization. NEORL is tested in variety of engineering applications relevant to low carbon energy research in addressing solutions to climate change. The examples include nuclear reactor control and fuel cell power production. The results demonstrate NEORL competitiveness against other algorithms and optimization frameworks in the literature, and a potential tool to solve large-scale optimization problems. More examples and benchmarking of NEORL can be found here: https://neorl.readthedocs.io/en/latest/index.html

LGDec 15, 2023
Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization

Paul Seurin, Koroush Shirvan

A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), has been developed to address the challenges posed by multi-objective problems, particularly in the field of engineering where the evaluation of candidate solutions can be time-consuming. PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, eliminating the need for multiple neural networks to independently solve simpler sub-problems. Several versions inspired from deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains. Curriculum Learning is harnessed to effectively manage constraints in these versions. PEARL's performance is first evaluated on classical multi-objective benchmarks. Additionally, it is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability. The first problem involves optimizing the Cycle length and the rod-integrated peaking factor as the primary objectives, while the second problem incorporates the mean average enrichment as an additional objective. Furthermore, PEARL addresses three types of constraints related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against conventional approaches. Notably, PEARL, specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating additional efforts from the algorithm designer, as opposed to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the Hyper-volume.

NEFeb 16, 2024
Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning

Paul Seurin, Koroush Shirvan

Optimizing the fuel cycle cost through the optimization of nuclear reactor core loading patterns involves multiple objectives and constraints, leading to a vast number of candidate solutions that cannot be explicitly solved. To advance the state-of-the-art in core reload patterns, we have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization. Our previous research has laid the groundwork for these approaches and demonstrated their ability to discover high-quality patterns within a reasonable time frame. On the other hand, stochastic optimization (SO) approaches are commonly used in the literature, but there is no rigorous explanation that shows which approach is better in which scenario. In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO), against the most commonly used SO-based methods: Genetic Algorithm (GA), Parallel Simulated Annealing (PSA) with mixing of states, and Tabu Search (TS), as well as an ensemble-based method, Prioritized Replay Evolutionary and Swarm Algorithm (PESA). We found that the LP scenarios derived in this paper are amenable to a global search to identify promising research directions rapidly, but then need to transition into a local search to exploit these directions efficiently and prevent getting stuck in local optima. PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method. Subsequently, we compared all algorithms against PPO in long runs, which exacerbated the differences seen in the shorter cases. Overall, the work demonstrates the statistical superiority of PPO compared to the other considered algorithms.

LGMay 9, 2023
Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization

Paul Seurin, Koroush Shirvan

The nuclear fuel loading pattern optimization problem belongs to the class of large-scale combinatorial optimization. It is also characterized by multiple objectives and constraints, which makes it impossible to solve explicitly. Stochastic optimization methodologies including Genetic Algorithms and Simulated Annealing are used by different nuclear utilities and vendors, but hand-designed solutions continue to be the prevalent method in the industry. To improve the state-of-the-art, Deep Reinforcement Learning (RL), in particular, Proximal Policy Optimization is leveraged. This work presents a first-of-a-kind approach to utilize deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization. This paper is also to our knowledge the first to propose a study of the behavior of several hyper-parameters that influence the RL algorithm. The algorithm is highly dependent on multiple factors such as the shape of the objective function derived for the core design that behaves as a fudge factor that affects the stability of the learning. But also, an exploration/exploitation trade-off that manifests through different parameters such as the number of loading patterns seen by the agents per episode, the number of samples collected before a policy update nsteps, and an entropy factor ent_coef that increases the randomness of the policy during training. We found that RL must be applied similarly to a Gaussian Process in which the acquisition function is replaced by a parametrized policy. Then, once an initial set of hyper-parameters is found, reducing nsteps and ent_coef until no more learning is observed will result in the highest sample efficiency robustly and stably. This resulted in an economic benefit of 535,000- 642,000 $/year/plant.

CEApr 17, 2021
Machine learning-assisted surrogate construction for full-core fuel performance analysis

Yifeng Che, Joseph Yurko, Koroush Shirvan

Accurately predicting the behavior of a nuclear reactor requires multiphysics simulation of coupled neutronics, thermal-hydraulics and fuel thermo-mechanics. The fuel thermo-mechanical response provides essential information for operational limits and safety analysis. Traditionally, fuel performance analysis is performed standalone, using calculated spatial-temporal power distribution and thermal boundary conditions from the coupled neutronics-thermal-hydraulics simulation as input. Such one-way coupling is result of the high cost induced by the full-core fuel performance analysis, which provides more realistic and accurate prediction of the core-wide response than the "peak rod" analysis. It is therefore desirable to improve the computational efficiency of full-core fuel performance modeling by constructing fast-running surrogate, such that fuel performance modeling can be utilized in the core reload design optimization. This work presents methodologies for full-core surrogate construction based on several realistic equilibrium PWR core designs. As a fast and conventional approach, look-up tables are only effective for certain fuel performance quantities of interest (QoIs). Several representative machine-learning algorithms are introduced to capture the complicated physics for other fuel performance QoIs. Rule-based model is useful as a feature extraction technique to account for the spatial-temporal complexity of operating conditions. Constructed surrogates achieve at least ten thousand time acceleration with satisfying prediction accuracy. Current work lays foundation for tighter coupling of fuel performance modeling into the core design optimization framework. It also sets stage for full-core fuel performance analysis with BISON where the computational cost becomes more burdensome.

NEAug 10, 2020
Improving Intelligence of Evolutionary Algorithms Using Experience Share and Replay

Majdi I. Radaideh, Koroush Shirvan

We propose PESA, a novel approach combining Particle Swarm Optimisation (PSO), Evolution Strategy (ES), and Simulated Annealing (SA) in a hybrid Algorithm, inspired from reinforcement learning. PESA hybridizes the three algorithms by storing their solutions in a shared replay memory. Next, PESA applies prioritized replay to redistribute data between the three algorithms in frequent form based on their fitness and priority values, which significantly enhances sample diversity and algorithm exploration. Additionally, greedy replay is used implicitly within SA to improve PESA exploitation close to the end of evolution. The validation against 12 high-dimensional continuous benchmark functions shows superior performance by PESA against standalone ES, PSO, and SA, under similar initial starting points, hyperparameters, and number of generations. PESA shows much better exploration behaviour, faster convergence, and ability to find the global optima compared to its standalone counterparts. Given the promising performance, PESA can offer an efficient optimisation option, especially after it goes through additional multiprocessing improvements to handle complex and expensive fitness functions.