LG · Dec 2, 2022 · Code
Safe machine learning model release from Trusted Research Environments: The SACRO-ML package
Jim Smith, Richard J. Preen, Andrew McCarthy et al.
We present SACRO-ML, an integrated suite of open source Python tools to facilitate the statistical disclosure control (SDC) of machine learning (ML) models trained on confidential data prior to public release. SACRO-ML combines (i) a SafeModel package that extends commonly used ML models to provide ante-hoc SDC by assessing the vulnerability of disclosure posed by the training regime; and (ii) an Attacks package that provides post-hoc SDC by rigorously assessing the empirical disclosure risk of a model through a variety of simulated attacks after training. The SACRO-ML code and documentation are available under an MIT license at https://github.com/AI-SDC/SACRO-ML
LG · Feb 13, 2025
A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack
Richard J. Preen, Jim Smith
Machine learning models can inadvertently expose confidential properties of their training data, making them vulnerable to membership inference attacks (MIA). While numerous evaluation methods exist, many require computationally expensive processes, such as training multiple shadow models. This article presents two new complementary approaches for efficiently identifying vulnerable tree-based models: an ante-hoc analysis of hyperparameter choices and a post-hoc examination of trained model structure. While these new methods cannot certify whether a model is safe from MIA, they provide practitioners with a means to significantly reduce the number of models that need to undergo expensive MIA assessment through a hierarchical filtering approach. More specifically, it is shown that the rank order of disclosure risk for different hyperparameter combinations remains consistent across datasets, enabling the development of simple, human-interpretable rules for identifying relatively high-risk models before training. While this ante-hoc analysis cannot determine absolute safety, since this also depends on the specific dataset, it allows the elimination of unnecessarily risky configurations during hyperparameter tuning. Additionally, computationally inexpensive structural metrics serve as indicators of MIA vulnerability, providing a second filtering stage to identify risky models after training but before conducting expensive attacks. Empirical results show that hyperparameter-based risk prediction rules can achieve high accuracy in predicting the most at-risk combinations of hyperparameters across different tree-based model types, while requiring no model training. Moreover, target model accuracy is not seen to correlate with privacy risk, suggesting opportunities to optimise model configurations for both performance and privacy.
NE · Mar 1, 2021
Deep Learning with a Classifier System: Initial Results
Richard J. Preen, Larry Bull
This article presents the first results from using a learning classifier system capable of performing adaptive computation with deep neural networks. Individual classifiers within the population are composed of two neural networks. The first acts as a gating or guarding component, which enables the conditional computation of an associated deep neural network on a per-instance basis. Self-adaptive mutation is applied upon reproduction, and prediction networks are refined with stochastic gradient descent during lifetime learning. The use of fully-connected and convolutional layers is evaluated on handwritten digit recognition tasks where evolution adapts (i) the gradient descent learning rate applied to each layer; (ii) the number of units within each layer, i.e., the number of fully-connected neurons and the number of convolutional kernel filters; (iii) the connectivity of each layer, i.e., whether each weight is active; and (iv) the weight magnitudes, enabling escape from local optima. The system automatically reduces the number of weights and units while maintaining performance after achieving a maximum prediction error.
NE · Oct 23, 2019
Autoencoding with a Classifier System
Richard J. Preen, Stewart W. Wilson, Larry Bull
Autoencoders are data-specific compression algorithms learned automatically from examples. The predominant approach has been to construct single large global models that cover the domain. However, training and evaluating models of increasing size comes at the price of additional time and computational cost. Conditional computation, sparsity, and model pruning techniques can reduce these costs while maintaining performance. Learning classifier systems (LCS) are a framework for adaptively subdividing input spaces into an ensemble of simpler local approximations that together cover the domain. LCS perform conditional computation through the use of a population of individual gating/guarding components, each associated with a local approximation. This article explores the use of an LCS to adaptively decompose the input domain into a collection of small autoencoders where local solutions of different complexity may emerge. In addition to benefits in convergence time and computational cost, it is shown possible to reduce code size as well as the resulting decoder computational cost when compared with the global model equivalent.
NE · Dec 19, 2018
Towards an Evolvable Cancer Treatment Simulator
Richard J. Preen, Larry Bull, Andrew Adamatzky
The use of high-fidelity computational simulations promises to enable high-throughput hypothesis testing and optimisation of cancer therapies. However, increasing realism comes at the cost of increasing computational requirements. This article explores the use of surrogate-assisted evolutionary algorithms to optimise the targeted delivery of a therapeutic compound to cancerous tumour cells with the multicellular simulator, PhysiCell. The use of both Gaussian process models and multi-layer perceptron neural network surrogate models is investigated. We find that evolutionary algorithms are able to effectively explore the parameter space of biophysical properties within the agent-based simulations, minimising the resulting number of cancerous cells after a period of simulated treatment. Both model-assisted algorithms are found to outperform a standard evolutionary algorithm, demonstrating their ability to perform a more effective search within the very small evaluation budget. This represents the first use of efficient evolutionary algorithms within a high-throughput multicellular computing approach to find therapeutic design optima that maximise tumour regression.
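The general idea of surrogate-assisted search under a small evaluation budget can be illustrated with a minimal sketch. This is not the authors' algorithm: a k-nearest-neighbour predictor stands in for the Gaussian process and multi-layer perceptron surrogates named in the abstract, a simple quadratic function stands in for a PhysiCell simulation, and every name below (`knn_predict`, `surrogate_assisted_search`, `n_candidates`, `sigma`) is hypothetical.

```python
import random


def knn_predict(archive, x, k=3):
    """Estimate the fitness of x as the mean fitness of its k nearest evaluated points."""
    nearest = sorted(
        archive,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    return sum(fitness for _, fitness in nearest) / len(nearest)


def surrogate_assisted_search(true_fitness, dim, budget, rng,
                              n_init=5, n_candidates=20, sigma=0.1):
    """Minimise true_fitness over [0, 1]^dim using at most `budget` true evaluations.

    Assumes budget >= n_init. The archive holds (genome, fitness) pairs
    for every point evaluated with the expensive function.
    """
    archive = []
    # Seed the archive with random designs evaluated on the expensive function.
    for _ in range(n_init):
        x = [rng.random() for _ in range(dim)]
        archive.append((x, true_fitness(x)))

    while len(archive) < budget:
        # Mutate the best design found so far to produce many cheap candidates.
        parent = min(archive, key=lambda item: item[1])[0]
        candidates = [
            [min(1.0, max(0.0, gene + rng.gauss(0.0, sigma))) for gene in parent]
            for _ in range(n_candidates)
        ]
        # Screen candidates with the surrogate; only the most promising one
        # consumes a true (expensive) evaluation.
        chosen = min(candidates, key=lambda c: knn_predict(archive, c))
        archive.append((chosen, true_fitness(chosen)))

    return min(archive, key=lambda item: item[1]), len(archive)


if __name__ == "__main__":
    # Hypothetical stand-in for an expensive simulation: a quadratic bowl.
    best, evaluations = surrogate_assisted_search(
        lambda x: sum(v * v for v in x), dim=3, budget=30, rng=random.Random(1)
    )
    print(best, evaluations)
```

The point of the design is that each generation spends many cheap surrogate predictions but only one expensive evaluation, which is where the saving within a small budget would come from.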
NE · Mar 25, 2018
Evolutionary n-level Hypergraph Partitioning with Adaptive Coarsening
Richard J. Preen, Jim Smith
Hypergraph partitioning is an NP-hard problem that occurs in many computer science applications where it is necessary to reduce large problems into a number of smaller, computationally tractable sub-problems. Current techniques use a multilevel approach wherein an initial partitioning is performed after compressing the hypergraph to a predetermined level. This level is typically chosen to produce very coarse hypergraphs in which heuristic algorithms are fast and effective. This article presents a novel memetic algorithm which remains effective on larger initial hypergraphs. This enables the exploitation of information that can be lost during coarsening and results in improved final solution quality. We use this algorithm to present an empirical analysis of the space of possible initial hypergraphs in terms of its searchability at different levels of coarsening. We find that the best results arise at coarsening levels unique to each hypergraph. Based on this, we introduce an adaptive scheme that stops coarsening when the rate of information loss in a hypergraph becomes non-linear and show that this produces further improvements. The results show that we have identified a valuable role for evolutionary algorithms within the current state-of-the-art hypergraph partitioning framework.
NE · Oct 18, 2016
Design Mining Microbial Fuel Cell Cascades
Richard J. Preen, Jiseon You, Larry Bull et al.
Microbial fuel cells (MFCs) perform wastewater treatment and electricity production through the conversion of organic matter using microorganisms. For practical applications, it has been suggested that greater efficiency can be achieved by arranging multiple MFC units into physical stacks in a cascade with feedstock flowing sequentially between units. In this paper, we investigate the use of computational intelligence to physically explore and optimise (potentially) heterogeneous MFC designs in a cascade, i.e. without simulation. Conductive structures are 3-D printed and inserted into the anodic chamber of each MFC unit, augmenting a carbon fibre veil anode and affecting the hydrodynamics, including the feedstock volume and hydraulic retention time, as well as providing unique habitats for microbial colonisation. We show that it is possible to use design mining to identify new conductive inserts that increase both the cascade power output and power density.
NE · Jun 29, 2015
On Design Mining: Coevolution and Surrogate Models
Richard J. Preen, Larry Bull
Design mining is the use of computational intelligence techniques to iteratively search and model the attribute space of physical objects evaluated directly through rapid prototyping to meet given objectives. It enables the exploitation of novel materials and processes without formal models or complex simulation. In this paper, we focus upon the coevolutionary nature of the design process when it is decomposed into concurrent sub-design threads due to the overall complexity of the task. Using an abstract, tuneable model of coevolution we consider strategies to sample sub-thread designs for whole system testing and how best to construct and use surrogate models within the coevolutionary scenario. Drawing on our findings, the paper then describes the effective design of an array of six heterogeneous vertical-axis wind turbines.
NE · Oct 2, 2014
Design Mining Interacting Wind Turbines
Richard J. Preen, Larry Bull
An initial study of surrogate-assisted evolutionary algorithms used to design vertical-axis wind turbines wherein candidate prototypes are evaluated under fan generated wind conditions after being physically instantiated by a 3D printer has recently been presented. Unlike other approaches, such as computational fluid dynamics simulations, no mathematical formulations were used and no model assumptions were made. This paper extends that work by exploring alternative surrogate modelling and evolutionary techniques. The accuracy of various modelling algorithms used to estimate the fitness of evaluated individuals from the initial experiments is compared. The effect of temporally windowing surrogate model training samples is explored. A surrogate-assisted approach based on an enhanced local search is introduced; and alternative coevolution collaboration schemes are examined.
NE · Aug 13, 2013
Toward the Coevolution of Novel Vertical-Axis Wind Turbines
Richard J. Preen, Larry Bull
The production of renewable and sustainable energy is one of the most important challenges currently facing mankind. Wind has made an increasing contribution to the world's energy supply mix, but still remains a long way from reaching its full potential. In this paper, we investigate the use of artificial evolution to design vertical-axis wind turbine prototypes that are physically instantiated and evaluated under fan-generated wind conditions. Initially, a conventional evolutionary algorithm is used to explore the design space of a single wind turbine, and later a cooperative coevolutionary algorithm is used to explore the design space of an array of wind turbines. Artificial neural networks are used throughout as surrogate models to assist learning and are found to reduce the number of fabrications required to reach a higher aerodynamic efficiency. Unlike in other approaches, such as computational fluid dynamics simulations, no mathematical formulations are used and no model assumptions are made.
AI · Apr 18, 2012
Fuzzy Dynamical Genetic Programming in XCSF
Richard J. Preen, Larry Bull
A number of representation schemes have been presented for use within Learning Classifier Systems, ranging from binary encodings to Neural Networks, and more recently Dynamical Genetic Programming (DGP). This paper presents results from an investigation into using a fuzzy DGP representation within the XCSF Learning Classifier System. In particular, asynchronous Fuzzy Logic Networks are used to represent the traditional condition-action production system rules. It is shown possible to use self-adaptive, open-ended evolution to design an ensemble of such fuzzy dynamical systems within XCSF to solve several well-known continuous-valued test problems.
AI · Apr 18, 2012
Discrete Dynamical Genetic Programming in XCS
Richard J. Preen, Larry Bull
A number of representation schemes have been presented for use within Learning Classifier Systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using a discrete dynamical system representation within the XCS Learning Classifier System. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules. It is shown possible to use self-adaptive, open-ended evolution to design an ensemble of such discrete dynamical systems within XCS to solve a number of well-known test problems.
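For readers unfamiliar with asynchronous random Boolean networks, the following is a minimal sketch of the underlying dynamical system only. It does not show how XCS encodes condition-action rules with such networks, which the abstract does not specify. The class name `AsyncRBN`, the uniform random choice of which node to update, and the random wiring are all assumptions made for illustration.

```python
import random


class AsyncRBN:
    """A random Boolean network whose nodes are updated one at a time."""

    def __init__(self, n_nodes, k, rng):
        self.rng = rng
        # Random initial Boolean state for every node.
        self.state = [rng.random() < 0.5 for _ in range(n_nodes)]
        # Each node reads from k randomly chosen nodes (self-links allowed).
        self.inputs = [
            [rng.randrange(n_nodes) for _ in range(k)] for _ in range(n_nodes)
        ]
        # Each node has its own random Boolean function, stored as a
        # truth table with one entry per combination of its k inputs.
        self.tables = [
            [rng.random() < 0.5 for _ in range(2 ** k)] for _ in range(n_nodes)
        ]

    def update_one(self):
        """Pick one node at random and recompute only that node's state."""
        i = self.rng.randrange(len(self.state))
        index = 0
        for source in self.inputs[i]:
            # Pack the input bits into an integer index into the truth table.
            index = (index << 1) | int(self.state[source])
        self.state[i] = self.tables[i][index]
        return i


if __name__ == "__main__":
    net = AsyncRBN(n_nodes=8, k=2, rng=random.Random(0))
    for _ in range(20):
        net.update_one()
    print(net.state)
```

In contrast with a synchronous network, where every node is recomputed from the same snapshot at each step, only one node changes per update here, so the trajectory depends on the update order.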
NE · Apr 18, 2012
Towards the Evolution of Vertical-Axis Wind Turbines using Supershapes
Richard J. Preen, Larry Bull
We have recently presented an initial study of evolutionary algorithms used to design vertical-axis wind turbines (VAWTs) wherein candidate prototypes are evaluated under approximated wind tunnel conditions after being physically instantiated by a 3D printer. That is, unlike other approaches such as computational fluid dynamics simulations, no mathematical formulations are used and no model assumptions are made. However, the representation used significantly restricted the range of morphologies explored. In this paper, we present initial explorations into the use of a simple generative encoding, known as the Gielis superformula, that produces a highly flexible 3D shape representation to design VAWTs. First, the target-based evolution of 3D artefacts is investigated, and subsequently initial design experiments are performed wherein each VAWT candidate is physically instantiated and evaluated under approximated wind tunnel conditions. It is shown possible to produce very closely matching designs of a number of 3D objects through the evolution of supershapes produced by the Gielis superformula. Moreover, it is shown possible to use artificial physical evolution to identify novel and increasingly efficient supershape VAWT designs.
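As a brief illustration of the encoding, the sketch below evaluates the standard published form of the Gielis superformula, r(φ) = (|cos(mφ/4)/a|^n2 + |sin(mφ/4)/b|^n3)^(-1/n1), and combines two such curves into a 3D supershape point via the usual spherical product. The function names and parameter ordering are assumptions made here for illustration; the abstract does not state how the paper parameterises or evolves the shapes.

```python
import math


def superformula(angle, m, n1, n2, n3, a=1.0, b=1.0):
    """Radius of the Gielis superformula at the given angle (radians)."""
    term_cos = abs(math.cos(m * angle / 4.0) / a) ** n2
    term_sin = abs(math.sin(m * angle / 4.0) / b) ** n3
    return (term_cos + term_sin) ** (-1.0 / n1)


def supershape_point(theta, phi, params1, params2):
    """3D point from the spherical product of two superformula curves.

    theta is the longitude in [-pi, pi]; phi is the latitude in [-pi/2, pi/2].
    params1 and params2 are (m, n1, n2, n3) tuples for each curve.
    """
    r1 = superformula(theta, *params1)
    r2 = superformula(phi, *params2)
    x = r1 * math.cos(theta) * r2 * math.cos(phi)
    y = r1 * math.sin(theta) * r2 * math.cos(phi)
    z = r2 * math.sin(phi)
    return x, y, z


if __name__ == "__main__":
    # With m = 4 and all exponents equal to 2, both curves are unit
    # circles, so the supershape is the unit sphere.
    sphere = (4, 2, 2, 2)
    print(supershape_point(0.7, 0.2, sphere, sphere))
    # Other parameter choices give lobed, non-spherical shapes.
    print(supershape_point(0.7, 0.2, (6, 1, 1, 1), (3, 1, 1, 1)))
```

Because a handful of real-valued parameters controls the whole surface, a genome for an evolutionary algorithm can be as small as the two parameter tuples, which is what makes the encoding compact.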
AI · Jan 26, 2012
Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system
Richard J. Preen, Larry Bull
A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF learning classifier system. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules in the discrete case and asynchronous fuzzy logic networks in the continuous-valued case. It is shown possible to use self-adaptive, open-ended evolution to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems.