HCJan 30, 2024
From Metrics to Meaning: Time to Rethink Evaluation in Human-AI Collaborative DesignSean P. Walton, Ben J. Evans, Alma A. M. Rahat et al.
As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human--AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human--AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP--Elites, and a random control. Results indicate that exposure to galleries generated by MAP--Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus on solely behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human--AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human--AI systems holistically and considering intelligent systems as a core part of the user experience -- not simply a back end tool.
LGMar 31, 2022
MBORE: Multi-objective Bayesian Optimisation by Density-Ratio EstimationGeorge De Ath, Tinkle Chugh, Alma A. M. Rahat
Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive. Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions. It combines objective values via scalarisation and builds a Gaussian process (GP) surrogate of the scalarised values. The location which maximises a cheap-to-query acquisition function is chosen as the next location to expensively evaluate. While BO is an effective strategy, the use of GPs is limiting. Their performance decreases as the problem input dimensionality increases, and their computational complexity scales cubically with the amount of data. To address these limitations, we extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting. BORE links the computation of the probability of improvement acquisition function to that of probabilistic classification. This enables the use of state-of-the-art classifiers in a BO-like framework. In this work we present MBORE: multi-objective Bayesian optimisation by density-ratio estimation, and compare it to BO across a range of synthetic and real-world benchmarks. We find that MBORE performs as well as or better than BO on a wide variety of problems, and that it outperforms BO on high-dimensional and real-world problems.
NEMay 15, 2020
Evaluating Mixed-Initiative Procedural Level Design Tools using a Triple-Blind Mixed-Method User StudySean P. Walton, Alma A. M. Rahat, James Stovold
Results from a triple-blind mixed-method user study into the effectiveness of mixed-initiative tools for the procedural generation of game levels are presented. A tool which generates levels using interactive evolutionary optimisation was designed for this study which (a) is focused on supporting the designer to explore the design space and (b) only requires the designer to interact with it by designing levels. The tool identifies level design patterns in an initial hand-designed map and uses that information to drive an interactive optimisation algorithm. A rigorous user study was designed which compared the experiences of designers using the mixed-initiative tool to designers who were given a tool which provided completely random level suggestions. The designers using the mixed-initiative tool showed an increased engagement in the level design task, reporting that it was effective in inspiring new ideas and design directions. This provides significant evidence that procedural content generation can be used as a powerful tool to support the human design process.
LGFeb 5, 2020
$ε$-shotgun: $ε$-greedy Batch Bayesian OptimisationGeorge De Ath, Richard M. Everson, Jonathan E. Fieldsend et al.
Bayesian optimisation is a popular, surrogate model-based approach for optimising expensive black-box functions. Given a surrogate model, the next location to expensively evaluate is chosen via maximisation of a cheap-to-query acquisition function. We present an $ε$-greedy procedure for Bayesian optimisation in batch settings in which the black-box function can be evaluated multiple times in parallel. Our $ε$-shotgun algorithm leverages the model's prediction, uncertainty, and the approximated rate of change of the landscape to determine the spread of batch solutions to be distributed around a putative location. The initial target location is selected either in an exploitative fashion on the mean prediction, or -- with probability $ε$ -- from elsewhere in the design space. This results in locations that are more densely sampled in regions where the function is changing rapidly and in locations predicted to be good (i.e close to predicted optima), with more scattered samples in regions where the function is flatter and/or of poorer quality. We empirically evaluate the $ε$-shotgun methods on a range of synthetic functions and two real-world problems, finding that they perform at least as well as state-of-the-art batch methods and in many cases exceed their performance.
LGNov 28, 2019
Greed is Good: Exploration and Exploitation Trade-offs in Bayesian OptimisationGeorge De Ath, Richard M. Everson, Alma A. M. Rahat et al.
The performance of acquisition functions for Bayesian optimisation to locate the global optimum of continuous functions is investigated in terms of the Pareto front between exploration and exploitation. We show that Expected Improvement (EI) and the Upper Confidence Bound (UCB) always select solutions to be expensively evaluated on the Pareto front, but Probability of Improvement is not guaranteed to do so and Weighted Expected Improvement does so only for a restricted range of weights. We introduce two novel $ε$-greedy acquisition functions. Extensive empirical evaluation of these together with random search, purely exploratory, and purely exploitative search on 10 benchmark problems in 1 to 10 dimensions shows that $ε$-greedy algorithms are generally at least as effective as conventional acquisition functions (e.g., EI and UCB), particularly with a limited budget. In higher dimensions $ε$-greedy approaches are shown to have improved performance over conventional approaches. These results are borne out on a real world computational fluid dynamics optimisation problem and a robotics active learning problem. Our analysis and experiments suggest that the most effective strategy, particularly in higher dimensions, is to be mostly greedy, occasionally selecting a random exploratory solution.
LGApr 25, 2019
Bayesian Search for Robust OptimaNicholas D. Sanders, Richard M. Everson, Jonathan E. Fieldsend et al.
Many expensive black-box optimisation problems are sensitive to their inputs. In these problems it makes more sense to locate a region of good designs, than a single-possibly fragile-optimal design. Expensive black-box functions can be optimised effectively with Bayesian optimisation, where a Gaussian process is a popular choice as a prior over the expensive function. We propose a method for robust optimisation using Bayesian optimisation to find a region of design space in which the expensive function's performance is relatively insensitive to the inputs whilst retaining a good quality. This is achieved by sampling realisations from a Gaussian process that is modelling the expensive function, and evaluating the improvement for each realisation. The expectation of these improvements can be optimised cheaply with an evolutionary algorithm to determine the next location at which to evaluate the expensive function. We describe an efficient process to locate the optimum expected improvement. We show empirically that evaluating the expensive function at the location in the candidate uncertainty region about which the model is most uncertain, or at random, yield the best convergence in contrast to exploitative schemes. We illustrate our method on six test functions in two, five, and ten dimensions, and demonstrate that it is able to outperform two state-of-the-art approaches from the literature. We also demonstrate our method one two real-world problems in 4 and 8 dimensions, which involve training robot arms to push objects onto targets.