Nick Brown

DC
7 papers
73 citations
Novelty 28%
AI Score 38

7 Papers

DC · Apr 28, 2022
Predicting batch queue job wait times for informed scheduling of urgent HPC workloads

Nick Brown, Gordon Gibb, Evgenij Belikov et al.

There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depend upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest, with the result of this work being the ability to predict job start times within one minute of the actual start time for around 65% of jobs on ARCHER2 and 4-cabinet, and 76% of jobs on Cirrus. When compared against what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better for Cirrus. Furthermore, our approach can accurately predict the start time for three quarters of all jobs within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for 90% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate placement of urgent workloads across HPC machines, the insights gained can be used to provide wider benefits to users and also enrich existing batch queue systems and inform policy too.
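The accuracy figures quoted in this abstract (the share of jobs whose predicted start falls within one minute, or ten minutes, of the actual start) can be read as a simple tolerance-based metric. The following is a minimal sketch of how such a figure might be computed, not the authors' evaluation code; the function name `fraction_within` and the sample data are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta


def fraction_within(predicted, actual, tolerance):
    """Return the fraction of jobs whose predicted start time lies
    within `tolerance` of the actual start time (hypothetical helper)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Made-up example data: four jobs and their prediction errors in seconds.
base = datetime(2022, 4, 28, 12, 0, 0)
actual = [base + timedelta(minutes=m) for m in (0, 10, 20, 30)]
errors_s = (30, -45, 300, -1200)
predicted = [a + timedelta(seconds=e) for a, e in zip(actual, errors_s)]

within_1_min = fraction_within(predicted, actual, timedelta(minutes=1))
within_10_min = fraction_within(predicted, actual, timedelta(minutes=10))
print(f"within 1 minute: {within_1_min:.0%}, within 10 minutes: {within_10_min:.0%}")
```

The same function could be applied to the start-time estimates produced by a batch scheduler, which would allow a like-for-like comparison at each tolerance.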

16.6 · DC · Mar 23
Interactive and Urgent HPC: State of the Research

Albert Reuther, William Arndt, Johannes Blaschke et al.

When we think of how we use smartphones, e-commerce, collaboration platforms, LLMs, etc., most of our interactions with computers are interactive and often urgent. Similar trends of interactivity and urgency are coming to HPC, with applications from simulations to data analysis and machine learning requiring more parallel computational capability and more interactivity. This chapter overviews the progress made so far along with some vectors of what the path forward will bring for greater integration of interactive and urgent HPC policies, techniques, and technologies into our HPC ecosystems.

3.4 · DC · May 5
Lifting to tensors when compiling scientific computing workloads for AI Engines

Nick Brown, Gabriel Rodriguez-Canal

It has been demonstrated that specialised architectures, such as FPGAs and AMD's AI Engines (AIEs), have the potential to deliver energy and performance advantages for scientific computing. Given the integration of AIEs into AMD's CPUs, this is an interesting potential avenue, especially when executing on the edge or making better use of local, compute-constrained resources. However, a major challenge is in enabling existing codes to run on this architecture without extensive modification. Put simply, it requires significant expertise and time to port codes to the AIE's execution model. In this paper we explore a compilation pipeline for efficiently mapping loops in general purpose, scientific codes to AIEs. Lifting the semantics of an application into tensors, we demonstrate that this is able to capture the intention of general purpose loops annotated with OpenMP, and that such high-level tensor information provides a richness that is effective when mapping to the AIEs. Requiring only an OpenMP decorated loop, our approach significantly reduces code complexity when targeting the architecture. For six kernel benchmarks, representing AI and scientific computing, using our approach the NPU performs comparably to the multicore CPU for float32, in all cases at reduced energy to solution. For two scientific computing kernels, running across both the CPU and NPU together delivers up to a 40% improvement in performance and a 15% reduction in energy usage compared to the CPU alone.

LG · Oct 17, 2020
Using machine learning to reduce ensembles of geological models for oil and gas exploration

Anna Roubícková, Lucy MacGregor, Nick Brown et al.

Exploration using borehole drilling is a key activity in determining the most appropriate locations for the petroleum industry to develop oil fields. However, estimating the amount of Oil In Place (OIP) relies on computing with a very significant number of geological models, which, due to the ever increasing capability to capture and refine data, is becoming infeasible. As such, data reduction techniques are required to reduce this set down to a smaller, yet still fully representative ensemble. In this paper we explore different approaches to identifying the key groupings of models, based on their most important features, and then using this information to select a reduced set which we can be confident fully represents the overall model space. The result of this work is an approach which enables us to describe the entire state space using only 0.5% of the models, along with a series of lessons learnt. The techniques that we describe are not only applicable to oil and gas exploration, but also more generally to the HPC community as we are forced to work with reduced data-sets due to the rapid increase in data collection capability.

LG · Oct 4, 2020
Machine Learning for Gas and Oil Exploration

Vito Alexander Nordloh, Anna Roubícková, Nick Brown

Drilling boreholes for gas and oil extraction is an expensive process and profitability strongly depends on characteristics of the subsurface. As profitability is a key success factor, companies in the industry utilise well logs to explore the subsurface beforehand. These well logs contain various characteristics of the rock around the borehole, which allow petrophysicists to determine the expected amount of contained hydrocarbon. However, these logs are often incomplete and, as a consequence, the subsequent analyses cannot exploit the full potential of the well logs. In this paper we demonstrate that Machine Learning can be applied to "fill in the gaps" and estimate missing values. We investigate how the amount of training data influences the accuracy of prediction and how to best design regression models (Gradient Boosting and neural network) to obtain optimal results. We then explore the models' predictions both quantitatively, tracking the prediction error, and qualitatively, capturing the evolution of the measured and predicted values for a given property with depth. Combining the findings has enabled us to develop a predictive model that completes the well logs, increasing their quality and potential commercial value.

GEO-PH · Oct 1, 2020
Machine learning on Crays to optimise petrophysical workflows in oil and gas exploration

Nick Brown, Anna Roubickova, Ioanna Lampaki et al.

The oil and gas industry is awash with sub-surface data, which is used to characterize the rock and fluid properties beneath the seabed. This in turn drives commercial decision making and exploration, but the industry currently relies upon highly manual workflows when processing data. A key question is whether this can be improved using machine learning to complement the activities of petrophysicists searching for hydrocarbons. In this paper we present work done, in collaboration with Rock Solid Images (RSI), using supervised machine learning on a Cray XC30 to train models that streamline the manual data interpretation process. With a general aim of decreasing the petrophysical interpretation time from over 7 days to 7 minutes, in this paper we describe the use of mathematical models that have been trained using raw well log data, for completing each of the four stages of a petrophysical interpretation workflow, along with initial data cleaning. We explore how the predictions from these models compare against the interpretations of human petrophysicists, along with numerous options and techniques that were used to optimise the predictions of our models. The power provided by modern supercomputers such as Cray machines is crucial here, but some popular machine learning frameworks are unable to take full advantage of modern HPC machines. As such we will also explore the suitability of the machine learning tools we have used, and describe steps we took to work round their limitations. The result of this work is the ability, for the first time, to use machine learning for the entire petrophysical workflow. Whilst there are numerous challenges, limitations and caveats, we demonstrate that machine learning has an important role to play in the processing of sub-surface data.

SE · Sep 27, 2020
A highly scalable Met Office NERC Cloud model

Nick Brown, Michèle Weiland, Adrian Hill et al.

Large Eddy Simulation (LES) is a critical modelling tool for scientists investigating atmospheric flows, turbulence and cloud microphysics. Within the UK, the principal LES model used by the atmospheric research community is the Met Office Large Eddy Model (LEM). The LEM was originally developed in the late 1980s using computational techniques and assumptions of the time, which means that it does not scale beyond 512 cores. In this paper we present the Met Office NERC Cloud model, MONC, which is a re-write of the existing LEM. We discuss the software engineering and architectural decisions made in order to develop a flexible, extensible model which the community can easily customise for their own needs. The scalability of MONC is evaluated, along with numerous additional customisations made to further improve performance at large core counts. The result of this work is a model which delivers to the community significant new scientific modelling capability that takes advantage of current and future generation HPC machines.