Amanda S. Barnard

h-index 56

8 papers

17 citations

Novelty 47%

AI Score 51

Ranked #17,577 of 194,257 authors (top 9%); #4,397 in LG (top 11%)

8 Papers

5.3 | LG | May 12 | Code

OverNaN: NaN-Aware Oversampling for Imbalanced Learning with Meaningful Missingness

Amanda S Barnard

Missing values are routinely treated as defects to be eliminated through deletion or imputation prior to machine learning. In many applied domains, however, missingness itself carries information, reflecting experimental constraints, measurement choices, or systematic mechanisms tied to the data-generating process. Eliminating or masking this structure can distort class boundaries, introduce bias, and reduce generalisability, particularly in imbalanced datasets where minority classes are already under-represented. OverNaN is a lightweight, NaN-aware oversampling framework designed to address class imbalance without erasing missingness structure. It extends common synthetic oversampling methods to operate directly on incomplete feature vectors, allowing missing values to be preserved, propagated, or selectively interpolated according to explicitly defined strategies. Rather than repairing missing data, OverNaN treats missingness as part of the feature space over which synthetic samples are generated. This paper situates OverNaN within the broader landscape of imbalanced learning, missing-data handling, and NaN-tolerant algorithms. Using representative examples included with the software, we demonstrate that meaningful missingness can be retained during oversampling without introducing artificial certainty. OverNaN is intended for practitioners working with small, incomplete, and imbalanced datasets in scientific and engineering domains where missingness is unavoidable and often informative.
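The idea of oversampling directly on incomplete feature vectors can be illustrated with a minimal sketch. This is not OverNaN's API: the function name `interpolate_nan_aware` and the strategy labels `"preserve"` and `"interpolate"` are hypothetical, chosen only to show how a SMOTE-style interpolation between two minority samples might carry NaN values through instead of imputing them.

```python
import math
import random


def interpolate_nan_aware(x, neighbour, rng, strategy="preserve"):
    """Create one synthetic sample between x and neighbour (SMOTE-style).

    Hypothetical strategies (assumptions, not OverNaN's own names):
      "preserve":    the output feature is NaN if either parent is NaN.
      "interpolate": if exactly one parent has a value, copy that value;
                     the output is NaN only when both parents are NaN.
    """
    gap = rng.random()  # one interpolation weight in [0, 1) for the whole sample
    synthetic = []
    for a, b in zip(x, neighbour):
        a_missing, b_missing = math.isnan(a), math.isnan(b)
        if not a_missing and not b_missing:
            # Both observed: ordinary linear interpolation.
            synthetic.append(a + gap * (b - a))
        elif strategy == "interpolate" and a_missing != b_missing:
            # Exactly one observed: reuse the observed value.
            synthetic.append(b if a_missing else a)
        else:
            # Keep the missingness in the synthetic sample.
            synthetic.append(math.nan)
    return synthetic


rng = random.Random(0)
x = [1.0, math.nan, 3.0]
nb = [2.0, 5.0, math.nan]
preserved = interpolate_nan_aware(x, nb, rng, "preserve")
filled = interpolate_nan_aware(x, nb, rng, "interpolate")
```

Under the `"preserve"` strategy the second and third features stay missing in the synthetic sample, while `"interpolate"` copies the single observed value. Neighbour search on incomplete vectors, which a full method would also need, is left out of this sketch.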

6.9 | ET | Apr 14

LightMat-HP: A Photonic-Electronic System for Accelerating General Matrix Multiplication With Configurable Precision

Hailong Gong, Haibo Zhang, Amanda S. Barnard et al.

Matrix multiplication is a fundamental kernel in large-scale artificial intelligence and scientific computing, but its performance on conventional electronic accelerators is increasingly constrained by memory bandwidth and energy efficiency. Photonic computing offers a promising alternative due to its ultra-high bandwidth, massive parallelism, and low power dissipation. However, most existing photonic systems are limited to low-precision computation because of analog optical modulation constraints and noise accumulation, which restricts their applicability in precision-critical workloads. To address this limitation, we propose LightMat-HP, a hybrid photonic-electronic computing system that enables end-to-end acceleration of general matrix multiplication with configurable computational precision. LightMat-HP adopts block floating-point (BFP) arithmetic to reduce computational complexity while enabling flexible precision-performance tradeoffs. To overcome the precision limitations of photonic devices, we propose a slicing-based photonic multiplication scheme that exploits the high accuracy of low bit-width photonic multiplication in combination with digital accumulation to achieve high-precision mantissa multiplication. A tile-based matrix multiplication dataflow is further designed to support matrices of arbitrary sizes. We experimentally validate LightMat-HP on a photonic computing prototype and evaluate its performance through large-scale simulations. The results demonstrate that LightMat-HP outperforms FPGA, GPU, and a state-of-the-art photonic accelerator across throughput, latency, and energy efficiency, particularly for small- and medium-sized matrix multiplications, owing to its highly parallel photonic architecture, efficient data movement, and slice-based BFP arithmetic.
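The slicing idea described in the abstract, low bit-width multiplications combined by digital accumulation, can be sketched in plain integer arithmetic. This is only an illustration under stated assumptions: the names `split_into_slices` and `sliced_multiply`, the 4-bit slice width, and the unsigned-mantissa simplification are not taken from the paper, and an ordinary integer multiply stands in for the photonic multiplier.

```python
def split_into_slices(value, slice_bits, num_slices):
    """Split a non-negative integer into fixed-width slices, least significant first."""
    mask = (1 << slice_bits) - 1
    return [(value >> (i * slice_bits)) & mask for i in range(num_slices)]


def sliced_multiply(a, b, slice_bits=4, num_slices=2):
    """Multiply two unsigned mantissas using only slice_bits x slice_bits products.

    Both inputs must fit in slice_bits * num_slices bits.
    """
    limit = 1 << (slice_bits * num_slices)
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands do not fit in the configured number of slices")
    a_slices = split_into_slices(a, slice_bits, num_slices)
    b_slices = split_into_slices(b, slice_bits, num_slices)
    total = 0
    for i, a_part in enumerate(a_slices):
        for j, b_part in enumerate(b_slices):
            # Low bit-width product (stand-in for the analog photonic multiply).
            partial = a_part * b_part
            # Digital shift-and-add restores each partial product's weight.
            total += partial << ((i + j) * slice_bits)
    return total


product = sliced_multiply(183, 92)  # two 8-bit mantissas as 2 x 4-bit slices
```

Raising `num_slices` extends the mantissa width without changing the width of any individual multiplication, which is one way to read the "configurable precision" trade-off: more slices cost more partial products.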

4.7 | LG | Jun 1

RobustModelMaker: Coupling Bootstrap Stability Selection with Leakage-Safe Nested Cross-Validation for Scientific Machine Learning

Amanda S Barnard

Small-to-medium scientific datasets place machine learning pipelines under two compounding pressures. Single-run feature selection produces feature sets that change substantially under small perturbations of the training data, and any procedure that uses the same data for selection, tuning, and evaluation produces optimistically biased performance estimates. The two failure modes are routinely treated as separable, but in the regimes where scientific data live, they interact: an unstable selection inflates the variance of an already-optimistic score, and standard remedies for one rarely address the other. RobustModelMaker is a Python framework that couples bootstrap stability selection with strict nested cross-validation, performs all preprocessing and selection inside each fold, and produces a stability-tested feature subset together with a leakage-safe performance estimate. The framework supports nine algorithms across binary classification, multiclass classification, and regression. Behaviour is verified by a deterministic test suite spanning unit, performance, and reproducibility checks on three real scientific datasets, with comparison against three alternative selectors (ANOVA F-test, recursive feature elimination with cross-validation, and Boruta) on both predictive score and a Jaccard measure of selection stability. RobustModelMaker is competitive in score with the best alternative selector on each dataset, and occupies a position on the joint score-stability frontier that none of the alternatives match across all three task types. Two example applications, ovarian cancer biomarker discovery from the PLCO Trial and critical-temperature regression on the UCI Superconductivity Data, illustrate how the framework is used in practice and what trade-offs become visible when stability is treated as a first-class deliverable rather than an emergent property.
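The abstract mentions a Jaccard measure of selection stability without giving its exact form. One common choice, assumed here, is the mean pairwise Jaccard similarity between the feature sets selected on different resamples. The function names `jaccard` and `selection_stability` and the example feature names are illustrative, not part of RobustModelMaker.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(feature_sets):
    """Mean pairwise Jaccard similarity over all pairs of selected feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Feature sets selected on three hypothetical bootstrap resamples.
runs = [
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f1", "f2", "f3"},
]
stability = selection_stability(runs)
```

A value of 1.0 means every resample selected the same features, and values near 0 mean the selections barely overlap. In the example, two of the three pairs share two of four features, so the score falls between those extremes.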

1.2 | DC | Jan 14

A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication

Yufan Xia, Marco De La Pierre, Amanda S. Barnard et al.

The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well-optimised with techniques like blocking and autotuning. However, due to the complexity of modern multi-core shared memory systems, it is challenging to determine the number of threads that minimises the multi-thread GEMM runtime. We present a proof-of-concept approach to building an Architecture and Data-Structure Aware Linear Algebra (ADSALA) software library that uses machine learning to optimise the runtime performance of BLAS routines. More specifically, our method uses a machine learning model on-the-fly to automatically select the optimal number of threads for a given GEMM task based on the collected training data. Test results on two different HPC node architectures, one based on a two-socket Intel Cascade Lake and the other on a two-socket AMD Zen 3, revealed a 25 to 40 per cent speedup compared to traditional GEMM implementations in BLAS for GEMM tasks with memory usage within 100 MB.
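The selection step described above, choosing the thread count that a model predicts to be fastest for a given GEMM shape, can be sketched as an argmin over candidate thread counts. Everything specific here is an assumption: `predicted_runtime` is a toy analytic stand-in for a trained model, its constants are arbitrary, and none of the names come from ADSALA.

```python
def predicted_runtime(m, k, n, threads):
    """Toy stand-in for a trained runtime model (constants are arbitrary).

    The compute term shrinks as threads are added, while a per-thread
    overhead term grows, so small problems prefer fewer threads.
    """
    flops = 2.0 * m * k * n
    compute = flops / (threads * 1e9)
    overhead = 5e-6 * threads
    return compute + overhead


def select_threads(m, k, n, candidates, model=predicted_runtime):
    """Return the candidate thread count with the lowest predicted runtime."""
    return min(candidates, key=lambda t: model(m, k, n, t))


candidates = [1, 2, 4, 8, 16, 32, 48]
small = select_threads(64, 64, 64, candidates)
large = select_threads(4096, 4096, 4096, candidates)
```

With this toy model the small multiplication is assigned fewer threads than the large one, which mirrors the motivation in the abstract: using every available core is not always the fastest choice. A real system would replace `predicted_runtime` with a model fitted to measured timings on the target machine.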

5.8 | LG | Sep 28, 2022 | Code

Variance Tolerance Factors For Interpreting ALL Neural Networks

Sichao Li, Amanda Barnard

Black box models only provide results for deep learning tasks, and lack informative details about how these results were obtained. Knowing how input variables are related to outputs, in addition to why they are related, can be critical to translating predictions into laboratory experiments, or defending a model prediction under scrutiny. In this paper, we propose a general theory that defines a variance tolerance factor (VTF) inspired by influence functions, to interpret features in the context of black box neural networks by ranking the importance of features, and construct a novel architecture consisting of a base model and feature model to explore the feature importance in a Rashomon set that contains all well-performing neural networks. Two feature importance ranking methods in the Rashomon set and a feature selection method based on the VTF are created and explored. A thorough evaluation on synthetic and benchmark datasets is provided, and the method is applied to two real world examples predicting the formation of noncrystalline gold nanoparticles and the chemical toxicity of 1793 aromatic compounds exposed to a protozoan ciliate for 40 hours.

2.6 | LG | Nov 4, 2024

EXAGREE: Mitigating Explanation Disagreement with Stakeholder-Aligned Models

Sichao Li, Tommy Liu, Quanling Deng et al.

Conflicting explanations, arising from different attribution methods or model internals, limit the adoption of machine learning models in safety-critical domains. We turn this disagreement into an advantage and introduce EXplanation AGREEment (EXAGREE), a two-stage framework that selects a Stakeholder-Aligned Explanation Model (SAEM) from a set of similar-performing models. The selection maximizes Stakeholder-Machine Agreement (SMA), a single metric that unifies faithfulness and plausibility. EXAGREE couples a differentiable mask-based attribution network (DMAN) with monotone differentiable sorting, enabling gradient-based search inside the constrained model space. Experiments on six real-world datasets demonstrate simultaneous gains of faithfulness, plausibility, and fairness over baselines, while preserving task accuracy. Extensive ablation studies, significance tests, and case studies confirm the robustness and feasibility of the method in practice.

3.8 | LG | May 30, 2023 | Code

Shapley Based Residual Decomposition for Instance Analysis

Tommy Liu, Amanda Barnard

In this paper, we introduce the idea of decomposing the residuals of regression with respect to the data instances instead of features. This allows us to determine the effects of each individual instance on the model and on each other, which yields a model-agnostic method of identifying instances of interest. We can also determine the appropriateness of the model and data in the wider context of a given study. The paper focuses on the possible applications that such a framework brings to the relatively unexplored field of instance analysis in the context of Explainable AI tasks.

8.8 | LG | May 17, 2023

Exploring the cloud of feature interaction scores in a Rashomon set

Sichao Li, Rong Wang, Quanling Deng et al.

Interactions among features are central to understanding the behavior of machine learning models. Recent research has made significant strides in detecting and quantifying feature interactions in single predictive models. However, we argue that the feature interactions extracted from a single pre-specified model may not be trustworthy since: a well-trained predictive model may not preserve the true feature interactions and there exist multiple well-performing predictive models that differ in feature interaction strengths. Thus, we recommend exploring feature interaction strengths in a model class of approximately equally accurate predictive models. In this work, we introduce the feature interaction score (FIS) in the context of a Rashomon set, representing a collection of models that achieve similar accuracy on a given task. We propose a general and practical algorithm to calculate the FIS in the model class. We demonstrate the properties of the FIS via synthetic data and draw connections to other areas of statistics. Additionally, we introduce a Halo plot for visualizing the feature interaction variance in high-dimensional space and a swarm plot for analyzing FIS in a Rashomon set. Experiments with recidivism prediction and image classification illustrate how feature interactions can vary dramatically in importance for similarly accurate predictive models. Our results suggest that the proposed FIS can provide valuable insights into the nature of feature interactions in machine learning models.