AIAug 2, 2024
Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance DecompositionRóisín Luo, James McDermott, Colm O'Riordan
Perturbation robustness evaluates the vulnerabilities of models, arising from a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research is motivated by two key aspects. First, previous global interpretability works, in tandem with robustness benchmarks, e.g. mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we notice that the spectral signal-to-noise ratios (SNR) of perturbed natural images exponentially decay over the frequency. This power-law-like decay implies that: Low-frequency signals are generally more robust than high-frequency signals -- yet high classification accuracy can not be achieved by low-frequency signals alone. By applying Shapley value theory, our method axiomatically quantifies the predictive powers of robust features and non-robust features within an information theory framework. Our method, dubbed as \textbf{I-ASIDE} (\textbf{I}mage \textbf{A}xiomatic \textbf{S}pectral \textbf{I}mportance \textbf{D}ecomposition \textbf{E}xplanation), provides a unique insight into model robustness mechanisms. We conduct extensive experiments over a variety of vision models pre-trained on ImageNet to show that \textbf{I-ASIDE} can not only \textbf{measure} the perturbation robustness but also \textbf{provide interpretations} of its mechanisms.
SDAug 5, 2024
Text Conditioned Symbolic Drumbeat Generation using Latent Diffusion ModelsPushkar Jajoria, James McDermott
This study introduces a text-conditioned approach to generating drumbeats with Latent Diffusion Models (LDMs). It uses informative conditioning text extracted from training data filenames. By pretraining a text and drumbeat encoder through contrastive learning within a multimodal network, aligned following CLIP, we align the modalities of text and music closely. Additionally, we examine an alternative text encoder based on multihot text encodings. Inspired by musics multi-resolution nature, we propose a novel LSTM variant, MultiResolutionLSTM, designed to operate at various resolutions independently. In common with recent LDMs in the image space, it speeds up the generation process by running diffusion in a latent space provided by a pretrained unconditional autoencoder. We demonstrate the originality and variety of the generated drumbeats by measuring distance (both over binary pianorolls and in the latent space) versus the training dataset and among the generated drumbeats. We also assess the generated drumbeats through a listening test focused on questions of quality, aptness for the prompt text, and novelty. We show that the generated drumbeats are novel and apt to the prompt text, and comparable in quality to those created by human musicians.
NEMar 24, 2017Code
PonyGE2: Grammatical Evolution in PythonMichael Fenton, James McDermott, David Fagan et al.
Grammatical Evolution (GE) is a population-based evolutionary algorithm, where a formal grammar is used in the genotype to phenotype mapping process. PonyGE2 is an open source implementation of GE in Python, developed at UCD's Natural Computing Research and Applications group. It is intended as an advertisement and a starting-point for those new to GE, a reference for students and researchers, a rapid-prototyping medium for our own experiments, and a Python workout. As well as providing the characteristic genotype to phenotype mapping of GE, a search algorithm engine is also provided. A number of sample problems and tutorials on how to use and adapt PonyGE2 have been developed.
CVAug 1, 2024
Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit QuantizationRóisín Luo, Alexandru Drimbarean, James McDermott et al.
This paper explores a novel paradigm in low-bit (i.e. 4-bits or lower) quantization, differing from existing state-of-the-art methods, by framing optimal quantization as an architecture search problem within convolutional neural networks (ConvNets). Our framework, dubbed \textbf{CoRa} (Optimal Quantization Residual \textbf{Co}nvolutional Operator Low-\textbf{Ra}nk Adaptation), is motivated by two key aspects. Firstly, quantization residual knowledge, i.e. the lost information between floating-point weights and quantized weights, has long been neglected by the research community. Reclaiming the critical residual knowledge, with an infinitesimal extra parameter cost, can reverse performance degradation without training. Secondly, state-of-the-art quantization frameworks search for optimal quantized weights to address the performance degradation. Yet, the vast search spaces in weight optimization pose a challenge for the efficient optimization in large models. For example, state-of-the-art BRECQ necessitates $2 \times 10^4$ iterations to quantize models. Fundamentally differing from existing methods, \textbf{CoRa} searches for the optimal architectures of low-rank adapters, reclaiming critical quantization residual knowledge, within the search spaces smaller compared to the weight spaces, by many orders of magnitude. The low-rank adapters approximate the quantization residual weights, discarded in previous methods. We evaluate our approach over multiple pre-trained ConvNets on ImageNet. \textbf{CoRa} achieves comparable performance against both state-of-the-art quantization-aware training and post-training quantization baselines, in $4$-bit and $3$-bit quantization, by using less than $250$ iterations on a small calibration set with $1600$ images. Thus, \textbf{CoRa} establishes a new state-of-the-art in terms of the optimization efficiency in low-bit quantization.
AIJul 14, 2025
Parsing Musical Structure to Enable Meaningful VariationsMaziar Kanani, Sean O Leary, James McDermott
This paper presents a novel rule-based approach for generating music by varying existing tunes. We parse each tune to find the Pathway Assembly (PA) [ 1], that is a structure representing all repetitions in the tune. The Sequitur algorithm [2 ] is used for this. The result is a grammar. We then carry out mutation on the grammar, rather than on a tune directly. There are potentially 19 types of mutations such as adding, removing, swapping or reversing parts of the grammar that can be applied to the grammars. The system employs one of the mutations randomly in this step to automatically manipulate the grammar. Following the mutation, we need to expand the grammar which returns a new tune. The output after 1 or more mutations will be a new tune related to the original tune. Our study examines how tunes change gradually over the course of multiple mutations. Edit distances, structural complexity and length of the tunes are used to show how a tune is changed after multiple mutations. In addition, the size of effect of each mutation type is analyzed. As a final point, we review the musical aspect of the output tunes. It should be noted that the study only focused on generating new pitch sequences. The study is based on an Irish traditional tune dataset and a list of integers has been used to represent each tune's pitch values.
MLJun 4, 2025
Higher-Order Singular-Value Derivatives of Rectangular Real MatricesRóisín Luo, James McDermott, Colm O'Riordan
We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato's analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular values is notoriously challenging through standard matrix-analysis techniques. To overcome this, we treat a real rectangular matrix as a compact operator on a finite-dimensional Hilbert space, and embed the rectangular matrix into a block self-adjoint operator so that non-symmetric perturbations are captured. Applying Kato's asymptotic eigenvalue expansion to this construction, we obtain a general, closed-form expression for the infinitesimal $n$-th order spectral variations. Specializing to $n=2$ and deploying on a Kronecker-product representation with matrix convention yield the Hessian of a singular value, not found in literature. By bridging abstract operator-theoretic perturbation theory with matrices, our framework equips researchers with a practical toolkit for higher-order spectral sensitivity studies in random matrix applications (e.g., adversarial perturbation in deep learning).
LGNov 27, 2025
Can Synthetic Data Improve Symbolic Regression Extrapolation Performance?Fitria Wulandari Ramlan, Colm O'Riordan, Gabriel Kronberger et al.
Many machine learning models perform well when making predictions within the training data range, but often struggle when required to extrapolate beyond it. Symbolic regression (SR) using genetic programming (GP) can generate flexible models but is prone to unreliable behaviour in extrapolation. This paper investigates whether adding synthetic data can help improve performance in such cases. We apply Kernel Density Estimation (KDE) to identify regions in the input space where the training data is sparse. Synthetic data is then generated in those regions using a knowledge distillation approach: a teacher model generates predictions on new input points, which are then used to train a student model. We evaluate this method across six benchmark datasets, using neural networks (NN), random forests (RF), and GP both as teacher models (to generate synthetic data) and as student models (trained on the augmented data). Results show that GP models can often improve when trained on synthetic data, especially in extrapolation areas. However, the improvement depends on the dataset and teacher model used. The most important improvements are observed when synthetic data from GPe is used to train GPp in extrapolation regions. Changes in interpolation areas show only slight changes. We also observe heterogeneous errors, where model performance varies across different regions of the input space. Overall, this approach offers a practical solution for better extrapolation. Note: An earlier version of this work appeared in the GECCO 2025 Workshop on Symbolic Regression. This arXiv version corrects several parts of the original submission.
CVJun 24, 2025
Sampling Matters in Explanations: Towards Trustworthy Attribution Analysis Building Block in Visual Models through Maximizing Explanation CertaintyRóisín Luo, James McDermott, Colm O'Riordan
Image attribution analysis seeks to highlight the feature representations learned by visual models such that the highlighted feature maps can reflect the pixel-wise importance of inputs. Gradient integration is a building block in the attribution analysis by integrating the gradients from multiple derived samples to highlight the semantic features relevant to inferences. Such a building block often combines with other information from visual models such as activation or attention maps to form ultimate explanations. Yet, our theoretical analysis demonstrates that the extent to the alignment of the sample distribution in gradient integration with respect to natural image distribution gives a lower bound of explanation certainty. Prior works add noise into images as samples and the noise distributions can lead to low explanation certainty. Counter-intuitively, our experiment shows that extra information can saturate neural networks. To this end, building trustworthy attribution analysis needs to settle the sample distribution misalignment problem. Instead of adding extra information into input images, we present a semi-optimal sampling approach by suppressing features from inputs. The sample distribution by suppressing features is approximately identical to the distribution of natural images. Our extensive quantitative evaluation on large scale dataset ImageNet affirms that our approach is effective and able to yield more satisfactory explanations against state-of-the-art baselines throughout all experimental models.
LGJun 23, 2025
Optimization-Induced Dynamics of Lipschitz Continuity in Neural NetworksRóisín Luo, James McDermott, Christian Gagné et al.
Lipschitz continuity characterizes the worst-case sensitivity of neural networks to small input perturbations; yet its dynamics (i.e. temporal evolution) during training remains under-explored. We present a rigorous mathematical framework to model the temporal evolution of Lipschitz continuity during training with stochastic gradient descent (SGD). This framework leverages a system of stochastic differential equations (SDEs) to capture both deterministic and stochastic forces. Our theoretical analysis identifies three principal factors driving the evolution: (i) the projection of gradient flows, induced by the optimization dynamics, onto the operator-norm Jacobian of parameter matrices; (ii) the projection of gradient noise, arising from the randomness in mini-batch sampling, onto the operator-norm Jacobian; and (iii) the projection of the gradient noise onto the operator-norm Hessian of parameter matrices. Furthermore, our theoretical framework sheds light on such as how noisy supervision, parameter initialization, batch size, and mini-batch sampling trajectories, among other factors, shape the evolution of the Lipschitz continuity of neural networks. Our experimental results demonstrate strong agreement between the theoretical implications and the observed behaviors.
NEJun 7, 2019
When and Why Metaheuristics Researchers Can Ignore "No Free Lunch" TheoremsJames McDermott
The No Free Lunch (NFL) theorem for search and optimisation states that averaged across all possible objective functions on a fixed search space, all search algorithms perform equally well. Several refined versions of the theorem find a similar outcome when averaging across smaller sets of functions. This paper argues that NFL results continue to be misunderstood by many researchers, and addresses this issue in several ways. Existing arguments against real-world implications of NFL results are collected and re-stated for accessibility, and new ones are added. Specific misunderstandings extant in the literature are identified, with speculation as to how they may have arisen. This paper presents an argument against a common paraphrase of NFL findings -- that algorithms must be specialised to problem domains in order to do well -- after problematising the usually undefined term "domain". It provides novel concrete counter-examples illustrating cases where NFL theorems do not apply. In conclusion it offers a novel view of the real meaning of NFL, incorporating the anthropic principle and justifying the position that in many common situations researchers can ignore NFL.
NEApr 11, 2017
Improving Fitness Functions in Genetic Programming for Classification on Unbalanced Credit Card DatasetsVan Loi Cao, Nhien-An Le-Khac, Miguel Nicolau et al.
Credit card fraud detection based on machine learning has recently attracted considerable interest from the research community. One of the most important tasks in this area is the ability of classifiers to handle the imbalance in credit card data. In this scenario, classifiers tend to yield poor accuracy on the fraud class (minority class) despite realizing high overall accuracy. This is due to the influence of the majority class on traditional training criteria. In this paper, we aim to apply genetic programming to address this issue by adapting existing fitness functions. We examine two fitness functions from previous studies and develop two new fitness functions to evolve GP classifier with superior accuracy on the minority class and overall. Two UCI credit card datasets are used to evaluate the effectiveness of the proposed fitness functions. The results demonstrate that the proposed fitness functions augment GP classifiers, encouraging fitter solutions on both the minority and the majority classes.
LGMar 28, 2017
Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural NetworkLoic Bontemps, Van Loi Cao, James McDermott et al.
Intrusion detection for computer network systems becomes one of the most critical tasks for network administrators today. It has an important role for organizations, governments and our society due to its valuable resources on computer networks. Traditional misuse detection strategies are unable to detect new and unknown intrusion. Besides, anomaly detection in network security is aim to distinguish between illegal or malicious events and normal behavior of network systems. Anomaly detection can be considered as a classification problem where it builds models of normal network behavior, which it uses to detect new patterns that significantly deviate from the model. Most of the cur- rent research on anomaly detection is based on the learning of normally and anomaly behaviors. They do not take into account the previous, re- cent events to detect the new incoming one. In this paper, we propose a real time collective anomaly detection model based on neural network learning and feature operating. Normally a Long Short Term Memory Recurrent Neural Network (LSTM RNN) is trained only on normal data and it is capable of predicting several time steps ahead of an input. In our approach, a LSTM RNN is trained with normal time series data before performing a live prediction for each time step. Instead of considering each time step separately, the observation of prediction errors from a certain number of time steps is now proposed as a new idea for detecting collective anomalies. The prediction errors from a number of the latest time steps above a threshold will indicate a collective anomaly. The model is built on a time series version of the KDD 1999 dataset. The experiments demonstrate that it is possible to offer reliable and efficient for collective anomaly detection.