NAJan 29, 2018
Simultaneous diagonalisation of the covariance and complementary covariance matrices in quaternion widely linear signal processingMin Xiang, Shirin Enshaeifar, Alexander E. Stott et al.
Recent developments in quaternion-valued widely linear processing have established that the exploitation of complete second-order statistics requires consideration of both the standard covariance and the three complementary covariance matrices. Although such matrices have a tremendous amount of structure and their decomposition is a powerful tool in a variety of applications, the non-commutative nature of the quaternion product has been prohibitive to the development of quaternion uncorrelating transforms. To this end, we introduce novel techniques for a simultaneous decomposition of the covariance and complementary covariance matrices in the quaternion domain, whereby the quaternion version of the Takagi factorisation is explored to diagonalise symmetric quaternion-valued matrices. This gives new insights into the quaternion uncorrelating transform (QUT) and forms a basis for the proposed quaternion approximate uncorrelating transform (QAUT) which simultaneously diagonalises all four covariance matrices associated with improper quaternion signals. The effectiveness of the proposed uncorrelating transforms is validated by simulations on both synthetic and real-world quaternion-valued signals.
SYOct 2, 2014
Distributed Widely Linear Frequency Estimation in Unbalanced Three Phase Power SystemsSithan Kanna, Dahir H. Dini, Yili Xia et al.
A novel method for distributed estimation of the frequency of power systems is introduced based on the cooperation between multiple measurement nodes. The proposed distributed widely linear complex Kalman filter (D-ACKF) and the distributed widely linear extended complex Kalman filter (D-AECKF) employ a widely linear state space and augmented complex statistics to deal with unbalanced system conditions and the generality complex signals, both second order circular (proper) and second order noncircular (improper). It is shown that the current, strictly linear, estimators are inadequate for unbalanced systems, a typical case in smart grids, as they do not account for either the noncircularity of Clarke's αβ-voltage in unbalanced conditions or the correlated nature of nodal disturbances. We illuminate the relationship between the degree of circularity of Clarke's voltage and system imbalance, and prove that the proposed widely linear estimators are optimal for such conditions, while also accounting for the correlated and noncircular nature of real-world nodal disturbances. {Synthetic and real world} case studies over a range of power system conditions illustrate the theoretical and practical advantages of the proposed methodology.
MLNov 28, 2023
The HR-Calculus: Enabling Information Processing with Quaternion AlgebraDanilo P. Mandic, Sayed Pouria Talebi, Clive Cheong Took et al.
From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic filed theory through to forming the basis of quantum filed theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing techniques specifically designed for quaternion-valued signals have only recently come to the attention of the machine learning, signal processing, and control communities. The most important development in this direction is introduction of the HR-calculus, which provides the required mathematical foundation for deriving adaptive information processing techniques directly in the quaternion domain. In this article, the foundations of the HR-calculus are revised and the required tools for deriving adaptive learning techniques suitable for dealing with quaternion-valued signals, such as the gradient operator, chain and product derivative rules, and Taylor series expansion are presented. This serves to establish the most important applications of adaptive information processing in the quaternion domain for both single-node and multi-node formulations. The article is supported by Supplementary Material, which will be referred to as SM.
LGMar 23, 2023
Graph Tensor Networks: An Intuitive Framework for Designing Large-Scale Neural Learning Systems on Multiple DomainsYao Lei Xu, Kriton Konstantinidis, Danilo P. Mandic
Despite the omnipresence of tensors and tensor operations in modern deep learning, the use of tensor mathematics to formally design and describe neural networks is still under-explored within the deep learning community. To this end, we introduce the Graph Tensor Network (GTN) framework, an intuitive yet rigorous graphical framework for systematically designing and implementing large-scale neural learning systems on both regular and irregular domains. The proposed framework is shown to be general enough to include many popular architectures as special cases, and flexible enough to handle data on any and many data domains. The power and flexibility of the proposed framework is demonstrated through real-data experiments, resulting in improved performance at a drastically lower complexity costs, by virtue of tensor algebra.
LGJul 30, 2024
Interpretable Pre-Trained Transformers for Heart Time-Series DataHarry J. Davies, James Monsen, Danilo P. Mandic
Decoder-only transformers are the backbone of the popular generative pre-trained transformer (GPT) series of large language models. In this work, we employ this framework to the analysis of clinical heart time-series data, to create two pre-trained general purpose cardiac models, termed PPG-PT and ECG-PT. We place a special emphasis on making both such pre-trained models fully interpretable. This is achieved firstly through aggregate attention maps which show that, in order to make predictions, the model focuses on similar points in previous cardiac cycles and gradually broadens its attention in deeper layers. Next, we show that tokens with the same value, which occur at different distinct points in the electrocardiography (ECG) and photoplethysmography (PPG) cycle, form separate clusters in high dimensional space. The clusters form according to phase, as the tokens propagate through the transformer blocks. Finally, we highlight that individual attention heads respond to specific physiologically relevent features, such as the dicrotic notch in PPG and the P-wave in ECG. It is also demonstrated that these pre-trained models are straightforward to fine-tune for tasks such as classification of atrial fibrillation (AF), and beat detection in photoplethysmography. For the example of AF, the fine-tuning took 11 minutes of computer time, and achieved the respective leave-one-subject-out AUCs of 0.99 and 0.93 for ECG and PPG within the MIMIC Perform AF dataset. In addition, the fine-tuned beat detector achieved a state-of-the-art F1 score of 98%, as well as uniquely providing a beat confidence level which acts as a signal quality estimator. Importantly, the fine-tuned models for AF screening are also fully explainable, with attention shifting to regions in the context that are strongly indicative of atrial fibrillation.
IVDec 22, 2022
Rapid Extraction of Respiratory Waveforms from Photoplethysmography: A Deep Encoder ApproachHarry J. Davies, Danilo P. Mandic
Much of the information of breathing is contained within the photoplethysmography (PPG) signal, through changes in venous blood flow, heart rate and stroke volume. We aim to leverage this fact, by employing a novel deep learning framework which is a based on a repurposed convolutional autoencoder. Our model aims to encode all of the relevant respiratory information contained within photoplethysmography waveform, and decode it into a waveform that is similar to a gold standard respiratory reference. The model is employed on two photoplethysmography data sets, namely Capnobase and BIDMC. We show that the model is capable of producing respiratory waveforms that approach the gold standard, while in turn producing state of the art respiratory rate estimates. We also show that when it comes to capturing more advanced respiratory waveform characteristics such as duty cycle, our model is for the most part unsuccessful. A suggested reason for this, in light of a previous study on in-ear PPG, is that the respiratory variations in finger-PPG are far weaker compared with other recording locations. Importantly, our model can perform these waveform estimates in a fraction of a millisecond, giving it the capacity to produce over 6 hours of respiratory waveforms in a single second. Moreover, we attempt to interpret the behaviour of the kernel weights within the model, showing that in part our model intuitively selects different breathing frequencies. The model proposed in this work could help to improve the usefulness of consumer PPG-based wearables for medical applications, where detailed respiratory information is required.
CVApr 30, 2024Code
Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal PerspectiveWanqi Zhou, Shuanghao Bai, Danilo P. Mandic et al.
Pretrained vision-language models (VLMs) like CLIP exhibit exceptional generalization across diverse downstream tasks. While recent studies reveal their vulnerability to adversarial attacks, research to date has primarily focused on enhancing the robustness of image encoders against image-based attacks, with defenses against text-based and multimodal attacks remaining largely unexplored. To this end, this work presents the first comprehensive study on improving the adversarial robustness of VLMs against attacks targeting image, text, and multimodal inputs. This is achieved by proposing multimodal contrastive adversarial training (MMCoA). Such an approach strengthens the robustness of both image and text encoders by aligning the clean text embeddings with adversarial image embeddings, and adversarial text embeddings with clean image embeddings. The robustness of the proposed MMCoA is examined against existing defense methods over image, text, and multimodal attacks on the CLIP model. Extensive experiments on 15 datasets across two tasks reveal the characteristics of different adversarial defense methods under distinct distribution shifts and dataset complexities across the three attack types. This paves the way for a unified framework of adversarial robustness against different modality attacks, opening up new possibilities for securing VLMs against multimodal attacks. The code is available at https://github.com/ElleZWQ/MMCoA.git.
SPAug 27, 2024
In-ear ECG Signal Enhancement with Denoising Convolutional AutoencodersEdoardo Occhipinti, Marek Zylinski, Harry J. Davies et al.
The cardiac dipole has been shown to propagate to the ears, now a common site for consumer wearable electronics, enabling the recording of electrocardiogram (ECG) signals. However, in-ear ECG recordings often suffer from significant noise due to their small amplitude and the presence of other physiological signals, such as electroencephalogram (EEG), which complicates the extraction of cardiovascular features. This study addresses this issue by developing a denoising convolutional autoencoder (DCAE) to enhance ECG information from in-ear recordings, producing cleaner ECG outputs. The model is evaluated using a dataset of in-ear ECGs and corresponding clean Lead I ECGs from 45 healthy participants. The results demonstrate a substantial improvement in signal-to-noise ratio (SNR), with a median increase of 5.9 dB. Additionally, the model significantly improved heart rate estimation accuracy, reducing the mean absolute error by almost 70% and increasing R-peak detection precision to a median value of 90%. We also trained and validated the model using a synthetic dataset, generated from real ECG signals, including abnormal cardiac morphologies, corrupted by pink noise. The results obtained show effective removal of noise sources with clinically plausible waveform reconstruction ability.
CLJul 2, 2023
TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train DecompositionMingxue Xu, Yao Lei Xu, Danilo P. Mandic
High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces considerable model parameters and prohibitively high model storage and memory requirements, which is particularly unaffordable for low-end devices. Targeting no extra training data and insufficient computation cases, we propose a training-free model compression approach based on the Tensor-Train Decomposition (TTD), whereby each pre-trained token embedding is converted into a lower-dimensional Matrix Product State (MPS). We then comprehensively investigate the low-rank structures extracted by this approach, in terms of the compression ratio, the language task performance, and latency on a typical low-end device (i.e. Raspberry Pi). Taking GPT family models (i.e. GPT-2 and CerebrasGPT) as case studies, our approach theoretically results in $46.89\%$ fewer parameters of the entire model, with a compression ratio $39.38\times$ - $65.64\times$ for the embedding layers. With different hyperparameter choices, the model compressed with our approach can achieve a comparable language task performance to the original model with around $2.0\times$ embedding layer compression. This empirically proves the existence of low-rank structure in GPT family models, and demonstrates that about half of the parameters in the embedding layers are redundant.
CPOct 26, 2022
Graph-Regularized Tensor Regression: A Domain-Aware Framework for Interpretable Multi-Way Financial ModellingYao Lei Xu, Kriton Konstantinidis, Danilo P. Mandic
Analytics of financial data is inherently a Big Data paradigm, as such data are collected over many assets, asset classes, countries, and time periods. This represents a challenge for modern machine learning models, as the number of model parameters needed to process such data grows exponentially with the data dimensions; an effect known as the Curse-of-Dimensionality. Recently, Tensor Decomposition (TD) techniques have shown promising results in reducing the computational costs associated with large-dimensional financial models while achieving comparable performance. However, tensor models are often unable to incorporate the underlying economic domain knowledge. To this end, we develop a novel Graph-Regularized Tensor Regression (GRTR) framework, whereby knowledge about cross-asset relations is incorporated into the model in the form of a graph Laplacian matrix. This is then used as a regularization tool to promote an economically meaningful structure within the model parameters. By virtue of tensor algebra, the proposed framework is shown to be fully interpretable, both coefficient-wise and dimension-wise. The GRTR model is validated in a multi-way financial forecasting setting and compared against competing models, and is shown to achieve improved performance at reduced computational costs. Detailed visualizations are provided to help the reader gain an intuitive understanding of the employed tensor operations.
LGJan 25, 2025Code
Kernel-Based Anomaly Detection Using Generalized Hyperbolic ProcessesPauline Bourigault, Danilo P. Mandic
We present a novel approach to anomaly detection by integrating Generalized Hyperbolic (GH) processes into kernel-based methods. The GH distribution, known for its flexibility in modeling skewness, heavy tails, and kurtosis, helps to capture complex patterns in data that deviate from Gaussian assumptions. We propose a GH-based kernel function and utilize it within Kernel Density Estimation (KDE) and One-Class Support Vector Machines (OCSVM) to develop anomaly detection frameworks. Theoretical results confirmed the positive semi-definiteness and consistency of the GH-based kernel, ensuring its suitability for machine learning applications. Empirical evaluation on synthetic and real-world datasets showed that our method improves detection performance in scenarios involving heavy-tailed and asymmetric or imbalanced distributions. https://github.com/paulinebourigault/GHKernelAnomalyDetect
LGJan 24, 2022Code
Pearl: Parallel Evolutionary and Reinforcement Learning LibraryRohan Tangri, Danilo P. Mandic, Anthony G. Constantinides
Reinforcement learning is increasingly finding success across domains where the problem can be represented as a Markov decision process. Evolutionary computation algorithms have also proven successful in this domain, exhibiting similar performance to the generally more complex reinforcement learning. Whilst there exist many open-source reinforcement learning and evolutionary computation libraries, no publicly available library combines the two approaches for enhanced comparison, cooperation, or visualization. To this end, we have created Pearl (https://github.com/LondonNode/Pearl), an open source Python library designed to allow researchers to rapidly and conveniently perform optimized reinforcement learning, evolutionary computation and combinations of the two. The key features within Pearl include: modular and expandable components, opinionated module settings, Tensorboard integration, custom callbacks and comprehensive visualizations.
LGMay 11, 2024
Demystifying the Hypercomplex: Inductive Biases in Hypercomplex Deep LearningDanilo Comminiello, Eleonora Grassucci, Danilo P. Mandic et al.
Hypercomplex algebras have recently been gaining prominence in the field of deep learning owing to the advantages of their division algebras over real vector spaces and their superior results when dealing with multidimensional signals in real-world 3D and 4D paradigms. This paper provides a foundational framework that serves as a roadmap for understanding why hypercomplex deep learning methods are so successful and how their potential can be exploited. Such a theoretical framework is described in terms of inductive bias, i.e., a collection of assumptions, properties, and constraints that are built into training algorithms to guide their learning process toward more efficient and accurate solutions. We show that it is possible to derive specific inductive biases in the hypercomplex domains, which extend complex numbers to encompass diverse numbers and data structures. These biases prove effective in managing the distinctive properties of these domains, as well as the complex structures of multidimensional and multimodal signals. This novel perspective for hypercomplex deep learning promises to both demystify this class of methods and clarify their potential, under a unifying framework, and in this way promotes hypercomplex models as viable alternatives to traditional real-valued deep learning for multidimensional signal processing.
LGFeb 22, 2024
Quaternion recurrent neural network with real-time recurrent learning and maximum correntropy criterionPauline Bourigault, Dongpo Xu, Danilo P. Mandic
We develop a robust quaternion recurrent neural network (QRNN) for real-time processing of 3D and 4D data with outliers. This is achieved by combining the real-time recurrent learning (RTRL) algorithm and the maximum correntropy criterion (MCC) as a loss function. While both the mean square error and maximum correntropy criterion are viable cost functions, it is shown that the non-quadratic maximum correntropy loss function is less sensitive to outliers, making it suitable for applications with multidimensional noisy or uncertain data. Both algorithms are derived based on the novel generalised HR (GHR) calculus, which allows for the differentiation of real functions of quaternion variables and offers the product and chain rules, thus enabling elegant and compact derivations. Simulation results in the context of motion prediction of chest internal markers for lung cancer radiotherapy, which includes regular and irregular breathing sequences, support the analysis.
CROct 21, 2025
Short Ticketing Detection Framework Analysis ReportYuyang Miao, Huijun Xing, Danilo P. Mandic et al.
This report presents a comprehensive analysis of an unsupervised multi-expert machine learning framework for detecting short ticketing fraud in railway systems. The study introduces an A/B/C/D station classification system that successfully identifies suspicious patterns across 30 high-risk stations. The framework employs four complementary algorithms: Isolation Forest, Local Outlier Factor, One-Class SVM, and Mahalanobis Distance. Key findings include the identification of five distinct short ticketing patterns and potential for short ticketing recovery in transportation systems.
CLJun 16, 2025
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end DevicesMingxue Xu, Yao Lei Xu, Danilo P. Mandic
Small Language Models (SLMs, or on-device LMs) have significantly fewer parameters than Large Language Models (LLMs). They are typically deployed on low-end devices, like mobile phones and single-board computers. Unlike LLMs, which rely on increasing model size for better generalisation, SLMs designed for edge applications are expected to have adaptivity to the deployment environments and energy efficiency given the device battery life constraints, which are not addressed in datacenter-deployed LLMs. This paper addresses these two requirements by proposing a training-free token embedding compression approach using Tensor-Train Decomposition (TTD). Each pre-trained token embedding vector is converted into a lower-dimensional Matrix Product State (MPS). We comprehensively evaluate the extracted low-rank structures across compression ratio, language task performance, latency, and energy consumption on a typical low-end device, i.e. Raspberry Pi. Taking the sub-billion parameter versions of GPT-2/Cerebres-GPT and OPT models as examples, our approach achieves a comparable language task performance to the original model with around $2.0\times$ embedding layer compression, while the energy consumption of a single query drops by half.
LGMay 28, 2025
Versatile Cardiovascular Signal Generation with a Unified Diffusion TransformerZehua Chen, Yuyang Miao, Liyuan Wang et al.
Cardiovascular signals such as photoplethysmography (PPG), electrocardiography (ECG), and blood pressure (BP) are inherently correlated and complementary, together reflecting the health of cardiovascular system. However, their joint utilization in real-time monitoring is severely limited by diverse acquisition challenges from noisy wearable recordings to burdened invasive procedures. Here we propose UniCardio, a multi-modal diffusion transformer that reconstructs low-quality signals and synthesizes unrecorded signals in a unified generative framework. Its key innovations include a specialized model architecture to manage the signal modalities involved in generation tasks and a continual learning paradigm to incorporate varying modality combinations. By exploiting the complementary nature of cardiovascular signals, UniCardio clearly outperforms recent task-specific baselines in signal denoising, imputation, and translation. The generated signals match the performance of ground-truth signals in detecting abnormal health conditions and estimating vital signs, even in unseen domains, while ensuring interpretability for human experts. These advantages position UniCardio as a promising avenue for advancing AI-assisted healthcare.
QUANT-PHApr 17, 2025
A Quantum of Learning: Using Quaternion Algebra to Model Learning on Quantum DevicesSayed Pouria Talebi, Clive Cheong Took, Danilo P. Mandic
This article considers the problem of designing adaption and optimisation techniques for training quantum learning machines. To this end, the division algebra of quaternions is used to derive an effective model for representing computation and measurement operations on qubits. In turn, the derived model, serves as the foundation for formulating an adaptive learning problem on principal quantum learning units, thereby establishing quantum information processing units akin to that of neurons in classical approaches. Then, leveraging the modern HR-calculus, a comprehensive training framework for learning on quantum machines is developed. The quaternion-valued model accommodates mathematical tractability and establishment of performance criteria, such as convergence conditions.
SPApr 2, 2025
Augmentation of EEG and ECG Time Series for Deep Learning Applications: Integrating Changepoint Detection into the iAAFT SurrogatesNina Moutonnet, Gregory Scott, Danilo P. Mandic
The performance of deep learning methods critically depends on the quality and quantity of the available training data. This is especially the case for physiological time series, which are both noisy and scarce, which calls for data augmentation to artificially increase the size of datasets. Another issue is that the time-evolving statistical properties of nonstationary signals prevent the use of standard data augmentation techniques. To this end, we introduce a novel method for augmenting nonstationary time series. This is achieved by combining offline changepoint detection with the iterative amplitude-adjusted Fourier transform (iAAFT), which ensures that the time-frequency properties of the original signal are preserved during augmentation. The proposed method is validated through comparisons of the performance of i) a deep learning seizure detection algorithm on both the original and augmented versions of the CHB-MIT and Siena scalp electroencephalography (EEG) databases, and ii) a deep learning atrial fibrillation (AF) detection algorithm on the original and augmented versions of the Computing in Cardiology Challenge 2017 dataset. By virtue of the proposed method, for the CHB-MIT and Siena datasets respectively, accuracy rose by 4.4% and 1.9%, precision by 10% and 5.5%, recall by 3.6% and 0.9%, and F1 by 4.2% and 1.4%. For the AF classification task, accuracy rose by 0.3%, precision by 2.1%, recall by 0.8%, and F1 by 2.1%.
LGMar 6, 2025
How can representation dimension dominate structurally pruned LLMs?Mingxue Xu, Lisa Alazraki, Danilo P. Mandic
Pruning assumes a subnetwork exists in the original deep neural network, which can achieve comparative model performance with less computation than the original. However, it is unclear how the model performance varies with the different subnetwork extractions. In this paper, we choose the representation dimension (or embedding dimension, model dimension, the dimension of the residual stream in the relevant literature) as the entry point to this issue. We investigate the linear transformations in the LLM transformer blocks and consider a specific structured pruning approach, SliceGPT, to extract the subnetworks of different representation dimensions. We mechanistically analyse the activation flow during the model forward passes, and find the representation dimension dominates the linear transformations, model predictions, and, finally, the model performance. Explicit analytical relations are given to calculate the pruned model performance (perplexity and accuracy) without actual evaluation, and are empirically validated with Llama-3-8B-Instruct and Phi-3-mini-4k-Instruct.
CLDec 13, 2024
Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language ModelsHarry J. Davies, Giorgos Iacovides, Danilo P. Mandic
The sheer scale of data required to train modern large language models (LLMs) poses significant risks, as models are likely to gain knowledge of sensitive topics such as bio-security, as well the ability to replicate copyrighted works. Methods designed to remove such knowledge must do so from all prompt directions, in a multi-lingual capacity and without degrading general model performance. To this end, we introduce the targeted angular reversal (TARS) method of knowledge removal from LLMs. The TARS method firstly leverages the LLM in combination with a detailed prompt to aggregate information about a selected concept in the internal representation space of the LLM. It then refines this approximate concept vector to trigger the concept token with high probability, by perturbing the approximate concept vector with noise and transforming it into token scores with the language model head. The feedforward weight vectors in the LLM which operate directly on the internal representation space, and have the highest cosine similarity with this targeting vector, are then replaced by a reversed targeting vector, thus limiting the ability of the concept to propagate through the model. The modularity of the TARS method allows for a sequential removal of concepts from Llama 3.1 8B, such as the famous literary detective Sherlock Holmes, and the planet Saturn. It is demonstrated that the probability of triggering target concepts can be reduced to 0.00 with as few as 1 TARS edit, whilst simultaneously removing the knowledge bi-directionally. Moreover, knowledge is shown to be removed across all languages despite only being targeted in English. Importantly, TARS has minimal impact on the general model capabilities, as after removing 5 diverse concepts in a modular fashion, there is minimal KL divergence in the next token probabilities of the LLM on large corpora of Wikipedia text (median of 0.0015).
LGJan 30, 2024
Widely Linear Matched Filter: A Lynchpin towards the Interpretability of Complex-valued CNNsQingchen Wang, Zhe Li, Zdenka Babic et al.
A recent study on the interpretability of real-valued convolutional neural networks (CNNs) {Stankovic_Mandic_2023CNN} has revealed a direct and physically meaningful link with the task of finding features in data through matched filters. However, applying this paradigm to illuminate the interpretability of complex-valued CNNs meets a formidable obstacle: the extension of matched filtering to a general class of noncircular complex-valued data, referred to here as the widely linear matched filter (WLMF), has been only implicit in the literature. To this end, to establish the interpretability of the operation of complex-valued CNNs, we introduce a general WLMF paradigm, provide its solution and undertake analysis of its performance. For rigor, our WLMF solution is derived without imposing any assumption on the probability density of noise. The theoretical advantages of the WLMF over its standard strictly linear counterpart (SLMF) are provided in terms of their output signal-to-noise-ratios (SNRs), with WLMF consistently exhibiting enhanced SNR. Moreover, the lower bound on the SNR gain of WLMF is derived, together with condition to attain this bound. This serves to revisit the convolution-activation-pooling chain in complex-valued CNNs through the lens of matched filtering, which reveals the potential of WLMFs to provide physical interpretability and enhance explainability of general complex-valued CNNs. Simulations demonstrate the agreement between the theoretical and numerical results.
SPMay 23, 2023
Amplitude-Independent Machine Learning for PPG through Visibility Graphs and Transfer LearningYuyang Miao, Harry J. Davies, Danilo P. Mandic
Photoplethysmography (PPG) refers to the measurement of variations in blood volume using light and is a feature of most wearable devices. The PPG signals provide insight into the body's circulatory system and can be employed to extract various bio-features, such as heart rate and vascular ageing. Although several algorithms have been proposed for this purpose, many exhibit limitations, including heavy reliance on human calibration, high signal quality requirements, and a lack of generalisation. In this paper, we introduce a PPG signal processing framework that integrates graph theory and computer vision algorithms, to provide an analysis framework which is amplitude-independent and invariant to affine transformations. It also requires minimal preprocessing, fuses information through RGB channels and exhibits robust generalisation across tasks and datasets. The proposed VGTL-net achieves state-of-the-art performance in the prediction of vascular ageing and demonstrates robust estimation of continuous blood pressure waveforms.
OCMay 9, 2023
Convex Quaternion Optimization for Signal Processing: Theory and ApplicationsShuning Sun, Qiankun Diao, Dongpo Xu et al.
Convex optimization methods have been extensively used in the fields of communications and signal processing. However, the theory of quaternion optimization is currently not as fully developed and systematic as that of complex and real optimization. To this end, we establish an essential theory of convex quaternion optimization for signal processing based on the generalized Hamilton-real (GHR) calculus. This is achieved in a way which conforms with traditional complex and real optimization theory. For rigorous, We present five discriminant theorems for convex quaternion functions, and four discriminant criteria for strongly convex quaternion functions. Furthermore, we provide a fundamental theorem for the optimality of convex quaternion optimization problems, and demonstrate its utility through three applications in quaternion signal processing. These results provide a solid theoretical foundation for convex quaternion optimization and open avenues for further developments in signal processing applications.
LGMay 9, 2023
UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic OptimizationYiming Jiang, Jinlan Liu, Dongpo Xu et al.
Adam-type algorithms have become a preferred choice for optimisation in the deep learning setting, however, despite success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms (called UAdam). This is equipped with a general form of the second-order moment, which makes it possible to include Adam and its variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. This is supported by a rigorous convergence analysis of UAdam in the non-convex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with the rate of $\mathcal{O}(1/T)$. Furthermore, the size of neighborhood decreases as $β$ increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also show that vanilla Adam can converge by selecting appropriate hyperparameters, which provides a theoretical guarantee for the analysis, applications, and further developments of the whole class of Adam-type algorithms.
MED-PHSep 14, 2021
An Apparatus for the Simulation of Breathing Disorders: Physically Meaningful Generation of Surrogate DataHarry J. Davies, Ghena Hammour, Hongjian Xiao et al.
The rapidly increasing prevalence of debilitating breathing disorders, such as chronic obstructive pulmonary disease (COPD), calls for a meaningful integration of artificial intelligence (AI) into healthcare. While this promises improved detection and monitoring of breathing disorders, AI techniques are almost invariably "data hungry" which highlights the importance of generating physically meaningful surrogate data. Indeed, domain aware surrogates would enable both an improved understanding of respiratory waveform changes with different breathing disorders, and enhance the training of machine learning algorithms. To this end, we introduce an apparatus comprising of PVC tubes and 3D printed parts as a simple yet effective method of simulating both obstructive and restrictive respiratory waveforms in healthy subjects. Independent control over both inspiratory and expiratory resistances allows for the simulation of obstructive breathing disorders through the whole spectrum of FEV1/FVC spirometry ratios (used to classify COPD), ranging from healthy values to values seen in severe chronic obstructive pulmonary disease. Moreover, waveform characteristics of breathing disorders, such as a change in inspiratory duty cycle or peak flow are also observed in the waveforms resulting from use of the artificial breathing disorder simulation apparatus. Overall, the proposed apparatus provides us with a simple, effective and physically meaningful way to generate faithful surrogate breathing disorder waveforms, a prerequisite for the use of artificial intelligence in respiratory health.
LGMay 11, 2021
Graph Theory for Metro Traffic ModellingBruno Scalzo Dees, Yao Lei Xu, Anthony G. Constantinides et al.
A unifying graph theoretic framework for the modelling of metro transportation networks is proposed. This is achieved by first introducing a basic graph framework for the modelling of the London underground system from a diffusion law point of view. This forms a basis for the analysis of both station importance and their vulnerability, whereby the concept of graph vertex centrality plays a key role. We next explore k-edge augmentation of a graph topology, and illustrate its usefulness both for improving the network robustness and as a planning tool. Upon establishing the graph theoretic attributes of the underlying graph topology, we proceed to introduce models for processing data on such a metro graph. Commuter movement is shown to obey the Fick's law of diffusion, where the graph Laplacian provides an analytical model for the diffusion process of commuter population dynamics. Finally, we also explore the application of modern deep learning models, such as graph neural networks and hyper-graph neural networks, as general purpose models for the modelling and forecasting of underground data, especially in the context of the morning and evening rush hours. Comprehensive simulations including the passenger in- and out-flows during the morning rush hour in London demonstrates the advantages of the graph models in metro planning and traffic management, a formal mathematical approach with wide economic implications.
LGMay 11, 2021
Tensor-Train Recurrent Neural Networks for Interpretable Multi-Way Financial ForecastingYao Lei Xu, Giuseppe G. Calvi, Danilo P. Mandic
Recurrent Neural Networks (RNNs) represent the de facto standard machine learning tool for sequence modelling, owing to their expressive power and memory. However, when dealing with large dimensional data, the corresponding exponential increase in the number of parameters imposes a computational bottleneck. The necessity to equip RNNs with the ability to deal with the curse of dimensionality, such as through the parameter compression ability inherent to tensors, has led to the development of the Tensor-Train RNN (TT-RNN). Despite achieving promising results in many applications, the full potential of the TT-RNN is yet to be explored in the context of interpretable financial modelling, a notoriously challenging task characterized by multi-modal data with low signal-to-noise ratio. To address this issue, we investigate the potential of TT-RNN in the task of financial forecasting of currencies. We show, through the analysis of TT-factors, that the physical meaning underlying tensor decomposition, enables the TT-RNN model to aid the interpretability of results, thus mitigating the notorious "black-box" issue associated with neural networks. Furthermore, simulation results highlight the regularization power of TT decomposition, demonstrating the superior performance of TT-RNN over its uncompressed RNN counterpart and other tensor forecasting methods.
LGMar 27, 2021
Tensor Networks for Multi-Modal Non-Euclidean DataYao Lei Xu, Kriton Konstantinidis, Danilo P. Mandic
Modern data sources are typically of large scale and multi-modal natures, and acquired on irregular domains, which poses serious challenges to traditional deep learning models. These issues are partially mitigated by either extending existing deep learning algorithms to irregular domains through graphs, or by employing tensor methods to alleviate the computational bottlenecks imposed by the Curse of Dimensionality. To simultaneously resolve both these issues, we introduce a novel Multi-Graph Tensor Network (MGTN) framework, which leverages on the desirable properties of graphs, tensors and neural networks in a physically meaningful and compact manner. This equips MGTNs with the ability to exploit local information in irregular data sources at a drastically reduced parameter complexity, and over a range of learning paradigms such as regression, classification and reinforcement learning. The benefits of the MGTN framework, especially its ability to avoid overfitting through the inherent low-rank regularization properties of tensor networks, are demonstrated through its superior performance against competing models in the individual tensor, graph, and neural network domains.
LGOct 25, 2020
Multi-Graph Tensor NetworksYao Lei Xu, Kriton Konstantinidis, Danilo P. Mandic
The irregular and multi-modal nature of numerous modern data sources poses serious challenges for traditional deep learning algorithms. To this end, recent efforts have generalized existing algorithms to irregular domains through graphs, with the aim to gain additional insights from data through the underlying graph topology. At the same time, tensor-based methods have demonstrated promising results in bypassing the bottlenecks imposed by the Curse of Dimensionality. In this paper, we introduce a novel Multi-Graph Tensor Network (MGTN) framework, which exploits both the ability of graphs to handle irregular data sources and the compression properties of tensor networks in a deep learning setting. The potential of the proposed framework is demonstrated through an MGTN based deep Q agent for Foreign Exchange (FOREX) algorithmic trading. By virtue of the MGTN, a FOREX currency graph is leveraged to impose an economically meaningful structure on this demanding task, resulting in a highly superior performance against three competing models and at a drastically lower complexity.
LGSep 18, 2020
Recurrent Graph Tensor Networks: A Low-Complexity Framework for Modelling High-Dimensional Multi-Way SequenceYao Lei Xu, Danilo P. Mandic
Recurrent Neural Networks (RNNs) are among the most successful machine learning models for sequence modelling, but tend to suffer from an exponential increase in the number of parameters when dealing with large multidimensional data. To this end, we develop a multi-linear graph filter framework for approximating the modelling of hidden states in RNNs, which is embedded in a tensor network architecture to improve modelling power and reduce parameter complexity, resulting in a novel Recurrent Graph Tensor Network (RGTN). The proposed framework is validated through several multi-way sequence modelling tasks and benchmarked against traditional RNNs. By virtue of the domain aware information processing of graph filters and the expressive power of tensor networks, we show that the proposed RGTN is capable of not only out-performing standard RNNs, but also mitigating the Curse of Dimensionality associated with traditional RNNs, demonstrating superior properties in terms of performance and complexity.
LGFeb 26, 2020
Tensor Decompositions in Deep LearningDavide Bacciu, Danilo P. Mandic
The paper surveys the topic of tensor decompositions in modern machine learning applications. It focuses on three active research topics of significant relevance for the community. After a brief review of consolidated works on multi-way data analysis, we consider the use of tensor decompositions in compressing the parameter space of deep learning models. Lastly, we discuss how tensor methods can be leveraged to yield richer adaptive representations of complex data, including structured information. The paper concludes with a discussion on interesting open research challenges.
LGJan 27, 2020
Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition ApproachAlexandros Haliassos, Kriton Konstantinidis, Danilo P. Mandic
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks, characterized by a lack of inherent ordering of features (variables). The brute force approach of learning a parameter for each interaction of every order comes at an exponential computational and memory cost (Curse of Dimensionality). To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor, the order of which is equal to the number of features; for efficiency, it can be further factorized into a compact Tensor Train (TT) format. However, both TT and other Tensor Networks (TNs), such as Tensor Ring and Hierarchical Tucker, are sensitive to the ordering of their indices (and hence to the features). To establish the desired invariance to feature ordering, we propose to represent the weight tensor through the Canonical Polyadic (CP) Decomposition (CPD), and introduce the associated inference and learning algorithms, including suitable regularization and initialization schemes. It is demonstrated that the proposed CP-based predictor significantly outperforms other TN-based predictors on sparse data while exhibiting comparable performance on dense non-sequential tasks. Furthermore, for enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors. In conjunction with feature vector normalization, this is shown to yield dramatic improvements in performance for dense non-sequential tasks, matching models such as fully-connected neural networks.
MLOct 24, 2019
Robust Principal Component Analysis Based On Maximum Correntropy Power IterationsJean P. Chereau, Bruno Scalzo Dees, Danilo P. Mandic
Principal component analysis (PCA) is recognised as a quintessential data analysis technique when it comes to describing linear relationships between the features of a dataset. However, the well-known sensitivity of PCA to non-Gaussian samples and/or outliers often makes it unreliable in practice. To this end, a robust formulation of PCA is derived based on the maximum correntropy criterion (MCC) so as to maximise the expected likelihood of Gaussian distributed reconstruction errors. In this way, the proposed solution reduces to a generalised power iteration, whereby: (i) robust estimates of the principal components are obtained even in the presence of outliers; (ii) the number of principal components need not be specified in advance; and (iii) the entire set of principal components can be obtained, unlike existing approaches. The advantages of the proposed maximum correntropy power iteration (MCPI) are demonstrated through an intuitive numerical example.
LGMar 14, 2019
Compression and Interpretability of Deep Neural Networks via Tucker Tensor Layer: From First Principles to Tensor Valued Back-PropagationGiuseppe G. Calvi, Ahmad Moniri, Mahmoud Mahfouz et al.
This work aims to help resolve the two main stumbling blocks in the application of Deep Neural Networks (DNNs), that is, the exceedingly large number of trainable parameters and their physical interpretability. This is achieved through a tensor valued approach, based on the proposed Tucker Tensor Layer (TTL), as an alternative to the dense weight-matrices of DNNs. This allows us to treat the weight-matrices of general DNNs as a matrix unfolding of a higher order weight-tensor. By virtue of the compression properties of tensor decompositions, this enables us to introduce a novel and efficient framework for exploiting the multi-way nature of the weight-tensor in order to dramatically reduce the number of DNN parameters. We also derive the tensor valued back-propagation algorithm within the TTL framework, by extending the notion of matrix derivatives to tensors. In this way, the physical interpretability of the Tucker decomposition is exploited to gain physical insights into the NN training, through the process of computing gradients with respect to each factor matrix. The proposed framework is validated on both synthetic data, and the benchmark datasets MNIST, Fashion-MNIST, and CIFAR-10. Overall, through the ability to provide the relative importance of each data feature in training, the TTL back-propagation is shown to help mitigate the "black-box" nature inherent to NNs. Experiments also illustrate that the TTL achieves a 66.63-fold compression on MNIST and Fashion-MNIST, while, by simplifying the VGG-16 network, it achieves a 10\% speed up in training time, at a comparable performance.
SPNov 1, 2017
Tensor Valued Common and Individual Feature Extraction: Multi-dimensional PerspectiveIlia Kisil, Giuseppe G. Calvi, Danilo P. Mandic
A novel method for common and individual feature analysis from exceedingly large-scale data is proposed, in order to ensure the tractability of both the computation and storage and thus mitigate the curse of dimensionality, a major bottleneck in modern data science. This is achieved by making use of the inherent redundancy in so-called multi-block data structures, which represent multiple observations of the same phenomenon taken at different times, angles or recording conditions. Upon providing an intrinsic link between the properties of the outer vector product and extracted features in tensor decompositions (TDs), the proposed common and individual information extraction from multi-block data is performed through imposing physical meaning to otherwise unconstrained factorisation approaches. This is shown to dramatically reduce the dimensionality of search spaces for subsequent classification procedures and to yield greatly enhanced accuracy. Simulations on a multi-class classification task of large-scale extraction of individual features from a collection of partially related real-world images demonstrate the advantages of the "blessing of dimensionality" associated with TDs.
CRMay 10, 2017
In-ear EEG biometrics for feasible and readily collectable real-world person authenticationTakashi Nakamura, Valentin Goverdovsky, Danilo P. Mandic
The use of EEG as a biometrics modality has been investigated for about a decade, however its feasibility in real-world applications is not yet conclusively established, mainly due to the issues with collectability and reproducibility. To this end, we propose a readily deployable EEG biometrics system based on a `one-fits-all' viscoelastic generic in-ear EEG sensor (collectability), which does not require skilled assistance or cumbersome preparation. Unlike most existing studies, we consider data recorded over multiple recording days and for multiple subjects (reproducibility) while, for rigour, the training and test segments are not taken from the same recording days. A robust approach is considered based on the resting state with eyes closed paradigm, the use of both parametric (autoregressive model) and non-parametric (spectral) features, and supported by simple and fast cosine distance, linear discriminant analysis and support vector machine classifiers. Both the verification and identification forensics scenarios are considered and the achieved results are on par with the studies based on impractical on-scalp recordings. Comprehensive analysis over a number of subjects, setups, and analysis features demonstrates the feasibility of the proposed ear-EEG biometrics, and its potential in resolving the critical collectability, robustness, and reproducibility issues associated with current EEG biometrics.
MED-PHJan 3, 2017
Automatic sleep monitoring using ear-EEGTakashi Nakamura, Valentin Goverdovsky, Mary J. Morrell et al.
The monitoring of sleep patterns without patient's inconvenience or involvement of a medical specialist is a clinical question of significant importance. To this end, we propose an automatic sleep stage monitoring system based on an affordable, unobtrusive, discreet, and long-term wearable in-ear sensor for recording the Electroencephalogram (ear-EEG). The selected features for sleep pattern classification from a single ear-EEG channel include the spectral edge frequency (SEF) and multi- scale fuzzy entropy (MSFE), a structural complexity feature. In this preliminary study, the manually scored hypnograms from simultaneous scalp-EEG and ear-EEG recordings of four subjects are used as labels for two analysis scenarios: 1) classification of ear-EEG hypnogram labels from ear-EEG recordings and 2) prediction of scalp-EEG hypnogram labels from ear-EEG recordings. We consider both 2-class and 4-class sleep scoring, with the achieved accuracies ranging from 78.5 % to 95.2 % for ear-EEG labels predicted from ear-EEG, and 76.8 % to 91.8 % for scalp-EEG labels predicted from ear-EEG. The corresponding kappa coefficients, which range from 0.64 to 0.83 for Scenario 1 and from 0.65 to 0.80 for Scenario 2, indicate a Substantial to Almost Perfect agreement, thus proving the feasibility of in-ear sensing for sleep monitoring in the community.
MLMar 8, 2016
Frequency estimation in three-phase power systems with harmonic contamination: A multistage quaternion Kalman filtering approachSayed Pouria Talebi, Danilo P. Mandic
Motivated by the need for accurate frequency information, a novel algorithm for estimating the fundamental frequency and its rate of change in three-phase power systems is developed. This is achieved through two stages of Kalman filtering. In the first stage a quaternion extended Kalman filter, which provides a unified framework for joint modeling of voltage measurements from all the phases, is used to estimate the instantaneous phase increment of the three-phase voltages. The phase increment estimates are then used as observations of the extended Kalman filter in the second stage that accounts for the dynamic behavior of the system frequency and simultaneously estimates the fundamental frequency and its rate of change. The framework is then extended to account for the presence of harmonics. Finally, the concept is validated through simulation on both synthetic and real-world data.
NAJun 13, 2014
Quaternion Gradient and HessianDongpo Xu, Danilo P. Mandic
The optimization of real scalar functions of quaternion variables, such as the mean square error or array output power, underpins many practical applications. Solutions often require the calculation of the gradient and Hessian, however, real functions of quaternion variables are essentially non-analytic. To address this issue, we propose new definitions of quaternion gradient and Hessian, based on the novel generalized HR (GHR) calculus, thus making possible efficient derivation of optimization algorithms directly in the quaternion field, rather than transforming the problem to the real domain, as is current practice. In addition, unlike the existing quaternion gradients, the GHR calculus allows for the product and chain rule, and for a one-to-one correspondence of the proposed quaternion gradient and Hessian with their real counterparts. Properties of the quaternion gradient and Hessian relevant to numerical applications are elaborated, and the results illuminate the usefulness of the GHR calculus in greatly simplifying the derivation of the quaternion least mean squares, and in quaternion least square and Newton algorithm. The proposed gradient and Hessian are also shown to enable the same generic forms as the corresponding real- and complex-valued algorithms, further illustrating the advantages in algorithm design and evaluation.
AIJul 5, 2012
Higher-Order Partial Least Squares (HOPLS): A Generalized Multi-Linear Regression MethodQibin Zhao, Cesar F. Caiafa, Danilo P. Mandic et al.
A new generalized multilinear regression model, termed the Higher-Order Partial Least Squares (HOPLS), is introduced with the aim to predict a tensor (multiway array) $\tensor{Y}$ from a tensor $\tensor{X}$ through projecting the data onto the latent space and performing regression on the corresponding latent variables. HOPLS differs substantially from other regression models in that it explains the data by a sum of orthogonal Tucker tensors, while the number of orthogonal loadings serves as a parameter to control model complexity and prevent overfitting. The low dimensional latent space is optimized sequentially via a deflation operation, yielding the best joint subspace approximation for both $\tensor{X}$ and $\tensor{Y}$. Instead of decomposing $\tensor{X}$ and $\tensor{Y}$ individually, higher order singular value decomposition on a newly defined generalized cross-covariance tensor is employed to optimize the orthogonal loadings. A systematic comparison on both synthetic data and real-world decoding of 3D movement trajectories from electrocorticogram (ECoG) signals demonstrate the advantages of HOPLS over the existing methods in terms of better predictive ability, suitability to handle small sample sizes, and robustness to noise.