ASMay 30, 2022Code
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio SynthesisYichong Leng, Zehua Chen, Junliang Guo et al. · microsoft-research
Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio from the real world, synthesizing them from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberations and head/ear related filtrations, which, however, are difficult to accurately simulate in traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part that shared by the left and right channels as well as a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively. Specifically, in the first stage, the common information of the binaural audio is generated with a single-channel diffusion model conditioned on the mono audio, based on which the binaural audio is generated by a two-channel diffusion model in the second stage. Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models),the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples. Experiment results show that on a benchmark dataset, BinauralGrad outperforms the existing baselines by a large margin in terms of both object and subject evaluation metrics (Wave L2: 0.128 vs. 0.157, MOS: 3.80 vs. 3.61). The generated audio samples (https://speechresearch.github.io/binauralgrad) and code (https://github.com/microsoft/NeuralSpeech/tree/master/BinauralGrad) are available online.
LGJan 22, 2023Code
Tensor Networks Meet Neural Networks: A Survey and Future PerspectivesMaolin Wang, Yu Pan, Zenglin Xu et al.
Tensor networks (TNs) and neural networks (NNs) are two fundamental data modeling approaches. TNs were introduced to solve the curse of dimensionality in large-scale tensors by converting an exponential number of dimensions to polynomial complexity. As a result, they have attracted significant attention in the fields of quantum physics and machine learning. Meanwhile, NNs have displayed exceptional performance in various applications, e.g., computer vision, natural language processing, and robotics research. Interestingly, although these two types of networks originate from different observations, they are inherently linked through the typical multilinearity structure underlying both TNs and NNs, thereby motivating a significant number of developments regarding combinations of TNs and NNs. In this paper, we refer to these combinations as tensorial neural networks~(TNNs) and present an introduction to TNNs from both data processing and model architecture perspectives. From the data perspective, we explore the capabilities of TNNs in multi-source fusion, multimodal pooling, data compression, multi-task training, and quantum data processing. From the model perspective, we examine TNNs' integration with various architectures, including Convolutional Neural Networks, Recurrent Neural Networks, Graph Neural Networks, Transformers, Large Language Models, and Quantum Neural Networks. Furthermore, this survey also explores methods for improving TNNs, examines flexible toolboxes for implementing TNNs, and documents TNN development while highlighting potential future directions. To the best of our knowledge, this is the first comprehensive survey that bridges the connections among NNs and TNs. We provide a curated list of TNNs at https://github.com/tnbar/awesome-tensorial-neural-networks.
SDJan 29, 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion ModelsHaohe Liu, Zehua Chen, Yi Yuan et al.
Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion. Our implementation and demos are available at https://audioldm.github.io.
NASep 29, 2016
Tensor Networks for Latent Variable Analysis. Part I: Algorithms for Tensor Train DecompositionAnh-Huy Phan, Andrzej Cichocki, Andre Uschmajew et al.
Decompositions of tensors into factor matrices, which interact through a core tensor, have found numerous applications in signal processing and machine learning. A more general tensor model which represents data as an ordered network of sub-tensors of order-2 or order-3 has, so far, not been widely considered in these fields, although this so-called tensor network decomposition has been long studied in quantum physics and scientific computing. In this study, we present novel algorithms and applications of tensor network decompositions, with a particular focus on the tensor train decomposition and its variants. The novel algorithms developed for the tensor train decomposition update, in an alternating way, one or several core tensors at each iteration, and exhibit enhanced mathematical tractability and scalability to exceedingly large-scale data tensors. The proposed algorithms are tested in classic paradigms of blind source separation from a single mixture, denoising, and feature extraction, and achieve superior performance over the widely used truncated algorithms for tensor train decomposition.
LGJul 18, 2022
Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural NetworksChuang Liu, Xueqi Ma, Yibing Zhan et al.
Graph Neural Networks (GNNs) tend to suffer from high computation costs due to the exponentially increasing scale of graph data and the number of model parameters, which restricts their utility in practical applications. To this end, some recent works focus on sparsifying GNNs with the lottery ticket hypothesis (LTH) to reduce inference costs while maintaining performance levels. However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists. To overcome the above limitations, we propose a comprehensive graph gradual pruning framework termed CGP. This is achieved by designing a during-training graph pruning paradigm to dynamically prune GNNs within one training process. Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs. Furthermore, we design a co-sparsifying strategy to comprehensively trim all three core elements of GNNs: graph structures, node features, and model parameters. Meanwhile, aiming at refining the pruning operation, we introduce a regrowth process into our CGP framework, in order to re-establish the pruned but important connections. The proposed CGP is evaluated by using a node classification task across 6 GNN architectures, including shallow models (GCN and GAT), shallow-but-deep-propagation models (SGC and APPNP), and deep models (GCNII and ResGCN), on a total of 14 real-world graph datasets, including large-scale graph datasets from the challenging Open Graph Benchmark. Experiments reveal that our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
ASDec 30, 2022
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to SpeechZehua Chen, Yihan Wu, Yichong Leng et al.
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of inference steps but at the cost of sample quality. In this work, to improve the inference speed for DDPM-based TTS model while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compare with other acceleration methods for DDPM which need to synthesize speech from scratch, ResGrad reduces the complexity of task by changing the generation target from ground-truth mel-spectrogram to the residual, resulting into a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that in comparison with other speed-up methods of DDPMs: 1) ResGrad achieves better sample quality with the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech faster than baseline methods by more than 10 times. Audio samples are available at https://resgrad1.github.io/.
CVOct 16, 2022
Demystifying CNNs for Images by Matched FiltersShengxi Li, Xinyi Zhao, Ljubisa Stankovic et al.
The success of convolution neural networks (CNN) has been revolutionising the way we approach and use intelligent machines in the Big Data era. Despite success, CNNs have been consistently put under scrutiny owing to their \textit{black-box} nature, an \textit{ad hoc} manner of their construction, together with the lack of theoretical support and physical meanings of their operation. This has been prohibitive to both the quantitative and qualitative understanding of CNNs, and their application in more sensitive areas such as AI for health. We set out to address these issues, and in this way demystify the operation of CNNs, by employing the perspective of matched filtering. We first illuminate that the convolution operation, the very core of CNNs, represents a matched filter which aims to identify the presence of features in input data. This then serves as a vehicle to interpret the convolution-activation-pooling chain in CNNs under the theoretical umbrella of matched filtering, a common operation in signal processing. We further provide extensive examples and experiments to illustrate this connection, whereby the learning in CNNs is shown to also perform matched filtering, which further sheds light onto physical meaning of learnt parameters and layers. It is our hope that this material will provide new insights into the understanding, constructing and analysing of CNNs, as well as paving the way for developing new methods and architectures of CNNs.
85.5CEMay 16Code
The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment EvidenceYuxuan Ye, Jun Han, Ao Hu et al.
End-to-end LLM trading agents have moved quickly from research curiosity to a small ecosystem of named systems, including FinCon, FinMem, TradingAgents, FinAgent, QuantAgent, and FLAG-Trader. Several of these report headline Sharpe ratios that would be material if read at face value on a deployment desk, and associated benchmarks such as FinBen report trading-task Sharpe statistics in the same range. The gap between architecture research and deployment claim has been crossed too freely on both sides of the academia--industry divide. We take a position on that gap: reported alpha from end-to-end LLM trading agents should not be treated as deployment evidence. Before such returns can support claims of deployable trading capability, they must survive structural validity tests for temporal integrity, real-world frictions, counterfactual robustness, predictive calibration, numerical execution, and multi-agent disaggregation. Current public evidence cannot yet distinguish robust predictive ability from temporal contamination, unmodeled frictions, short-window Sharpe uncertainty, narrative fitting, and parametric priors. The problem is not only evaluative but structural. Language confidence is not tradable probability, narrative reasoning is not numerical execution, and model priors may become undisclosed implicit factor exposures. We contribute a minimum reporting protocol suite, P1--P6, with tiered applicability by claim strength, and a conservative modular alternative that uses LLMs as auditable information interfaces upstream of independent calibration, risk, and execution modules. Code and reproduction harness: \url{https://github.com/hj1650782738/Trading}.
LGJan 12, 2023
Fair and skill-diverse student group formation via constrained k-way graph partitioningAlexander Jenkins, Imad Jaimoukha, Ljubisa Stankovic et al.
Forming the right combination of students in a group promises to enable a powerful and effective environment for learning and collaboration. However, defining a group of students is a complex task which has to satisfy multiple constraints. This work introduces an unsupervised algorithm for fair and skill-diverse student group formation. This is achieved by taking account of student course marks and sensitive attributes provided by the education office. The skill sets of students are determined using unsupervised dimensionality reduction of course mark data via the Laplacian eigenmap. The problem is formulated as a constrained graph partitioning problem, whereby the diversity of skill sets in each group are maximised, group sizes are upper and lower bounded according to available resources, and `balance' of a sensitive attribute is lower bounded to enforce fairness in group formation. This optimisation problem is solved using integer programming and its effectiveness is demonstrated on a dataset of student course marks from Imperial College London.
OCMay 30, 2022
Last-iterate convergence analysis of stochastic momentum methods for neural networksDongpo Xu, Jinlan Liu, Yinghua Lu et al.
The stochastic momentum method is a commonly used acceleration technique for solving large-scale stochastic optimization problems in artificial neural networks. Current convergence results of stochastic momentum methods under non-convex stochastic settings mostly discuss convergence in terms of the random output and minimum output. To this end, we address the convergence of the last iterate output (called last-iterate convergence) of the stochastic momentum methods for non-convex stochastic optimization problems, in a way conformal with traditional optimization theory. We prove the last-iterate convergence of the stochastic momentum methods under a unified framework, covering both stochastic heavy ball momentum and stochastic Nesterov accelerated gradient momentum. The momentum factors can be fixed to be constant, rather than time-varying coefficients in existing analyses. Finally, the last-iterate convergence of the stochastic momentum methods is verified on the benchmark MNIST and CIFAR-10 datasets.
CLJan 29Code
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual MatricesWuyang Zhou, Yuxuan Gu, Giorgos Iacovides et al.
The success of Hyper-Connections (HC) in neural networks (NN) has also highlighted issues related to its training instability and restricted scalability. The Manifold-Constrained Hyper-Connections (mHC) mitigate these challenges by projecting the residual connection space onto a Birkhoff polytope, however, it faces two issues: 1) its iterative Sinkhorn-Knopp (SK) algorithm does not always yield exact doubly stochastic residual matrices; 2) mHC incurs a prohibitive $\mathcal{O}(n^3C)$ parameter complexity with $n$ as the width of the residual stream and $C$ as the feature dimension. The recently proposed mHC-lite reparametrizes the residual matrix via the Birkhoff-von-Neumann theorem to guarantee double stochasticity, but also faces a factorial explosion in its parameter complexity, $\mathcal{O} \left( nC \cdot n! \right)$. To address both challenges, we propose \textbf{KromHC}, which uses the \underline{Kro}necker products of smaller doubly stochastic matrices to parametrize the residual matrix in \underline{mHC}. By enforcing manifold constraints across the factor residual matrices along each mode of the tensorized residual stream, KromHC guarantees exact double stochasticity of the residual matrices while reducing parameter complexity to $\mathcal{O}(n^2C)$. Comprehensive experiments demonstrate that KromHC matches or even outperforms state-of-the-art (SOTA) mHC variants, while requiring significantly fewer trainable parameters. The code is available at \texttt{https://github.com/wz1119/KromHC}.
LGDec 18, 2025
Coarse-to-Fine Open-Set Graph Node Classification with Large Language ModelsXueqi Ma, Xingjun Ma, Sarah Monazam Erfani et al.
Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, despite real-world applications, especially high-stake settings such as fraud detection and medical diagnosis, demanding deeper insights into OOD samples, including their probable labels. This raises a critical question: can OOD detection be extended to OOD classification without true label information? To address this question, we propose a Coarse-to-Fine open-set Classification (CFC) framework that leverages large language models (LLMs) for graph datasets. CFC consists of three key components: a coarse classifier that uses LLM prompts for OOD detection and outlier label generation, a GNN-based fine classifier trained with OOD samples identified by the coarse classifier for enhanced OOD detection and ID classification, and refined OOD classification achieved through LLM prompts and post-processed OOD labels. Unlike methods that rely on synthetic or auxiliary OOD samples, CFC employs semantic OOD instances that are genuinely out-of-distribution based on their inherent meaning, improving interpretability and practical utility. Experimental results show that CFC improves OOD detection by ten percent over state-of-the-art methods on graph and text domains and achieves up to seventy percent accuracy in OOD classification on graph datasets.
MLSep 7, 2023
On the dynamics of multi agent nonlinear filtering and learningSayed Pouria Talebi, Danilo Mandic
Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.
LGOct 24, 2023
Improving Diffusion Models for ECG Imputation with an Augmented Template PriorAlexander Jenkins, Zehua Chen, Fu Siong Ng et al.
Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings are a major issue for signals collected using mobile health systems, decreasing the signal quality, leading to missing values, and affecting automated downstream tasks. Recent studies have explored the imputation of missing values in ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the imputation and forecasting accuracy for ECG with probabilistic models, we present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions. Specifically, 1) we first extract a subject-level pulsative template from the observed values to use as an informative prior of the missing values, which personalises the prior; 2) we then add beat-level stochastic shift terms to augment the prior, which considers variations in the position and amplitude of the prior at each beat; 3) we finally design a confidence score to consider the health condition of the subject, which ensures our prior is provided safely. Experiments with the PTBXL dataset reveal that PulseDiff improves the performance of two strong DDPM baseline models, CSDI and SSSD$^{S4}$, verifying that our method guides the generation of DDPMs while managing the uncertainty. When combined with SSSD$^{S4}$, PulseDiff outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.
LGNov 9, 2022
Hyper-GST: Predict Metro Passenger Flow Incorporating GraphSAGE, Hypergraph, Social-meaningful Edge Weights and Temporal ExploitationYuyang Miao, Yao Xu, Danilo Mandic
Predicting metro passenger flow precisely is of great importance for dynamic traffic planning. Deep learning algorithms have been widely applied due to their robust performance in modelling non-linear systems. However, traditional deep learning algorithms completely discard the inherent graph structure within the metro system. Graph-based deep learning algorithms could utilise the graph structure but raise a few challenges, such as how to determine the weights of the edges and the shallow receptive field caused by the over-smoothing issue. To further improve these challenges, this study proposes a model based on GraphSAGE with an edge weights learner applied. The edge weights learner utilises socially meaningful features to generate edge weights. Hypergraph and temporal exploitation modules are also constructed as add-ons for better performance. A comparison study is conducted on the proposed algorithm and other state-of-art graph neural networks, where the proposed algorithm could improve the performance.
LGSep 19, 2024
CF-GO-Net: A Universal Distribution Learner via Characteristic Function Networks with Graph OptimizersZeyang Yu, Shengxi Li, Danilo Mandic
Generative models aim to learn the distribution of datasets, such as images, so as to be able to generate samples that statistically resemble real data. However, learning the underlying probability distribution can be very challenging and intractable. To this end, we introduce an approach which employs the characteristic function (CF), a probabilistic descriptor that directly corresponds to the distribution. However, unlike the probability density function (pdf), the characteristic function not only always exists, but also provides an additional degree of freedom, hence enhances flexibility in learning distributions. This removes the critical dependence on pdf-based assumptions, which limit the applicability of traditional methods. While several works have attempted to use CF in generative modeling, they often impose strong constraints on the training process. In contrast, our approach calculates the distance between query points in the CF domain, which is an unconstrained and well defined problem. Next, to deal with the sampling strategy, which is crucial to model performance, we propose a graph neural network (GNN)-based optimizer for the sampling process, which identifies regions where the difference between CFs is most significant. In addition, our method allows the use of a pre-trained model, such as a well-trained autoencoder, and is capable of learning directly in its feature space, without modifying its parameters. This offers a flexible and robust approach to generative modeling, not only provides broader applicability and improved performance, but also equips any latent space world with the ability to become a generative model.
LGDec 25, 2025
RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation ModelsAnthony Bolton, Wuyang Zhou, Zehua Chen et al.
Financial time series forecasting is particularly challenging for transformer-based time series foundation models (TSFMs) due to non-stationarity, heavy-tailed distributions, and high-frequency noise present in data. Low-rank adaptation (LoRA) has become a popular parameter-efficient method for adapting pre-trained TSFMs to downstream data domains. However, it still underperforms in financial data, as it preserves the network architecture and training objective of TSFMs rather than complementing the foundation model. To further enhance TSFMs, we propose a novel refinement module, RefineBridge, built upon a tractable Schrödinger Bridge (SB) generative framework. Given the forecasts of TSFM as generative prior and the observed ground truths as targets, RefineBridge learns context-conditioned stochastic transport maps to improve TSFM predictions, iteratively approaching the ground-truth target from even a low-quality prior. Simulations on multiple financial benchmarks demonstrate that RefineBridge consistently improves the performance of state-of-the-art TSFMs across different prediction horizons.
30.2LGMay 7
Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent PartitioningQiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam et al.
Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, which is inherently unobtainable. To address this, we propose Intelligent Partitioning for Self-supervised Denoising (iPSD). Our method eliminates the need for clean references by learning to partition an input EEG segment into independent noisy realizations with the same underlying signal. This enables self-supervision of deep learning denoisers, even in zero-shot settings where only a single EEG segment to be denoised is available. We validate iPSD through extensive experiments, including validations on wearable EEG from in-ear sensors. The results show that iPSD achieves state-of-the-art performance, most notably under extremely low signal-to-noise ratios (down to -10 dB) and challenging artifacts (e.g., EMG), with spectral fidelity orders of magnitude higher than competitive baselines.
CLMar 18, 2024
FinLlama: Financial Sentiment Classification for Algorithmic Trading ApplicationsThanos Konstantinidis, Giorgos Iacovides, Mingxue Xu et al.
There are multiple sources of financial news online which influence market movements and trader's decisions. This highlights the need for accurate sentiment analysis, in addition to having appropriate algorithmic trading techniques, to arrive at better informed trading decisions. Standard lexicon based sentiment approaches have demonstrated their power in aiding financial decisions. However, they are known to suffer from issues related to context sensitivity and word ordering. Large Language Models (LLMs) can also be used in this context, but they are not finance-specific and tend to require significant computational resources. To facilitate a finance specific LLM framework, we introduce a novel approach based on the Llama 2 7B foundational model, in order to benefit from its generative nature and comprehensive language manipulation. This is achieved by fine-tuning the Llama2 7B model on a small portion of supervised financial sentiment analysis data, so as to jointly handle the complexities of financial lexicon and context, and further equipping it with a neural network based decision mechanism. Such a generator-classifier scheme, referred to as FinLlama, is trained not only to classify the sentiment valence but also quantify its strength, thus offering traders a nuanced insight into financial news articles. Complementing this, the implementation of parameter-efficient fine-tuning through LoRA optimises trainable parameters, thus minimising computational and memory requirements, without sacrificing accuracy. Simulation results demonstrate the ability of the proposed FinLlama to provide a framework for enhanced portfolio management decisions and increased market returns. These results underpin the ability of FinLlama to construct high-return portfolios which exhibit enhanced resilience, even during volatile periods and unpredictable market events.
LGFeb 13, 2025
Relational Conformal Prediction for Correlated Time SeriesAndrea Cini, Alexander Jenkins, Danilo Mandic et al.
We address the problem of uncertainty quantification in time series forecasting by exploiting observations at correlated sequences. Relational deep learning methods leveraging graph representations are among the most effective tools for obtaining point estimates from spatiotemporal data and correlated time series. However, the problem of exploiting relational structures to estimate the uncertainty of such predictions has been largely overlooked in the same context. To this end, we propose a novel distribution-free approach based on the conformal prediction framework and quantile regression. Despite the recent applications of conformal prediction to sequential data, existing methods operate independently on each target time series and do not account for relationships among them when constructing the prediction interval. We fill this void by introducing a novel conformal prediction method based on graph deep learning operators. Our approach, named Conformal Relational Prediction (CoRel), does not require the relational structure (graph) to be known a priori and can be applied on top of any pre-trained predictor. Additionally, CoRel includes an adaptive component to handle non-exchangeable data and changes in the input time series. Our approach provides accurate coverage and achieves state-of-the-art uncertainty quantification in relevant benchmarks.
LGOct 14, 2024
Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank SelectionGiorgos Iacovides, Wuyang Zhou, Danilo Mandic
We propose a novel framework that leverages large language models (LLMs) to guide the rank selection in tensor network models for higher-order data analysis. By utilising the intrinsic reasoning capabilities and domain knowledge of LLMs, our approach offers enhanced interpretability of the rank choices and can effectively optimise the objective function. This framework enables users without specialised domain expertise to utilise tensor network decompositions and understand the underlying rationale within the rank selection process. Experimental results validate our method on financial higher-order datasets, demonstrating interpretable reasoning, strong generalisation to unseen test data, and its potential for self-enhancement over successive iterations. This work is placed at the intersection of large language models and higher-order data analysis.
CLJul 24, 2025
FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMsGiorgos Iacovides, Wuyang Zhou, Danilo Mandic
Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on the average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel 'logit-to-score' conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).
LGMay 29, 2025
Domain-Aware Tensor Network Structure SearchGiorgos Iacovides, Wuyang Zhou, Chao Li et al.
Tensor networks (TNs) provide efficient representations of high-dimensional data, yet identification of the optimal TN structures, the so called tensor network structure search (TN-SS) problem, remains a challenge. Current state-of-the-art (SOTA) algorithms solve TN-SS as a purely numerical optimization problem and require extensive function evaluations, which is prohibitive for real-world applications. In addition, existing methods ignore the valuable domain information inherent in real-world tensor data and lack transparency in their identified TN structures. To this end, we propose a novel TN-SS framework, termed the tnLLM, which incorporates domain information about the data and harnesses the reasoning capabilities of large language models (LLMs) to directly predict suitable TN structures. The proposed framework involves a domain-aware prompting pipeline which instructs the LLM to infer suitable TN structures based on the real-world relationships between tensor modes. In this way, our approach is capable of not only iteratively optimizing the objective function, but also generating domain-aware explanations for the identified structures. Experimental results demonstrate that tnLLM achieves comparable TN-SS objective function values with much fewer function evaluations compared to SOTA algorithms. Furthermore, we demonstrate that the LLM-enabled domain information can be used to find good initializations in the search space for sampling-based SOTA methods to accelerate their convergence while preserving theoretical performance guarantees.
CLJan 26, 2025
TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMsYuxuan Gu, Wuyang Zhou, Giorgos Iacovides et al.
The reasoning abilities of Large Language Models (LLMs) can be improved by structurally denoising their weights, yet existing techniques primarily focus on denoising the feed-forward network (FFN) of the transformer block, and can not efficiently utilise the Multi-head Attention (MHA) block, which is the core of transformer architectures. To address this issue, we propose a novel intuitive framework that, at its very core, performs MHA compression through a multi-head tensorisation process and the Tucker decomposition. This enables both higher-dimensional structured denoising and compression of the MHA weights, by enforcing a shared higher-dimensional subspace across the weights of the multiple attention heads. We demonstrate that this approach consistently enhances the reasoning capabilities of LLMs across multiple benchmark datasets, and for both encoder-only and decoder-only architectures, while achieving compression rates of up to $\sim 250$ times in the MHA weights, all without requiring any additional data, training, or fine-tuning. Furthermore, we show that the proposed method can be seamlessly combined with existing FFN-only-based denoising techniques to achieve further improvements in LLM reasoning performance.
SPMar 25, 2025
A Systematic Review of EEG-based Machine Intelligence Algorithms for Depression Diagnosis, and MonitoringAmir Nassibi, Christos Papavassiliou, Ildar Rakhmatulin et al.
Depression disorder is a serious health condition that has affected the lives of millions of people around the world. Diagnosis of depression is a challenging practice that relies heavily on subjective studies and, in most cases, suffers from late findings. Electroencephalography (EEG) biomarkers have been suggested and investigated in recent years as a potential transformative objective practice. In this article, for the first time, a detailed systematic review of EEG-based depression diagnosis approaches is conducted using advanced machine learning techniques and statistical analyses. For this, 938 potentially relevant articles (since 1985) were initially detected and filtered into 139 relevant articles based on the review scheme 'preferred reporting items for systematic reviews and meta-analyses (PRISMA).' This article compares and discusses the selected articles and categorizes them according to the type of machine learning techniques and statistical analyses. Algorithms, preprocessing techniques, extracted features, and data acquisition systems are discussed and summarized. This review paper explains the existing challenges of the current algorithms and sheds light on the future direction of the field. This systematic review outlines the issues and challenges in machine intelligence for the diagnosis of EEG depression that can be addressed in future studies and possibly in future wearable technologies.
LGSep 3, 2025
TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language ModelsYuxuan Gu, Wuyang Zhou, Giorgos Iacovides et al.
Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), have significantly reduced the number of trainable parameters needed in fine-tuning large language models (LLMs). Subsequent developments of LoRA-style adapters have diverged into two main directions: (1) enhancing model expressivity with high-rank adapters, and (2) pushing for further parameter reduction, as exemplified by vector-based methods. However, these approaches present a trade-off, as achieving the expressivity of high-rank weight updates typically comes at the cost of sacrificing the extreme parameter efficiency offered by vector-based techniques. To address this issue, we propose a vector-based random \underline{\textbf{Te}}nsor network for high-\underline{\textbf{R}}ank \underline{\textbf{A}}daptation (TeRA), a novel PEFT method that achieves high-rank weight updates while retaining the parameter efficiency of vector-based PEFT adapters. This is achieved by parameterizing the tensorized weight update matrix as a Tucker-like tensor network (TN), in which large randomly initialized factors are frozen and shared across layers, while only small layer-specific scaling vectors, formed by entries in diagonal factor matrices, are trained. This design effectively decouples the rank of the weight update matrix from the number of trainable parameters. Comprehensive experiments demonstrate that TeRA matches or even outperforms high-rank adapters, while requiring a trainable parameter count similar to vector-based methods. Theoretical analysis and ablation studies further validate the effectiveness of our approach.
SPAug 29, 2025
Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement LearningHaozhe Tian, Qiyu Rao, Nina Moutonnet et al.
Matched filters are widely used to localise signal patterns due to their high efficiency and interpretability. However, their effectiveness deteriorates for low signal-to-noise ratio (SNR) signals, such as those recorded on edge devices, where prominent noise patterns can closely resemble the target within the limited length of the filter. One example is the ear-electrocardiogram (ear-ECG), where the cardiac signal is attenuated and heavily corrupted by artefacts. To address this, we propose the Sequential Matched Filter (SMF), a paradigm that replaces the conventional single matched filter with a sequence of filters designed by a Reinforcement Learning agent. By formulating filter design as a sequential decision-making process, SMF adaptively design signal-specific filter sequences that remain fully interpretable by revealing key patterns driving the decision-making. The proposed SMF framework has strong potential for reliable and interpretable clinical decision support, as demonstrated by its state-of-the-art R-peak detection and physiological state classification performance on two challenging real-world ECG datasets. The proposed formulation can also be extended to a broad range of applications that require accurate pattern localisation from noise-corrupted signals.
LGJul 14, 2025
Understanding the Rank of Tensor Networks via an Intuitive Example-Driven ApproachWuyang Zhou, Giorgos Iacovides, Kriton Konstantinidis et al.
Tensor Network (TN) decompositions have emerged as an indispensable tool in Big Data analytics owing to their ability to provide compact low-rank representations, thus alleviating the ``Curse of Dimensionality'' inherent in handling higher-order data. At the heart of their success lies the concept of TN ranks, which governs the efficiency and expressivity of TN decompositions. However, unlike matrix ranks, TN ranks often lack a universal meaning and an intuitive interpretation, with their properties varying significantly across different TN structures. Consequently, TN ranks are frequently treated as empirically tuned hyperparameters, rather than as key design parameters inferred from domain knowledge. The aim of this Lecture Note is therefore to demystify the foundational yet frequently misunderstood concept of TN ranks through real-life examples and intuitive visualizations. We begin by illustrating how domain knowledge can guide the selection of TN ranks in widely-used models such as the Canonical Polyadic (CP) and Tucker decompositions. For more complex TN structures, we employ a self-explanatory graphical approach that generalizes to tensors of arbitrary order. Such a perspective naturally reveals the relationship between TN ranks and the corresponding ranks of tensor unfoldings (matrices), thereby circumventing cumbersome multi-index tensor algebra while facilitating domain-informed TN design. It is our hope that this Lecture Note will equip readers with a clear and unified understanding of the concept of TN rank, along with the necessary physical insight and intuition to support the selection, explainability, and deployment of tensor methods in both practical applications and educational contexts.
LGFeb 13, 2025
Learning to Predict Global Atrial Fibrillation Dynamics from Sparse MeasurementsAlexander Jenkins, Andrea Cini, Joseph Barker et al.
Catheter ablation of Atrial Fibrillation (AF) consists of a one-size-fits-all treatment with limited success in persistent AF. This may be due to our inability to map the dynamics of AF with the limited resolution and coverage provided by sequential contact mapping catheters, preventing effective patient phenotyping for personalised, targeted ablation. Here we introduce FibMap, a graph recurrent neural network model that reconstructs global AF dynamics from sparse measurements. Trained and validated on 51 non-contact whole atria recordings, FibMap reconstructs whole atria dynamics from 10% surface coverage, achieving a 210% lower mean absolute error and an order of magnitude higher performance in tracking phase singularities compared to baseline methods. Clinical utility of FibMap is demonstrated on real-world contact mapping recordings, achieving reconstruction fidelity comparable to non-contact mapping. FibMap's state-spaces and patient-specific parameters offer insights for electrophenotyping AF. Integrating FibMap into clinical practice could enable personalised AF care and improve outcomes.
SPNov 3, 2024
Online Graph Topology Learning via Time-Vertex Adaptive Filters: From Theory to Cardiac FibrillationAlexander Jenkins, Thiernithi Variddhisai, Ahmed El-Medany et al.
Graph Signal Processing (GSP) provides a powerful framework for analysing complex, interconnected systems by modelling data as signals on graphs. While recent advances have enabled graph topology learning from observed signals, existing methods often struggle with time-varying systems and real-time applications. To address this gap, we introduce AdaCGP, a sparsity-aware adaptive algorithm for dynamic graph topology estimation from multivariate time series. AdaCGP estimates the Graph Shift Operator (GSO) through recursive update formulae designed to address sparsity, shift-invariance, and bias. Through comprehensive simulations, we demonstrate that AdaCGP consistently outperforms multiple baselines across diverse graph topologies, achieving improvements exceeding 83% in GSO estimation compared to state-of-the-art methods while maintaining favourable computational scaling properties. Our variable splitting approach enables reliable identification of causal connections with near-zero false alarm rates and minimal missed edges. Applied to cardiac fibrillation recordings, AdaCGP tracks dynamic changes in propagation patterns more effectively than established methods like Granger causality, capturing temporal variations in graph topology that static approaches miss. The algorithm successfully identifies stability characteristics in conduction patterns that may maintain arrhythmias, demonstrating potential for clinical applications in diagnosis and treatment of complex biomedical systems.
SPApr 8, 2024
Clinical translation of machine learning algorithms for seizure detection in scalp electroencephalography: systematic reviewNina Moutonnet, Steven White, Benjamin P Campbell et al.
Machine learning algorithms for seizure detection have shown considerable diagnostic potential, with recent reported accuracies reaching 100%. Yet, only few published algorithms have fully addressed the requirements for successful clinical translation. This is, for example, because the properties of training data may limit the generalisability of algorithms, algorithm performance may vary depending on which electroencephalogram (EEG) acquisition hardware was used, or run-time processing costs may be prohibitive to real-time clinical use cases. To address these issues in a critical manner, we systematically review machine learning algorithms for seizure detection with a focus on clinical translatability, assessed by criteria including generalisability, run-time costs, explainability, and clinically-relevant performance metrics. For non-specialists, the domain-specific knowledge necessary to contextualise model development and evaluation is provided. It is our hope that such critical evaluation of machine learning algorithms with respect to their potential real-world effectiveness can help accelerate clinical translation and identify gaps in the current seizure detection literature.
LGMay 30, 2023
Graph-based Time Series Clustering for End-to-End Hierarchical ForecastingAndrea Cini, Danilo Mandic, Cesare Alippi
Relationships among time series can be exploited as inductive biases in learning effective forecasting models. In hierarchical time series, relationships among subsets of sequences induce hard constraints (hierarchical inductive biases) on the predicted values. In this paper, we propose a graph-based methodology to unify relational and hierarchical inductive biases in the context of deep learning for time series forecasting. In particular, we model both types of relationships as dependencies in a pyramidal graph structure, with each pyramidal layer corresponding to a level of the hierarchy. By exploiting modern - trainable - graph pooling operators we show that the hierarchical structure, if not available as a prior, can be learned directly from data, thus obtaining cluster assignments aligned with the forecasting objective. A differentiable reconciliation stage is incorporated into the processing architecture, allowing hierarchical constraints to act both as an architectural bias as well as a regularization element for predictions. Simulation results on representative datasets show that the proposed method compares favorably against the state of the art.
ASFeb 8, 2022
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in TrainingZehua Chen, Xu Tan, Ke Wang et al.
Denoising diffusion probabilistic models (diffusion models for short) require a large number of iterations in inference to achieve the generation quality that matches or surpasses the state-of-the-art generative models, which invariably results in slow inference speed. Previous approaches aim to optimize the choice of inference schedule over a few iterations to speed up inference. However, this results in reduced generation quality, mainly because the inference process is optimized separately, without jointly optimizing with the training process. In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates inference process into training, to reduce the inference iterations while maintaining high generation quality. More specifically, during training, we generate data from random noise through a reverse process under inference schedules with a few iterations, and impose a loss to minimize the gap between the generated and ground-truth data samples. Then, unlike existing approaches, the training of InferGrad considers the inference process. The advantages of InferGrad are demonstrated through experiments on the LJSpeech dataset showing that InferGrad achieves better voice quality than the baseline WaveGrad under same conditions while maintaining the same voice quality as the baseline but with $3$x speedup ($2$ iterations for InferGrad vs $6$ iterations for WaveGrad).
ITAug 26, 2021
Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based TutorialLjubisa Stankovic, Danilo Mandic
Deep Neural Networks (DNN) and especially Convolutional Neural Networks (CNN) are a de-facto standard for the analysis of large volumes of signals and images. Yet, their development and underlying principles have been largely performed in an ad-hoc and black box fashion. To help demystify CNNs, we revisit their operation from first principles and a matched filtering perspective. We establish that the convolution operation within CNNs, their very backbone, represents a matched filter which examines the input signal/image for the presence of pre-defined features. This perspective is shown to be physically meaningful, and serves as a basis for a step-by-step tutorial on the operation of CNNs, including pooling, zero padding, various ways of dimensionality reduction. Starting from first principles, both the feed-forward pass and the learning stage (via back-propagation) are illuminated in detail, both through a worked-out numerical example and the corresponding visualizations. It is our hope that this tutorial will help shed new light and physical intuition into the understanding and further development of deep neural networks.
LGAug 23, 2021
Understanding the Basis of Graph Convolutional Neural Networks via an Intuitive Matched Filtering ApproachLjubisa Stankovic, Danilo Mandic
Graph Convolutional Neural Networks (GCNN) are becoming a preferred model for data processing on irregular domains, yet their analysis and principles of operation are rarely examined due to the black box nature of NNs. To this end, we revisit the operation of GCNNs and show that their convolution layers effectively perform matched filtering of input data with the chosen patterns (features). This allows us to provide a unifying account of GCNNs through a matched filter perspective, whereby the nonlinear ReLU and max-pooling layers are also discussed within the matched filtering framework. This is followed by a step-by-step guide on information propagation and learning in GCNNs. It is also shown that standard CNNs and fully connected NNs can be obtained as a special case of GCNNs. A carefully chosen numerical example guides the reader through the various steps of GCNN operation and learning both visually and numerically.
MLMar 14, 2021
Von Mises-Fisher Elliptical DistributionShengxi Li, Danilo Mandic
A large class of modern probabilistic learning systems assumes symmetric distributions, however, real-world data tend to obey skewed distributions and are thus not always adequately modelled through symmetric distributions. To address this issue, elliptical distributions are increasingly used to generalise symmetric distributions, and further improvements to skewed elliptical distributions have recently attracted much attention. However, existing approaches are either hard to estimate or have complicated and abstract representations. To this end, we propose to employ the von-Mises-Fisher (vMF) distribution to obtain an explicit and simple probability representation of the skewed elliptical distribution. This is shown not only to allow us to deal with non-symmetric learning systems, but also to provide a physically meaningful way of generalising skewed distributions. For rigour, our extension is proved to share important and desirable properties with its symmetric counterpart. We also demonstrate that the proposed vMF distribution is both easy to generate and stable to estimate, both theoretically and through examples.
ITMar 11, 2021
Improved Coherence Index-Based Bound in Compressive SensingLjubisa Stankovic, Milos Brajovic, Danilo Mandic et al.
Within the Compressive Sensing (CS) paradigm, sparse signals can be reconstructed based on a reduced set of measurements. Reliability of the solution is determined by the uniqueness condition. With its mathematically tractable and feasible calculation, coherence index is one of very few CS metrics with a considerable practical importance. In this paper, we propose an improvement of the coherence based uniqueness relation for the matching pursuit algorithms. Starting from a simple and intuitive derivation of the standard uniqueness condition based on the coherence index, we derive a less conservative coherence index-based lower bound for signal sparsity. The results are generalized to the uniqueness condition of the $l_0$-norm minimization for a signal represented in two orthonormal bases.
LGDec 11, 2020
A Review of Hidden Markov Models and Recurrent Neural Networks for Event Detection and Localization in Biomedical SignalsYassin Khalifa, Danilo Mandic, Ervin Sejdić
Biomedical signals carry signature rhythms of complex physiological processes that control our daily bodily activity. The properties of these rhythms indicate the nature of interaction dynamics among physiological processes that maintain a homeostasis. Abnormalities associated with diseases or disorders usually appear as disruptions in the structure of the rhythms which makes isolating these rhythms and the ability to differentiate between them, indispensable. Computer aided diagnosis systems are ubiquitous nowadays in almost every medical facility and more closely in wearable technology, and rhythm or event detection is the first of many intelligent steps that they perform. How these rhythms are isolated? How to develop a model that can describe the transition between processes in time? Many methods exist in the literature that address these questions and perform the decoding of biomedical signals into separate rhythms. In here, we demystify the most effective methods that are used for detection and isolation of rhythms or events in time series and highlight the way in which they were applied to different biomedical signals and how they contribute to information fusion. The key strengths and limitations of these methods are also discussed as well as the challenges encountered with application in biomedical signals.
LGJun 15, 2020
Reciprocal Adversarial Learning via Characteristic FunctionsShengxi Li, Zeyang Yu, Min Xiang et al.
Generative adversarial nets (GANs) have become a preferred tool for tasks involving complicated distributions. To stabilise the training and reduce the mode collapse of GANs, one of their main variants employs the integral probability metric (IPM) as the loss function. This provides extensive IPM-GANs with theoretical support for basically comparing moments in an embedded domain of the \textit{critic}. We generalise this by comparing the distributions rather than their moments via a powerful tool, i.e., the characteristic function (CF), which uniquely and universally comprising all the information about a distribution. For rigour, we first establish the physical meaning of the phase and amplitude in CF, and show that this provides a feasible way of balancing the accuracy and diversity of generation. We then develop an efficient sampling strategy to calculate the CFs. Within this framework, we further prove an equivalence between the embedded and data domains when a reciprocal exists, where we naturally develop the GAN in an auto-encoder structure, in a way of comparing everything in the embedded space (a semantically meaningful manifold). This efficient structure uses only two modules, together with a simple training strategy, to achieve bi-directionally generating clear images, which is referred to as the reciprocal CF GAN (RCF-GAN). Experimental results demonstrate the superior performances of the proposed RCF-GAN in terms of both generation and reconstruction.
SPMar 10, 2020
Methods of Adaptive Signal Processing on Graphs Using Vertex-Time Autoregressive ModelsThiernithi Variddhisai, Danilo Mandic
The concept of a random process has been recently extended to graph signals, whereby random graph processes are a class of multivariate stochastic processes whose coefficients are matrices with a \textit{graph-topological} structure. The system identification problem of a random graph process therefore revolves around determining its underlying topology, or mathematically, the graph shift operators (GSOs) i.e. an adjacency matrix or a Laplacian matrix. In the same work that introduced random graph processes, a \textit{batch} optimization method to solve for the GSO was also proposed for the random graph process based on a \textit{causal} vertex-time autoregressive model. To this end, the online version of this optimization problem was proposed via the framework of adaptive filtering. The modified stochastic gradient projection method was employed on the regularized least squares objective to create the filter. The recursion is divided into 3 regularized sub-problems to address issues like multi-convexity, sparsity, commutativity and bias. A discussion on convergence analysis is also included. Finally, experiments are conducted to illustrate the performance of the proposed algorithm, from traditional MSE measure to successful recovery rate regardless correct values, all of which to shed light on the potential, the limit and the possible research attempt of this work.
ITJan 2, 2020
Graph Signal Processing -- Part III: Machine Learning on Graphs, from Graph Topology to ApplicationsLjubisa Stankovic, Danilo Mandic, Milos Dakovic et al.
Many modern data analytics applications on graphs operate on domains where graph topology is not known a priori, and hence its determination becomes part of the problem definition, rather than serving as prior knowledge which aids the problem solution. Part III of this monograph starts by addressing ways to learn graph topology, from the case where the physics of the problem already suggest a possible topology, through to most general cases where the graph topology is learned from the data. A particular emphasis is on graph topology definition based on the correlation and precision matrices of the observed data, combined with additional prior knowledge and structural conditions, such as the smoothness or sparsity of graph connections. For learning sparse graphs (with small number of edges), the least absolute shrinkage and selection operator, known as LASSO is employed, along with its graph specific variant, graphical LASSO. For completeness, both variants of LASSO are derived in an intuitive way, and explained. An in-depth elaboration of the graph topology learning paradigm is provided through several examples on physically well defined graphs, such as electric circuits, linear heat transfer, social and computer networks, and spring-mass systems. As many graph neural networks (GNN) and convolutional graph networks (GCN) are emerging, we have also reviewed the main trends in GNNs and GCNs, from the perspective of graph signal filtering. Tensor representation of lattice-structured graphs is next considered, and it is shown that tensors (multidimensional data arrays) are a special class of graph signals, whereby the graph vertices reside on a high-dimensional regular lattice structure. This part of monograph concludes with two emerging applications in financial data processing and underground transportation networks modeling.
LGJun 9, 2019
Solving general elliptical mixture models through an approximate Wasserstein manifoldShengxi Li, Zeyang Yu, Min Xiang et al.
We address the estimation problem for general finite mixture models, with a particular focus on the elliptical mixture models (EMMs). Compared to the widely adopted Kullback-Leibler divergence, we show that the Wasserstein distance provides a more desirable optimisation space. We thus provide a stable solution to the EMMs that is both robust to initialisations and reaches a superior optimum by adaptively optimising along a manifold of an approximate Wasserstein distance. To this end, we first provide a unifying account of computable and identifiable EMMs, which serves as a basis to rigorously address the underpinning optimisation problem. Due to a probability constraint, solving this problem is extremely cumbersome and unstable, especially under the Wasserstein distance. To relieve this issue, we introduce an efficient optimisation method on a statistical manifold defined under an approximate Wasserstein distance, which allows for explicit metrics and computable operations, thus significantly stabilising and improving the EMM estimation. We further propose an adaptive method to accelerate the convergence. Experimental results demonstrate the excellent performance of the proposed EMM solver.
NEMar 5, 2019
Widely Linear Complex-valued Autoencoder: Dealing with Noncircularity in Generative-Discriminative ModelsZeyang Yu, Shengxi Li, Danilo Mandic
We propose a new structure for the complex-valued autoencoder by introducing additional degrees of freedom into its design through a widely linear (WL) transform. The corresponding widely linear backpropagation algorithm is also developed using the $\mathbb{CR}$ calculus, to unify the gradient calculation of the cost function and the underlying WL model. More specifically, all the existing complex-valued autoencoders employ the strictly linear transform, which is optimal only when the complex-valued outputs of each network layer are independent of the conjugate of the inputs. In addition, the widely linear model which underpins our work allows us to consider all the second-order statistics of inputs. This provides more freedom in the design and enhanced optimization opportunities, as compared to the state-of-the-art. Furthermore, we show that the most widely adopted cost function, i.e., the mean squared error, is not best suited for the complex domain, as it is a real quantity with a single degree of freedom, while both the phase and the amplitude information need to be optimized. To resolve this issue, we design a new cost function, which is capable of controlling the balance between the phase and the amplitude contribution to the solution. The experimental results verify the superior performance of the proposed autoencoder together with the new cost function, especially for the imaging scenarios where the phase preserves extensive information on edges and shapes.
LGSep 7, 2018
Tensor Ring Decomposition with Rank Minimization on Latent Space: An Efficient Approach for Tensor CompletionLonghao Yuan, Chao Li, Danilo Mandic et al.
In tensor completion tasks, the traditional low-rank tensor decomposition models suffer from the laborious model selection problem due to their high model sensitivity. In particular, for tensor ring (TR) decomposition, the number of model possibilities grows exponentially with the tensor order, which makes it rather challenging to find the optimal TR decomposition. In this paper, by exploiting the low-rank structure of the TR latent space, we propose a novel tensor completion method which is robust to model selection. In contrast to imposing the low-rank constraint on the data space, we introduce nuclear norm regularization on the latent TR factors, resulting in the optimization step using singular value decomposition (SVD) being performed at a much smaller scale. By leveraging the alternating direction method of multipliers (ADMM) scheme, the latent TR factors with optimal rank and the recovered tensor can be obtained simultaneously. Our proposed algorithm is shown to effectively alleviate the burden of TR-rank selection, thereby greatly reducing the computational cost. The extensive experimental results on both synthetic and real-world data demonstrate the superior performance and efficiency of the proposed approach against the state-of-the-art algorithms.
NASep 3, 2018
Tensor Networks for Latent Variable Analysis: Higher Order Canonical Polyadic DecompositionAnh-Huy Phan, Andrzej Cichocki, Ivan Oseledets et al.
The Canonical Polyadic decomposition (CPD) is a convenient and intuitive tool for tensor factorization; however, for higher-order tensors, it often exhibits high computational cost and permutation of tensor entries, these undesirable effects grow exponentially with the tensor order. Prior compression of tensor in-hand can reduce the computational cost of CPD, but this is only applicable when the rank $R$ of the decomposition does not exceed the tensor dimensions. To resolve these issues, we present a novel method for CPD of higher-order tensors, which rests upon a simple tensor network of representative inter-connected core tensors of orders not higher than 3. For rigour, we develop an exact conversion scheme from the core tensors to the factor matrices in CPD, and an iterative algorithm with low complexity to estimate these factor matrices for the inexact case. Comprehensive simulations over a variety of scenarios support the approach.
NAMay 22, 2018
Rank Minimization on Tensor Ring: A New Paradigm in Scalable Tensor Decomposition and CompletionLonghao Yuan, Chao Li, Danilo Mandic et al.
In low-rank tensor completion tasks, due to the underlying multiple large-scale singular value decomposition (SVD) operations and rank selection problem of the traditional methods, they suffer from high computational cost and high sensitivity of model complexity. In this paper, taking advantages of high compressibility of the recently proposed tensor ring (TR) decomposition, we propose a new model for tensor completion problem. This is achieved through introducing convex surrogates of tensor low-rank assumption on latent tensor ring factors, which makes it possible for the Schatten norm regularization based models to be solved at much smaller scale. We propose two algorithms which apply different structured Schatten norms on tensor ring factors respectively. By the alternating direction method of multipliers (ADMM) scheme, the tensor ring factors and the predicted tensor can be optimized simultaneously. The experiments on synthetic data and real-world data show the high performance and efficiency of the proposed approach.
LGMay 21, 2018
A universal framework for learning the elliptical mixture modelShengxi Li, Zeyang Yu, Danilo Mandic
Mixture modelling using elliptical distributions promises enhanced robustness, flexibility and stability over the widely employed Gaussian mixture model (GMM). However, existing studies based on the elliptical mixture model (EMM) are restricted to several specific types of elliptical probability density functions, which are not supported by general solutions or systematic analysis frameworks; this significantly limits the rigour and the power of EMMs in applications. To this end, we propose a novel general framework for estimating and analysing the EMMs, achieved through Riemannian manifold optimisation. First, we investigate the relationships between Riemannian manifolds and elliptical distributions, and the so established connection between the original manifold and a reformulated one indicates a mismatch between those manifolds, the major cause of failure of the existing optimisation for solving general EMMs. We next propose a universal solver which is based on the optimisation of a re-designed cost and prove the existence of the same optimum as in the original problem; this is achieved in a simple, fast and stable way. We further calculate the influence functions of the EMM as theoretical bounds to quantify robustness to outliers. Comprehensive numerical results demonstrate the ability of the proposed framework to accommodate EMMs with different properties of individual functions in a stable way and with fast convergence speed. Finally, the enhanced robustness and flexibility of the proposed framework over the standard GMM are demonstrated both analytically and through comprehensive simulations.
LGMar 7, 2017
Online Multilinear Dictionary LearningThiernithi Variddhisai, Danilo Mandic
A method for online tensor dictionary learning is proposed. With the assumption of separable dictionaries, tensor contraction is used to diminish a $N$-way model of $\mathcal{O}\left(L^N\right)$ into a simple matrix equation of $\mathcal{O}\left(NL^2\right)$ with a real-time capability. To avoid numerical instability due to inversion of sparse matrix, a class of stochastic gradient with memory is formulated via a least-square solution to guarantee convergence and robustness. Both gradient descent with exact line search and Newton's method are discussed and realized. Extensions onto how to deal with bad initialization and outliers are also explained in detail. Experiments on two synthetic signals confirms an impressive performance of our proposed method.