Sergios Theodoridis

h-index 49

26 papers

524 citations

Novelty 51%

AI Score 50

Ranked #42,700 of 205,806 authors (top 21%); #9,668 in LG (top 23%)

26 Papers

LG · Dec 25, 2025 · Code

When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning

Siyuan Li, Shikai Fang, Lei Cheng et al.

Functional tensor decomposition can analyze multi-dimensional data with real-valued indices, paving the path for applications in machine learning and signal processing. A limitation of existing approaches is the assumption that the tensor rank, a critical parameter governing model complexity, is known. However, determining the optimal rank is a non-deterministic polynomial-time hard (NP-hard) task and there is a limited understanding regarding the expressive power of functional low-rank tensor models for continuous signals. We propose a rank-revealing functional Bayesian tensor completion (RR-FBTC) method. Modeling the latent functions through carefully designed multioutput Gaussian processes, RR-FBTC handles tensors with real-valued indices while enabling automatic tensor rank determination during the inference process. We establish the universal approximation property of the model for continuous multi-dimensional signals, demonstrating its expressive power in a concise format. To learn this model, we employ the variational inference framework and derive an efficient algorithm with closed-form updates. Experiments on both synthetic and real-world datasets demonstrate the effectiveness and superiority of RR-FBTC over state-of-the-art approaches. The code is available at https://github.com/OceanSTARLab/RR-FBTC.

ML · May 28, 2022

Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling

Lei Cheng, Feng Yin, Sergios Theodoridis et al.

Sparse modeling for signal processing and machine learning has been at the focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can better exploit related prior information and naturally introduce robustness into the model, due to their unique capacity to marginalize out uncertainties related to the parameter estimates. Moreover, hyper-parameters associated with the adopted priors can be learnt via the training data. To implement sparsity-aware learning, the crucial point lies in the choice of the function regularizer for discriminative methods and the choice of the prior distribution for Bayesian learning. Over the last decade or so, due to the intense research on deep learning, emphasis has been put on discriminative techniques. However, a comeback of Bayesian methods is taking place that sheds new light on the design of deep neural networks, which also establish firm links with Bayesian models and inspire new paths for unsupervised learning, such as Bayesian tensor decomposition. The goal of this article is two-fold. First, to review, in a unified way, some recent advances in incorporating sparsity-promoting priors into three highly popular data modeling tools, namely deep neural networks, Gaussian processes, and tensor decomposition. Second, to review their associated inference techniques from different aspects, including evidence maximization via optimization and variational inference methods. Challenges such as the small data dilemma, automatic model structure search, and natural prediction uncertainty evaluation are also discussed. Typical signal processing and machine learning tasks are demonstrated.

SD · Jun 1, 2023

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen et al.

In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block through attention heads of several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, along with demonstrating considerably better scaling characteristics. Investigating attention distances and entropies reveals that MW-MAE encoders learn heads with broader local and global attention. Analyzing attention head feature representations through Projection Weighted Canonical Correlation Analysis (PWCCA) shows that attention heads with the same window sizes across the decoder layers of the MW-MAE learn correlated feature representations, which enables each block to independently capture local and global information, leading to a decoupled decoder feature hierarchy. Code for feature extraction and downstream experiments along with pre-trained models will be released publicly.

LG · Sep 15, 2023

Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel

Richard Cornelius Suwandi, Zhidi Lin, Feng Yin et al.

Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capability. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.

LG · Sep 22, 2025 · Code

Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs

Richard Cornelius Suwandi, Feng Yin, Juntao Wang et al.

The efficiency of Bayesian optimization (BO) relies heavily on the choice of the Gaussian process (GP) kernel, which plays a central role in balancing exploration and exploitation under limited evaluation budgets. Traditional BO methods often rely on fixed or heuristic kernel selection strategies, which can result in slow convergence or suboptimal solutions when the chosen kernel is poorly suited to the underlying objective function. To address this limitation, we propose a freshly-baked Context-Aware Kernel Evolution (CAKE) to enhance BO with large language models (LLMs). Concretely, CAKE leverages LLMs as the crossover and mutation operators to adaptively generate and refine GP kernels based on the observed data throughout the optimization process. To maximize the power of CAKE, we further propose BIC-Acquisition Kernel Ranking (BAKER) to select the most effective kernel through balancing the model fit measured by the Bayesian information criterion (BIC) with the expected improvement at each iteration of BO. Extensive experiments demonstrate that our fresh CAKE-based BO method consistently outperforms established baselines across a range of real-world tasks, including hyperparameter optimization, controller tuning, and photonic chip design. Our code is publicly available at https://github.com/richardcsuwandi/cake.
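The model-fit half of the ranking described above can be illustrated with the standard Bayesian information criterion. This is only a minimal sketch: the candidate kernel names, log-likelihood values, and hyper-parameter counts are made up for illustration, and the way BAKER weighs BIC against expected improvement is not reproduced here.

```python
import math


def bic(log_likelihood, num_params, num_obs):
    """Bayesian information criterion; a lower value indicates a better trade-off."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood


# Hypothetical candidates: (kernel expression, maximized log-likelihood, hyper-parameter count).
candidates = [
    ("SE", -42.0, 2),
    ("SE + PER", -35.5, 5),
    ("SE * LIN", -41.0, 3),
]
num_obs = 30  # number of objective evaluations observed so far (assumed)

# Sort from best (lowest BIC) to worst.
ranked = sorted(candidates, key=lambda c: bic(c[1], c[2], num_obs))
for name, log_lik, k in ranked:
    print(f"{name:10s} BIC = {bic(log_lik, k, num_obs):.2f}")
```

The penalty term grows with the number of hyper-parameters, so a more complex kernel only ranks first when its likelihood gain outweighs that penalty.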

LG · Sep 3, 2023 · Code

Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models

Zhidi Lin, Juan Maroñas, Ying Li et al.

The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states. To surmount this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating its ability to achieve similar inference performance compared to existing methods. Code is available at https://github.com/zhidilin/gpssmProj.

SD · Sep 23, 2025

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms

Sarthak Yadav, Sergios Theodoridis, Zheng-Hua Tan

In recent years, self-supervised learning has amassed significant interest for training deep neural representations without labeled data. One such self-supervised learning approach is masked spectrogram modeling, where the objective is to learn semantically rich contextual representations by predicting removed or hidden portions of the input audio spectrogram. With the Transformer neural architecture at its core, masked spectrogram modeling has emerged as the prominent approach for learning general purpose audio representations, a.k.a. audio foundation models. Meanwhile, addressing the issues of the Transformer architecture, in particular the underlying Scaled Dot-product Attention operation, which scales quadratically with input sequence length, has led to renewed interest in recurrent sequence modeling approaches. Among them, Selective structured state space models (such as Mamba) and extended Long Short-Term Memory (xLSTM) are the two most promising approaches which have experienced widespread adoption. While the body of work on these two topics continues to grow, there is currently a lack of an adequate overview encompassing the intersection of these topics. In this paper, we present a comprehensive overview of the aforementioned research domains, covering masked spectrogram modeling and the previously mentioned neural sequence modeling architectures, Mamba and xLSTM. Further, we compare Transformers, Mamba and xLSTM based masked spectrogram models in a unified, reproducible framework on ten diverse downstream audio classification tasks, which will help interested readers to make informed decisions regarding suitability of the evaluated approaches to adjacent applications.

SD · Jul 14, 2025

AudioMAE++: learning better masked audio representations with SwiGLU FFNs

Sarthak Yadav, Sergios Theodoridis, Zheng-Hua Tan

Masked Autoencoders (MAEs) trained on audio spectrogram patches have emerged as a prominent approach for learning self-supervised audio representations. While several recent papers have evaluated key aspects of training MAEs on audio data, the majority of these approaches still leverage vanilla transformer building blocks, whereas the transformer community has seen steady integration of newer architectural advancements. In this work, we propose AudioMAE++, a revamped audio masked autoencoder with two such enhancements, namely macaron-style transformer blocks with gated linear units. When pretrained on the AudioSet dataset, the proposed AudioMAE++ models outperform existing MAE based approaches on 10 diverse downstream tasks, demonstrating excellent performance on audio classification and speech-based benchmarks. The proposed AudioMAE++ models also demonstrate excellent scaling characteristics, outperforming directly comparable standard MAE baselines with up to 4x more parameters.

LG · Dec 5, 2021

Stochastic Local Winner-Takes-All Networks Enable Profound Adversarial Robustness

Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis

This work explores the potency of stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA), against powerful (gradient-based) white-box and black-box adversarial attacks; we especially focus on Adversarial Training settings. In our work, we replace the conventional ReLU-based nonlinearities with blocks comprising locally and stochastically competing linear units. Each network layer now yields a sparse output, depending on the outcome of winner sampling in each block. We rely on the Variational Bayesian framework for training and inference; we incorporate conventional PGD-based adversarial training arguments to increase the overall adversarial robustness. As we experimentally show, the arising networks yield state-of-the-art robustness against powerful adversarial attacks while retaining a very high classification rate in the benign case.

CL · Sep 15, 2021

Dialog speech sentiment classification for imbalanced datasets

Sergis Nicolaou, Lambros Mavrides, Georgina Tryfou et al.

Speech is the most common way humans express their feelings, and sentiment analysis is the use of tools such as natural language processing and computational algorithms to identify the polarity of these feelings. Even though this field has seen tremendous advancements in the last two decades, effectively detecting underrepresented sentiments in different kinds of datasets is still a challenging task. In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection, particularly in the underrepresented classes, in datasets with and without an inherent sentiment component. Furthermore, we propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.

LG · Jan 4, 2021

Local Competition and Stochasticity for Adversarial Robustness in Deep Learning

Konstantinos P. Panousis, Sotirios Chatzis, Antonios Alexos et al.

This work addresses adversarial robustness in deep learning by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network unit results in sparse representations from each model layer, as the units are organized in blocks where only one unit generates a non-zero output. The main operating principle of the introduced units lies in stochastic arguments, as the network performs posterior sampling over competing units to select the winner. We combine these LWTA arguments with tools from the field of Bayesian non-parametrics, specifically the stick-breaking construction of the Indian Buffet Process, to allow for inferring the sub-part of each layer that is essential for modeling the data at hand. Then, inference is performed by means of stochastic variational Bayes. We perform a thorough experimental evaluation of our model using benchmark datasets. As we show, our method achieves high robustness to adversarial perturbations, with state-of-the-art performance in powerful adversarial attack schemes.
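The block-wise competition described in this abstract can be sketched in a few lines. The sketch below assumes that winner probabilities come from a softmax over the linear activations of each block; the function name and this probability form are illustrative choices, and the paper's variational training and Indian Buffet Process machinery are not reproduced.

```python
import math
import random


def stochastic_lwta(activations, block_size, rng):
    """Keep one sampled winner per block of linear units and zero out the rest."""
    if len(activations) % block_size != 0:
        raise ValueError("layer width must be a multiple of block_size")
    output = [0.0] * len(activations)
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        # Softmax weights over the block (assumed form of the winner distribution).
        peak = max(block)
        weights = [math.exp(a - peak) for a in block]
        winner = rng.choices(range(block_size), weights=weights, k=1)[0]
        # Only the sampled winner passes its linear activation forward.
        output[start + winner] = block[winner]
    return output


rng = random.Random(0)
layer = [0.5, -1.2, 2.0, 0.1, 0.3, -0.7]  # three blocks of two competing units
sparse = stochastic_lwta(layer, block_size=2, rng=rng)
print(sparse)
```

Because the winner is sampled rather than chosen by a hard maximum, repeated forward passes over the same input can activate different units.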

LG · Sep 5, 2020

Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning Using The Generalized Hyperbolic Prior

Lei Cheng, Zhongtao Chen, Qingjiang Shi et al.

Tensor rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential yet challenging problem. In particular, since the tensor rank controls the complexity of the CPD model, its inaccurate learning would cause overfitting to noise or underfitting to the signal sources, and even destroy the interpretability of model parameters. However, the optimal determination of a tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task. Rather than exhaustively searching for the best tensor rank via trial-and-error experiments, Bayesian inference under the Gaussian-gamma prior was introduced in the context of probabilistic CPD modeling, and it was shown to be an effective strategy for automatic tensor rank determination. This triggered flourishing research on other structured tensor CPDs with automatic tensor rank learning. On the other side of the coin, these research works also reveal that the Gaussian-gamma model does not perform well for high-rank tensors and/or low signal-to-noise ratios (SNRs). To overcome these drawbacks, in this paper, we introduce a more advanced generalized hyperbolic (GH) prior to the probabilistic CPD model, which not only includes the Gaussian-gamma model as a special case, but also is more flexible to adapt to different levels of sparsity. Based on this novel probabilistic model, an algorithm is developed under the framework of variational inference, where each update is obtained in closed form. Extensive numerical results, using synthetic data and real-world datasets, demonstrate the significantly improved performance of the proposed method in learning both low as well as high tensor ranks even for low SNR cases.
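To make the role of the tensor rank concrete, the sketch below rebuilds a third-order tensor from CPD factor matrices and counts the components that still carry energy. It assumes an over-parameterized model in which a sparsity-promoting prior has already shrunk a redundant column to zero; the helper names and the pruning threshold are illustrative, and the GH-prior inference itself is not shown.

```python
def cpd_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CPD factors: X[i][j][k] = sum_r A[i][r] B[j][r] C[k][r]."""
    rank = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def effective_rank(factors, tol=1e-8):
    """Count components whose column has non-negligible energy in every factor matrix."""
    rank = len(factors[0][0])
    kept = 0
    for r in range(rank):
        energies = [sum(row[r] ** 2 for row in F) for F in factors]
        if min(energies) > tol:
            kept += 1
    return kept


# Three columns were allocated, but the third one has been shrunk to zero.
A = [[1.0, 0.5, 0.0], [2.0, -1.0, 0.0]]
B = [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0]]
C = [[3.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

X = cpd_reconstruct(A, B, C)
print(X[0][0][0], effective_rank([A, B, C]))
```

A zeroed column contributes nothing to the reconstruction, which is why pruning it reveals the rank without changing the fitted tensor.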

SP · May 12, 2020

Early soft and flexible fusion of EEG and fMRI via tensor decompositions

Christos Chatzichristos, Eleftherios Kofidis, Lieven De Lathauwer et al.

Data fusion refers to the joint analysis of multiple datasets which provide complementary views of the same task. In this preprint, the problem of jointly analyzing electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI) data is considered. Jointly analyzing EEG and fMRI measurements is highly beneficial for studying brain function because these modalities have complementary spatiotemporal resolution: EEG offers good temporal resolution while fMRI is better in its spatial resolution. The fusion methods reported so far ignore the underlying multi-way nature of the data in at least one of the modalities and/or rely on very strong assumptions about the relation of the two datasets. In this preprint, these two points are addressed by adopting for the first time tensor models in the two modalities while also exploring double coupled tensor decompositions and by following soft and flexible coupling approaches to implement the multi-modal analysis. To cope with the Event Related Potential (ERP) variability in EEG, the PARAFAC2 model is adopted. The results obtained are compared against those of parallel Independent Component Analysis (ICA) and hard coupling alternatives in both simulated and real data. Our results confirm the superiority of tensorial methods over methods based on ICA. In scenarios that do not meet the assumptions underlying hard coupling, the advantage of soft and flexible coupled decompositions is clearly demonstrated.

DC · Mar 8, 2020

FedLoc: Federated Learning Framework for Data-Driven Cooperative Localization and Location Data Processing

Feng Yin, Zhidi Lin, Yue Xu et al.

In this overview paper, data-driven learning model-based cooperative localization and location data processing are considered, in line with the emerging machine learning and big data methods. We first review (1) state-of-the-art algorithms in the context of federated learning, (2) two widely used learning models, namely the deep neural network model and the Gaussian process model, and (3) various distributed model hyper-parameter optimization schemes. Then, we demonstrate various practical use cases that are summarized from a mixture of standard, newly published, and unpublished works, which cover a broad range of location services, including collaborative static localization/fingerprinting, indoor target tracking, outdoor navigation using low-sampling GPS, and spatio-temporal wireless traffic data modeling and prediction. Experimental results show that near-centralized data fitting and prediction performance can be achieved by a set of collaborative mobile users running distributed algorithms. All the surveyed use cases fall under our newly proposed Federated Localization (FedLoc) framework, which aims at collaboratively building accurate location services without sacrificing user privacy, in particular, sensitive information related to their geographical trajectories. Future research directions are also discussed at the end of this paper.

LG · Feb 13, 2020

Variational Conditional Dependence Hidden Markov Models for Skeleton-Based Action Recognition

Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis

Hidden Markov Models (HMMs) comprise a powerful generative approach for modeling sequential data and time-series in general. However, the commonly employed assumption of the dependence of the current time frame on a single or multiple immediately preceding frames is unrealistic; more complicated dynamics potentially exist in real-world scenarios. This paper revisits conventional sequential modeling approaches, aiming to address the problem of capturing time-varying temporal dependency patterns. To this end, we propose a different formulation of HMMs, whereby the dependence on past frames is dynamically inferred from the data. Specifically, we introduce a hierarchical extension by postulating an additional latent variable layer; therein, the (time-varying) temporal dependence patterns are treated as latent variables over which inference is performed. We leverage solid arguments from the Variational Bayes framework and derive a tractable inference algorithm based on the forward-backward algorithm. As we experimentally show, our approach can model highly complex sequential data and can effectively handle data with missing values.

LG · Apr 21, 2019

Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series

Feng Yin, Lishuo Pan, Xinwei He et al.

Gaussian processes (GP) for machine learning have been studied systematically over the past two decades and they are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyper-parameter optimization are still hard and to a large extent open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The underlying stationary kernel can be approximated arbitrarily closely by a newly proposed grid spectral mixture (GSM) kernel, which turns out to be a linear combination of low-rank sub-kernels. In the case where a large number of the sub-kernels are used, either the Nyström or the random Fourier feature approximations can be adopted to deal efficiently with the computational demands. The unknown GP hyper-parameters consist of the non-negative weights of all sub-kernels as well as the noise variance; their estimation is performed via the maximum-likelihood (ML) estimation framework. Two efficient numerical optimization methods for solving the unknown hyper-parameters are derived, including a sequential majorization-minimization (MM) method and a non-linearly constrained alternating direction method of multipliers (ADMM). The MM matches perfectly with the proven low-rank property of the proposed GSM sub-kernels and turns out to be a stable and efficient solver, while the ADMM has the potential to generate a better local minimum in terms of the test MSE. Experimental results, based on various classic time series data sets, corroborate that the proposed GSM kernel-based GP regression model outperforms several salient competitors of similar kind in terms of prediction mean-squared-error and numerical stability.
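The structure of a kernel built as a non-negative linear combination of fixed sub-kernels can be sketched as follows. The sub-kernel form used here is the one-dimensional spectral mixture component (a Gaussian envelope times a cosine), with frequencies fixed on a grid and a shared bandwidth; treating this as the GSM sub-kernel is an assumption of the sketch, and the grid values and weights are made up.

```python
import math


def gsm_sub_kernel(tau, mu, sigma):
    """Stationary sub-kernel for one spectral bump centred at frequency mu (assumed form)."""
    envelope = math.exp(-2.0 * math.pi ** 2 * tau ** 2 * sigma ** 2)
    return envelope * math.cos(2.0 * math.pi * tau * mu)


def gsm_kernel(tau, weights, mus, sigma):
    """Linear combination of sub-kernels; only the non-negative weights are learned."""
    if any(w < 0 for w in weights):
        raise ValueError("sub-kernel weights must be non-negative")
    return sum(w * gsm_sub_kernel(tau, mu, sigma) for w, mu in zip(weights, mus))


mus = [0.0, 0.1, 0.2, 0.3, 0.4]          # fixed frequency grid (hypothetical values)
weights = [0.0, 1.5, 0.0, 0.5, 0.0]      # a sparse weight vector
k0 = gsm_kernel(0.0, weights, mus, sigma=0.01)
print(k0)
```

With the grid fixed in advance, the kernel is linear in the weights, and a sparse weight vector means that most grid frequencies are simply switched off.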

LG · Aug 1, 2018

Compressible Spectral Mixture Kernels with Sparse Dependency Structures for Gaussian Processes

Kai Chen, Yijue Dai, Feng Yin et al.

Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) to describe complex patterns. This paper introduces model compression and time and phase (TP) modulated dependency structures to the original SM kernel for improved generalization of GPs. Specifically, by adopting Bienaymé's identity, we generalize the dependency structure through cross-covariance between the SM components. Then, we propose a novel SM kernel with a dependency structure (SMD) by using cross-convolution between the SM components. Furthermore, we improve the expressiveness of the dependency structure by parameterizing it with time and phase delays. The dependency structure has clear interpretations in terms of spectral density, covariance behavior, and sampling path. Finally, to enrich the SMD with effective hyperparameter initialization, compressible SM kernel components, and sparse dependency structures, we introduce a novel structure adaptation (SA) algorithm. A thorough comparative analysis of the SMD on both synthetic and real-life applications corroborates its efficacy.

LG · May 19, 2018

Nonparametric Bayesian Deep Networks with Local Competition

Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis

The aim of this work is to enable inference of deep networks that retain high accuracy for the least possible model complexity, with the latter deduced from the data during inference. To this end, we revisit deep networks that comprise competing linear units, as opposed to nonlinear units that do not entail any form of (local) competition. In this context, our main technical innovation consists in an inferential setup that leverages solid arguments from Bayesian nonparametrics. We infer both the needed set of connections or locally competing sets of units, as well as the required floating-point precision for storing the network parameters. Specifically, we introduce auxiliary discrete latent variables representing which initial network components are actually needed for modeling the data at hand, and perform Bayesian inference over them by imposing appropriate stick-breaking priors. As we experimentally show using benchmark datasets, our approach yields networks with a smaller computational footprint than the state-of-the-art, and with no compromises in predictive accuracy.

ML · Apr 20, 2018

Unsupervised learning of the brain connectivity dynamic using residual D-net

Youngjoo Seo, Manuel Morante, Yannis Kopsinis et al.

In this paper, we propose a novel unsupervised learning method to learn the brain dynamics using a deep learning architecture named residual D-net. As is often the case in medical research, in contrast to typical deep learning tasks, the size of the resting-state functional Magnetic Resonance Image (rs-fMRI) datasets for training is limited. Thus, the available data should be used very efficiently to learn the complex patterns underlying the brain connectivity dynamics. To address this issue, we use residual connections to alleviate the training complexity through recurrent multi-scale representation. We conduct two classification tasks to differentiate early and late stage Mild Cognitive Impairment (MCI) from Normal healthy Control (NC) subjects. The experiments verify that our proposed residual D-net indeed learns the brain connectivity dynamics, leading to significantly higher classification accuracy compared to previously published techniques.

ML · Feb 5, 2018

Information Assisted Dictionary Learning for fMRI data analysis

Manuel Morante, Yannis Kopsinis, Sergios Theodoridis et al.

In this paper, the task-related fMRI problem is treated in its matrix factorization formulation, focused on the Dictionary Learning (DL) approach. The new method allows the incorporation of a priori knowledge associated both with the experimental design as well as with available brain atlases. Moreover, the proposed method can efficiently cope with uncertainties related to the HRF modeling. In addition, the proposed method bypasses one of the major drawbacks that are associated with DL methods; that is, the selection of the sparsity-related regularization parameters. In our formulation, an alternative sparsity-promoting constraint is employed that bears a direct relation to the number of voxels in the spatial maps. Hence, the related parameters can be tuned using information that is available from brain atlases. The proposed method is evaluated against several other popular techniques, including GLM. The obtained performance gains are reported via a novel realistic synthetic fMRI dataset as well as real data that are related to a challenging experimental design.

LG · Mar 23, 2017

Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features

Pantelis Bouboulis, Symeon Chouvardas, Sergios Theodoridis

We present a novel diffusion scheme for online kernel-based learning over networks. So far, a major drawback of any online learning algorithm, operating in a reproducing kernel Hilbert space (RKHS), is the need for updating a growing number of parameters as time iterations evolve. Besides complexity, this leads to an increased need for communication resources in a distributed setting. In contrast, the proposed method approximates the solution as a fixed-size vector (of larger dimension than the input space) using Random Fourier Features. This paves the way to use standard linear combine-then-adapt techniques. To the best of our knowledge, this is the first time that a complete protocol for distributed online learning in RKHS is presented. Conditions for asymptotic convergence and boundedness of the networkwise regret are also provided. The simulated tests illustrate the performance of the proposed scheme.

ML · Oct 11, 2016

Assisted Dictionary Learning for fMRI Data Analysis

Manuel Morante Moreno, Yannis Kopsinis, Eleftherios Kofidis et al.

Extracting information from functional magnetic resonance imaging (fMRI) data has been a major area of research for more than two decades. The goal of this work is to present a new method for the analysis of fMRI data sets that is capable of incorporating a priori available information, via an efficient optimization framework. Tests on synthetic data sets demonstrate significant performance gains over existing methods of this kind.

NA · Jul 15, 2016

Higher-Order Block Term Decomposition for Spatially Folded fMRI Data

Christos Chatzichristos, Eleftherios Kofidis, Giannis Kopsinis et al.

The growing use of neuroimaging technologies generates a massive amount of biomedical data that exhibit high dimensionality. Tensor-based analysis of brain imaging data has proved quite effective in exploiting their multiway nature. The advantages of tensorial methods over matrix-based approaches have also been demonstrated in the characterization of functional magnetic resonance imaging (fMRI) data, where the spatial (voxel) dimensions are commonly grouped (unfolded) as a single way/mode of the third-order array, the other two ways corresponding to time and subjects. However, such methods are known to be ineffective in more demanding scenarios, such as the ones with strong noise and/or significant overlapping of activated regions. This paper aims at investigating the possible gains from a better exploitation of the spatial dimension, through a higher-order (fourth or fifth) tensor modeling of the fMRI signal. In this context, and in order to increase the degrees of freedom of the modeling process, a higher-order Block Term Decomposition (BTD) is applied, for the first time in fMRI analysis. Its effectiveness is demonstrated via extensive simulation results.

LG · Jun 12, 2016

Efficient KLMS and KRLS Algorithms: A Random Fourier Feature Perspective

Pantelis Bouboulis, Spyridon Pougkakiotis, Sergios Theodoridis

We present a new framework for online Least Squares algorithms for nonlinear modeling in RKH spaces (RKHS). Instead of implicitly mapping the data to an RKHS (e.g., kernel trick), we map the data to a finite dimensional Euclidean space, using random features of the kernel's Fourier transform. The advantage is that the inner product of the mapped data approximates the kernel function. The resulting "linear" algorithm does not require any form of sparsification, since, in contrast to all existing algorithms, the solution's size remains fixed and does not increase with the iteration steps. As a result, the obtained algorithms are computationally significantly more efficient compared to previously derived variants, while, at the same time, they converge at similar speeds and to similar error floors.
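The two ingredients named in this abstract, a random Fourier feature map whose inner products approximate the kernel and a linear update on a fixed-size weight vector, can be sketched as follows. The sketch assumes a Gaussian kernel of width `sigma` and a plain LMS update; the function names, feature count, step size, and toy target are illustrative choices rather than the paper's settings.

```python
import math
import random


def make_rff(dim, num_features, sigma, rng):
    """Build a random Fourier feature map for the Gaussian kernel exp(-||x-y||^2 / (2 sigma^2))."""
    # Frequencies are drawn from the kernel's Fourier transform: N(0, 1/sigma^2) per coordinate.
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(omega, x)) + b)
                for omega, b in zip(omegas, phases)]

    return features


def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lms_step(weights, z, target, step_size):
    """One LMS update on the mapped sample z; the weight vector keeps its size."""
    error = target - dot(weights, z)
    return [w + step_size * error * zi for w, zi in zip(weights, z)], error


rng = random.Random(0)
num_features = 4000
features = make_rff(dim=2, num_features=num_features, sigma=1.0, rng=rng)

# The inner product of mapped points approximates the kernel value.
x, y = [0.3, -0.2], [0.1, 0.4]
approx = dot(features(x), features(y))
exact = gaussian_kernel(x, y, sigma=1.0)

# Online learning of a toy nonlinear target with a fixed-size solution.
weights = [0.0] * num_features
for _ in range(50):
    sample = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    weights, _ = lms_step(weights, features(sample), math.sin(sample[0]), 0.5)
```

However many samples arrive, `weights` stays at `num_features` entries, which is the property that removes the need for sparsification.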

LG · Jan 4, 2016

Robust Non-linear Regression: A Greedy Approach Employing Kernels with Application to Image Denoising

George Papageorgiou, Pantelis Bouboulis, Sergios Theodoridis

We consider the task of robust non-linear regression in the presence of both inlier noise and outliers. Assuming that the unknown non-linear function belongs to a Reproducing Kernel Hilbert Space (RKHS), our goal is to estimate the set of the associated unknown parameters. Due to the presence of outliers, common techniques such as the Kernel Ridge Regression (KRR) or the Support Vector Regression (SVR) turn out to be inadequate. Instead, we employ sparse modeling arguments to explicitly model and estimate the outliers, adopting a greedy approach. The proposed robust scheme, i.e., Kernel Greedy Algorithm for Robust Denoising (KGARD), is inspired by the classical Orthogonal Matching Pursuit (OMP) algorithm. Specifically, the proposed method alternates between a KRR task and an OMP-like selection step. Theoretical results concerning the identification of the outliers are provided. Moreover, KGARD is compared against other cutting edge methods, where its performance is evaluated via a set of experiments with various types of noise. Finally, the proposed robust estimation framework is applied to the task of image denoising, and its enhanced performance in the presence of outliers is demonstrated.

LG · Mar 9, 2013

Complex Support Vector Machines for Regression and Quaternary Classification

Pantelis Bouboulis, Sergios Theodoridis, Charalampos Mavroforakis et al.

The paper presents a new framework for complex Support Vector Regression as well as Support Vector Machines for quaternary classification. The method exploits the notion of widely linear estimation to model the input-output relation for complex-valued data and considers two cases: a) the complex data are split into their real and imaginary parts and a typical real kernel is employed to map the complex data to a complexified feature space and b) a pure complex kernel is used to directly map the data to the induced complex feature space. The recently developed Wirtinger's calculus on complex reproducing kernel Hilbert spaces (RKHS) is employed in order to compute the Lagrangian and derive the dual optimization problem. As one of our major results, we prove that any complex SVM/SVR task is equivalent to solving two real SVM/SVR tasks exploiting a specific real kernel which is generated by the chosen complex kernel. In particular, the case of pure complex kernels leads to the generation of new kernels, which have not been considered before. In the classification case, the proposed framework inherently splits the complex space into four parts. This leads naturally to solving the four-class task (quaternary classification), instead of the typical two classes of the real SVM. In turn, this rationale can be used in a multiclass problem as a split-class scenario based on four classes, as opposed to the one-versus-all method; this can lead to significant computational savings. Experiments demonstrate the effectiveness of the proposed framework for regression and classification tasks that involve complex data.
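Two ideas from this abstract lend themselves to a small illustration: a widely linear estimator, which acts on both the input and its complex conjugate, and a four-way split of the complex plane. The sketch below assumes the split is made by the signs of the real and imaginary parts of the decision value; the function names and the numbers are illustrative only.

```python
def widely_linear(w, v, x):
    """Widely linear output: sum_i conj(w_i) * x_i + conj(v_i) * conj(x_i)."""
    return sum(wi.conjugate() * xi + vi.conjugate() * xi.conjugate()
               for wi, vi, xi in zip(w, v, x))


def quaternary_label(z):
    """Assign one of four classes from the signs of the real and imaginary parts (assumed rule)."""
    return (1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


w = [1 + 1j, 0.5j]
v = [0.2 + 0j, -0.1j]
x = [1 - 2j, 3 + 1j]

z = widely_linear(w, v, x)
label = quaternary_label(z)
print(z, label)
```

Setting `v` to zeros recovers a strictly linear estimator, so the conjugate term is what gives the widely linear model its extra freedom.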