Michael Muma

SP
h-index19
15papers
98citations
Novelty53%
AI Score48

15 Papers

LGApr 1, 2022
Robust and Efficient Aggregation for Distributed Learning

Stefan Vlaski, Christian Schroth, Michael Muma et al.

Distributed learning paradigms, such as federated and decentralized learning, allow for the coordination of models across a collection of agents, and without the need to exchange raw data. Instead, agents compute model updates locally based on their available data, and subsequently share the update model with a parameter server or their peers. This is followed by an aggregation step, which traditionally takes the form of a (weighted) average. Distributed learning schemes based on averaging are known to be susceptible to outliers. A single malicious agent is able to drive an averaging-based distributed learning algorithm to an arbitrarily poor model. This has motivated the development of robust aggregation schemes, which are based on variations of the median and trimmed mean. While such procedures ensure robustness to outliers and malicious behavior, they come at the cost of significantly reduced sample efficiency. This means that current robust aggregation schemes require significantly higher agent participation rates to achieve a given level of performance than their mean-based counterparts in non-contaminated settings. In this work we remedy this drawback by developing statistically efficient and robust aggregation schemes for distributed learning.

SPDec 14, 2022
Shuffled Multi-Channel Sparse Signal Recovery

Taulant Koka, Manolis C. Tsakiris, Michael Muma et al.

Mismatches between samples and their respective channel or target commonly arise in several real-world applications. For instance, whole-brain calcium imaging of freely moving organisms, multiple-target tracking or multi-person contactless vital sign monitoring may be severely affected by mismatched sample-channel assignments. To systematically address this fundamental problem, we pose it as a signal reconstruction problem where we have lost correspondences between the samples and their respective channels. Assuming that we have a sensing matrix for the underlying signals, we show that the problem is equivalent to a structured unlabeled sensing problem, and establish sufficient conditions for unique recovery. To the best of our knowledge, a sampling result for the reconstruction of shuffled multi-channel signals has not been considered in the literature and existing methods for unlabeled sensing cannot be directly applied. We extend our results to the case where the signals admit a sparse representation in an overcomplete dictionary (i.e., the sensing matrix is not precisely known), and derive sufficient conditions for the reconstruction of shuffled sparse signals. We propose a robust reconstruction method that combines sparse signal recovery with robust linear regression for the two-channel case. The performance and robustness of the proposed approach is illustrated in an application related to whole-brain calcium imaging. The proposed methodology can be generalized to sparse signal representations other than the ones considered in this work to be applied in a variety of real-world problems with imprecise measurement or channel assignment.

MEFeb 5
Learning False Discovery Rate Control via Model-Based Neural Networks

Arnau Vilella, Jasin Machkour, Michael Muma et al.

Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.

PMJan 26, 2024Code
FDR-Controlled Portfolio Optimization for Sparse Financial Index Tracking

Jasin Machkour, Daniel P. Palomar, Michael Muma

In high-dimensional data analysis, such as financial index tracking or biomedical applications, it is crucial to select the few relevant variables while maintaining control over the false discovery rate (FDR). In these applications, strong dependencies often exist among the variables (e.g., stock returns), which can undermine the FDR control property of existing methods like the model-X knockoff method or the T-Rex selector. To address this issue, we have expanded the T-Rex framework to accommodate overlapping groups of highly correlated variables. This is achieved by integrating a nearest neighbors penalization mechanism into the framework, which provably controls the FDR at the user-defined target level. A real-world example of sparse index tracking demonstrates the proposed method's ability to accurately track the S&P 500 index over the past 20 years based on a small number of stocks. An open-source implementation is provided within the R package TRexSelector on CRAN.

MEOct 12, 2021Code
The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control

Jasin Machkour, Michael Muma, Daniel P. Palomar

We propose the Terminating-Random Experiments (T-Rex) selector, a fast variable selection method for high-dimensional data. The T-Rex selector controls a user-defined target false discovery rate (FDR) while maximizing the number of selected variables. This is achieved by fusing the solutions of multiple early terminated random experiments. The experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors. A finite sample proof based on martingale theory for the FDR control property is provided. Numerical simulations confirm that the FDR is controlled at the target level while allowing for high power. We prove that the dummies can be sampled from any univariate probability distribution with finite expectation and variance. The computational complexity of the proposed method is linear in the number of variables. The T-Rex selector outperforms state-of-the-art methods for FDR control in numerical experiments and on a simulated genome-wide association study (GWAS), while its sequential computation time is more than two orders of magnitude lower than that of the strongest benchmark methods. The open source R package TRexSelector containing the implementation of the T-Rex selector is available on CRAN.

HCAug 14, 2025
Reproducible Physiological Features in Affective Computing: A Preliminary Analysis on Arousal Modeling

Andrea Gargano, Jasin Machkour, Mimma Nardelli et al.

In Affective Computing, a key challenge lies in reliably linking subjective emotional experiences with objective physiological markers. This preliminary study addresses the issue of reproducibility by identifying physiological features from cardiovascular and electrodermal signals that are associated with continuous self-reports of arousal levels. Using the Continuously Annotated Signal of Emotion dataset, we analyzed 164 features extracted from cardiac and electrodermal signals of 30 participants exposed to short emotion-evoking videos. Feature selection was performed using the Terminating-Random Experiments (T-Rex) method, which performs variable selection systematically controlling a user-defined target False Discovery Rate. Remarkably, among all candidate features, only two electrodermal-derived features exhibited reproducible and statistically significant associations with arousal, achieving a 100\% confirmation rate. These results highlight the necessity of rigorous reproducibility assessments in physiological features selection, an aspect often overlooked in Affective Computing. Our approach is particularly promising for applications in safety-critical environments requiring trustworthy and reliable white box models, such as mental disorder recognition and human-robot interaction systems.

SPJun 11, 2025
Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces

Taulant Koka, Manolis C. Tsakiris, Benjamín Béjar Haro et al.

Cross-channel unlabeled sensing addresses the problem of recovering a multi-channel signal from measurements that were shuffled across channels. This work expands the cross-channel unlabeled sensing framework to signals that lie in a union of subspaces. The extension allows for handling more complex signal structures and broadens the framework to tasks like compressed sensing. These mismatches between samples and channels often arise in applications such as whole-brain calcium imaging of freely moving organisms or multi-target tracking. We improve over previous models by deriving tighter bounds on the required number of samples for unique reconstruction, while supporting more general signal types. The approach is validated through an application in whole-brain calcium imaging, where organism movements disrupt sample-to-neuron mappings. This demonstrates the utility of our framework in real-world settings with imprecise sample-channel associations, achieving accurate signal reconstruction.

MLJan 18, 2024
False Discovery Rate Control for Gaussian Graphical Models via Neighborhood Screening

Taulant Koka, Jasin Machkour, Michael Muma

Gaussian graphical models emerge in a wide range of fields. They model the statistical relationships between variables as a graph, where an edge between two variables indicates conditional dependence. Unfortunately, well-established estimators, such as the graphical lasso or neighborhood selection, are known to be susceptible to a high prevalence of false edge detections. False detections may encourage inaccurate or even incorrect scientific interpretations, with major implications in applications, such as biomedicine or healthcare. In this paper, we introduce a nodewise variable selection approach to graph learning and provably control the false discovery rate of the selected edge set at a self-estimated level. A novel fusion method of the individual neighborhoods outputs an undirected graph estimate. The proposed method is parameter-free and does not require tuning by the user. Benchmarks against competing false discovery rate controlling methods in numerical experiments considering different graph topologies show a significant gain in performance.

MLJan 16, 2024
Sparse PCA with False Discovery Rate Controlled Variable Selection

Jasin Machkour, Arnaud Breloy, Michael Muma et al.

Sparse principal component analysis (PCA) aims at mapping large dimensional data to a linear subspace of lower dimension. By imposing loading vectors to be sparse, it performs the double duty of dimension reduction and variable selection. Sparse PCA algorithms are usually expressed as a trade-off between explained variance and sparsity of the loading vectors (i.e., number of selected variables). As a high explained variance is not necessarily synonymous with relevant information, these methods are prone to select irrelevant variables. To overcome this issue, we propose an alternative formulation of sparse PCA driven by the false discovery rate (FDR). We then leverage the Terminating-Random Experiments (T-Rex) selector to automatically determine an FDR-controlled support of the loading vectors. A major advantage of the resulting T-Rex PCA is that no sparsity parameter tuning is required. Numerical experiments and a stock market data example demonstrate a significant performance improvement.

SPJul 26, 2021
Robust Regularized Locality Preserving Indexing for Fiedler Vector Estimation

Aylin Tastan, Michael Muma, Abdelhak M. Zoubir

The Fiedler vector of a connected graph is the eigenvector associated with the algebraic connectivity of the graph Laplacian and it provides substantial information to learn the latent structure of a graph. In real-world applications, however, the data may be subject to heavy-tailed noise and outliers which results in deteriorations in the structure of the Fiedler vector estimate. We design a Robust Regularized Locality Preserving Indexing (RRLPI) method for Fiedler vector estimation that aims to approximate the nonlinear manifold structure of the Laplace Beltrami operator while minimizing the negative impact of outliers. First, an analysis of the effects of two fundamental outlier types on the eigen-decomposition for block affinity matrices which are essential in cluster analysis is conducted. Then, an error model is formulated and a robust Fiedler vector estimation algorithm is developed. An unsupervised penalty parameter selection algorithm is proposed that leverages the geometric structure of the projection space to perform robust regularized Fiedler estimation. The performance of RRLPI is benchmarked against existing competitors in terms of detection probability, partitioning quality, image segmentation capability, robustness and computation time using a large variety of synthetic and real data experiments.

SPJun 30, 2020
Real Elliptically Skewed Distributions and Their Application to Robust Cluster Analysis

Christian A. Schroth, Michael Muma

This article proposes a new class of Real Elliptically Skewed (RESK) distributions and associated clustering algorithms that allow for integrating robustness and skewness into a single unified cluster analysis framework. Non-symmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications. Robustness is essential because a few outlying observations can severely obscure the cluster structure. The RESK distributions are a generalization of the Real Elliptically Symmetric (RES) distributions. To estimate the cluster parameters and memberships, we derive an expectation maximization (EM) algorithm for arbitrary RESK distributions. Special attention is given to a new robust skew-Huber M-estimator, which is also the maximum likelihood estimator (MLE) for the skew-Huber distribution that belongs to the RESK class. Numerical experiments on simulated and real-world data confirm the usefulness of the proposed methods for skewed and heavy-tailed data sets.

SPMay 4, 2020
Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions

Christian A. Schroth, Michael Muma

Robustly determining the optimal number of clusters in a data set is an essential factor in a wide range of applications. Cluster enumeration becomes challenging when the true underlying structure in the observed data is corrupted by heavy-tailed noise and outliers. Recently, Bayesian cluster enumeration criteria have been derived by formulating cluster enumeration as maximization of the posterior probability of candidate models. This article generalizes robust Bayesian cluster enumeration so that it can be used with any arbitrary Real Elliptically Symmetric (RES) distributed mixture model. Our framework also covers the case of M-estimators that allow for mixture models, which are decoupled from a specific probability distribution. Examples of Huber's and Tukey's M-estimators are discussed. We derive a robust criterion for data sets with finite sample size, and also provide an asymptotic approximation to reduce the computational cost at large sample sizes. The algorithms are applied to simulated and real-world data sets, including radar-based person identification, and show a significant robustness improvement in comparison to existing methods.

MLNov 29, 2018
Robust Bayesian Cluster Enumeration Based on the $t$ Distribution

Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir

A major challenge in cluster analysis is that the number of data clusters is mostly unknown and it must be estimated prior to clustering the observed data. In real-world applications, the observed data is often subject to heavy tailed noise and outliers which obscure the true underlying structure of the data. Consequently, estimating the number of clusters becomes challenging. To this end, we derive a robust cluster enumeration criterion by formulating the problem of estimating the number of clusters as maximization of the posterior probability of multivariate $t_ν$ distributed candidate models. We utilize Bayes' theorem and asymptotic approximations to come up with a robust criterion that possesses a closed-form expression. Further, we refine the derivation and provide a robust cluster enumeration criterion for data sets with finite sample size. The robust criteria require an estimate of cluster parameters for each candidate model as an input. Hence, we propose a two-step cluster enumeration algorithm that uses the expectation maximization algorithm to partition the data and estimate cluster parameters prior to the calculation of one of the robust criteria. The performance of the proposed algorithm is tested and compared to existing cluster enumeration methods using numerical and real data experiments.

STOct 22, 2017
Bayesian Cluster Enumeration Criterion for Unsupervised Learning

Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir

We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets.

DCAug 31, 2017
Gravitational Clustering: A Simple, Robust and Adaptive Approach for Distributed Networks

Patricia Binder, Michael Muma, Abdelhak M. Zoubir

Distributed signal processing for wireless sensor networks enables that different devices cooperate to solve different signal processing tasks. A crucial first step is to answer the question: who observes what? Recently, several distributed algorithms have been proposed, which frame the signal/object labelling problem in terms of cluster analysis after extracting source-specific features, however, the number of clusters is assumed to be known. We propose a new method called Gravitational Clustering (GC) to adaptively estimate the time-varying number of clusters based on a set of feature vectors. The key idea is to exploit the physical principle of gravitational force between mass units: streaming-in feature vectors are considered as mass units of fixed position in the feature space, around which mobile mass units are injected at each time instant. The cluster enumeration exploits the fact that the highest attraction on the mobile mass units is exerted by regions with a high density of feature vectors, i.e., gravitational clusters. By sharing estimates among neighboring nodes via a diffusion-adaptation scheme, cooperative and distributed cluster enumeration is achieved. Numerical experiments concerning robustness against outliers, convergence and computational complexity are conducted. The application in a distributed cooperative multi-view camera network illustrates the applicability to real-world problems.