Nicolas Gillis

h-index: 15

86 papers

2,478 citations

Novelty: 49%

AI Score: 57

Ranked #17,859 of 201,326 authors (top 9%); #3,988 in LG (top 9%)

86 Papers

86.0 OC May 27

Manifold-based Algorithms for the Hadamard Decomposition

Nicolas Gillis, Subhayan Saha, Stefano Sicilia et al.

Given a matrix $X$, and two ranks $r_1$ and $r_2$, the Hadamard decomposition (HD) looks for two low-rank matrices, $X_1$ of rank $r_1$ and $X_2$ of rank $r_2$, both of the same size as $X$, such that $X\approx X_1\circ X_2$, where $\circ$ is the Hadamard (element-wise) product. In most cases, HD is more expressive than standard low-rank approximations such as the truncated singular value decomposition (TSVD), as it can represent higher-rank matrices with the same number of parameters; this is because the rank of $X_1 \circ X_2$ is generically equal to $r_1 r_2$. In this paper, we first present some theoretical insights for HD, in particular a useful reformulation $X\approx WH^\top$ where $W$ and $H$ have $r_1 r_2$ columns and belong to certain manifolds. These allow us to develop three new algorithms for computing HD. The first one uses the representation $X\approx X_1\circ X_2$ and relies on the Manopt toolbox. The other two rely on the reformulation $X\approx WH^\top$: one is a block projected gradient method, and the other is a manifold-based gradient descent algorithm that does not require projection onto the feasible set. The last two algorithms are particularly effective for handling large sparse data. We also propose new initializations that allow us to improve the accuracy of the HD. We compare our algorithms and initialization strategies with the TSVD and with the state of the art. Numerical results show that the new methods are efficient and competitive on both synthetic and real data.
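The rank claim and the reformulation $X\approx WH^\top$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the factors $X_1 = A_1 B_1^\top$ and $X_2 = A_2 B_2^\top$ are random, the helper name `rowwise_kron` is hypothetical, and $W$ and $H$ are built as row-wise Kronecker products of the factors, which is one standard way to write a Hadamard product of low-rank matrices and may differ in detail from the construction used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r1, r2 = 30, 20, 2, 3

# Random low-rank factors: X1 = A1 @ B1.T has rank r1, X2 = A2 @ B2.T has rank r2.
A1, B1 = rng.standard_normal((m, r1)), rng.standard_normal((n, r1))
A2, B2 = rng.standard_normal((m, r2)), rng.standard_normal((n, r2))

# Hadamard (element-wise) product of the two low-rank matrices.
X = (A1 @ B1.T) * (A2 @ B2.T)


def rowwise_kron(P, Q):
    """Row i of the result is kron(P[i], Q[i]) (hypothetical helper name)."""
    return np.einsum("ik,il->ikl", P, Q).reshape(P.shape[0], -1)


W = rowwise_kron(A1, A2)  # shape (m, r1 * r2)
H = rowwise_kron(B1, B2)  # shape (n, r1 * r2)

hadamard_rank = np.linalg.matrix_rank(X)
reformulation_error = np.linalg.norm(X - W @ H.T)
print(hadamard_rank, reformulation_error)
```

Here the Hadamard product is stored with $(m+n)(r_1+r_2)$ parameters, whereas a generic matrix of rank $r_1 r_2$ in factored form needs $(m+n)\,r_1 r_2$.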

OC Jul 25, 2011

Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard

Nicolas Gillis, François Glineur

Weighted low-rank approximation (WLRA), a dimensionality reduction technique for data analysis, has been successfully used in several applications, such as in collaborative filtering to design recommender systems or in computer vision to recover structure from motion. In this paper, we study the computational complexity of WLRA and prove that it is NP-hard to find an approximate solution, even when a rank-one approximation is sought. Our proofs are based on a reduction from the maximum-edge biclique problem, and apply to strictly positive weights as well as binary weights (the latter corresponding to low-rank matrix approximation with missing data).

NA Oct 23, 2008

Nonnegative Factorization and The Maximum Edge Biclique Problem

Nicolas Gillis, François Glineur

Nonnegative Matrix Factorization (NMF) is a data analysis technique which allows compression and interpretation of nonnegative data. NMF became widely studied after the publication of the seminal paper by Lee and Seung (Learning the Parts of Objects by Nonnegative Matrix Factorization, Nature, 1999, vol. 401, pp. 788--791), which introduced an algorithm based on Multiplicative Updates (MU). More recently, another class of methods called Hierarchical Alternating Least Squares (HALS) was introduced that seems to be much more efficient in practice. In this paper, we consider the problem of approximating a not necessarily nonnegative matrix with the product of two nonnegative matrices, which we refer to as Nonnegative Factorization (NF); this is the subproblem that HALS methods implicitly try to solve at each iteration. We prove that NF is NP-hard for any fixed factorization rank, using a reduction from the maximum edge biclique problem. We also generalize the multiplicative updates to NF, which allows us to shed some light on the differences between the MU and HALS algorithms for NMF and give an explanation for the better performance of HALS. Finally, we link stationary points of NF with feasible solutions of the biclique problem to obtain a new type of biclique finding algorithm (based on MU) whose iterations have an algorithmic complexity proportional to the number of edges in the graph, and show that it performs better than comparable existing methods.
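For readers unfamiliar with multiplicative updates, the following is a minimal sketch of the widely used Frobenius-norm form of MU for standard NMF. It is not the generalization to NF developed in the paper; the function name `nmf_mu`, the iteration count, and the small constant `eps` added to the denominators are assumptions made for this illustration.

```python
import numpy as np


def nmf_mu(X, r, n_iter=200, eps=1e-12, seed=0):
    """Frobenius-norm multiplicative updates for X ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled entry-wise, so nonnegativity is preserved.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


# Synthetic nonnegative data with an exact rank-5 nonnegative factorization.
rng = np.random.default_rng(1)
X = rng.random((40, 5)) @ rng.random((5, 30))
W, H, errors = nmf_mu(X, r=5)
print(errors[0], errors[-1])
```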

NA Apr 11, 2017

Computing nearest stable matrix pairs

Nicolas Gillis, Volker Mehrmann, Punit Sharma

In this paper, we study the nearest stable matrix pair problem: given a square matrix pair $(E,A)$, minimize the Frobenius norm of $(Δ_E,Δ_A)$ such that $(E+Δ_E,A+Δ_A)$ is a stable matrix pair. We propose a reformulation of the problem with a simpler feasible set by introducing dissipative Hamiltonian (DH) matrix pairs: A matrix pair $(E,A)$ is DH if $A=(J-R)Q$ with skew-symmetric $J$, positive semidefinite $R$, and an invertible $Q$ such that $Q^TE$ is positive semidefinite. This reformulation has a convex feasible domain onto which it is easy to project. This allows us to employ a fast gradient method to obtain a nearby stable approximation of a given matrix pair.

17.5 OC May 26

Computing cone-constrained singular values of matrices

Giovanni Barbarino, Nicolas Gillis, David Sossa

This paper deals with the numerical computation of the least singular value of a rectangular matrix $A$ relative to a pair of closed convex cones $(P,Q)$, which is defined as the optimal value of the non-convex optimization problem of minimizing $\langle u,Av\rangle$ such that $u$ and $v$ are unit vectors in $P$ and $Q$, respectively. When $A$ is the identity matrix, the least singular value coincides with the cosine of the largest angle between $P$ and $Q$. When $P$ and $Q$ are positive orthants, the least singular value is called the least Pareto singular value of $A$ and has applications, for instance, in graph theory. We prove the NP-hardness of all the above problems, while identifying cases when such problems can be solved in polynomial time. We then propose four algorithms. Two are exact algorithms, meaning that they are guaranteed to compute a globally optimal solution; one uses an exact non-convex quadratic programming solver, and the other a brute-force active-set method. The other two are heuristics, meaning that they rapidly compute locally optimal solutions; one uses an alternating projection algorithm with extrapolation, and the other a sequential partial linearization approach based on fractional programming. We illustrate the use of these algorithms on several examples.

ML May 31, 2013

Robustness Analysis of Hottopixx, a Linear Programming Model for Factoring Nonnegative Matrices

Nicolas Gillis

Although nonnegative matrix factorization (NMF) is NP-hard in general, it has been shown very recently that it is tractable under the assumption that the input nonnegative data matrix is close to being separable (separability requires that all columns of the input matrix belong to the cone spanned by a small subset of these columns). Since then, several algorithms have been designed to handle this subclass of NMF problems. In particular, Bittorf, Recht, Ré and Tropp (`Factoring nonnegative matrices with linear programs', NIPS 2012) proposed a linear programming model, referred to as Hottopixx. In this paper, we provide a new and more general robustness analysis of their method. In particular, we design a provably more robust variant using a post-processing strategy which allows us to deal with duplicates and near duplicates in the dataset.

OC Nov 21, 2017

Finding the nearest positive-real system

Nicolas Gillis, Punit Sharma

The notion of positive realness for linear time-invariant (LTI) dynamical systems, equivalent to passivity, is one of the oldest in system and control theory. In this paper, we consider the problem of finding the nearest positive-real (PR) system to a non-PR system: given an LTI control system defined by $E \dot{x}=Ax+Bu$ and $y=Cx+Du$, minimize the Frobenius norm of $(Δ_E,Δ_A,Δ_B,Δ_C,Δ_D)$ such that $(E+Δ_E,A+Δ_A,B+Δ_B,C+Δ_C,D+Δ_D)$ is a PR system. We first show that a system is extended strictly PR if and only if it can be written as a strict port-Hamiltonian system. This allows us to reformulate the nearest PR system problem into an optimization problem with a simple convex feasible set. We then use a fast gradient method to obtain a nearby PR system to a given non-PR system, and illustrate the behavior of our algorithm on several examples. This is, to the best of our knowledge, the first algorithm that computes a nearby PR system to a given non-PR system that (i) is not based on the spectral properties of related Hamiltonian matrices or pencils, (ii) allows perturbing all matrices $(E,A,B,C,D)$ describing the system, and (iii) does not make any assumption on the original given system.

LG Jun 21, 2022

A consistent and flexible framework for deep matrix factorizations

Pierre De Handschutter, Nicolas Gillis

Deep matrix factorizations (deep MFs) are recent unsupervised data mining techniques inspired by constrained low-rank approximations. They aim to extract complex hierarchies of features within high-dimensional datasets. Most of the loss functions proposed in the literature to evaluate the quality of deep MF models and the underlying optimization frameworks are not consistent because different losses are used at different layers. In this paper, we introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems. We illustrate the effectiveness of this approach through the integration of various constraints and regularizations, such as sparsity, nonnegativity and minimum-volume. The models are successfully applied to both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.

OC Mar 10, 2019

Approximating the nearest stable discrete-time system

Nicolas Gillis, Michael Karow, Punit Sharma

In this paper, we consider the problem of stabilizing discrete-time linear systems by computing a nearby stable matrix to an unstable one. To do so, we provide a new characterization for the set of stable matrices. We show that a matrix $A$ is stable if and only if it can be written as $A=S^{-1}UBS$, where $S$ is positive definite, $U$ is orthogonal, and $B$ is a positive semidefinite contraction (that is, the singular values of $B$ are less than or equal to 1). This characterization results in an equivalent non-convex optimization problem with a feasible set on which it is easy to project. We propose a very efficient fast projected gradient method to tackle the problem in variables $(S,U,B)$ and generate locally optimal solutions. We show the effectiveness of the proposed method compared to other approaches.
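One direction of the characterization $A=S^{-1}UBS$ is easy to check numerically: any matrix built this way has all its eigenvalues in the closed unit disk, because $A$ is similar to $UB$ and $\|UB\|_2 = \|B\|_2 \leq 1$. The sketch below only illustrates that direction with randomly generated $S$, $U$, and $B$; it is not the fast projected gradient method of the paper, and the specific random constructions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# S: symmetric positive definite (a Gram matrix shifted away from singularity).
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)

# U: orthogonal, taken from a QR factorization of a random matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

# B: symmetric positive semidefinite contraction (eigenvalues in [0, 1]).
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = V @ np.diag(rng.uniform(0.0, 1.0, size=n)) @ V.T

# A = S^{-1} U B S, computed with a linear solve instead of an explicit inverse.
A = np.linalg.solve(S, U @ B @ S)

spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
print(spectral_radius)
```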

OC Jul 12, 2018

A note on approximating the nearest stable discrete-time descriptor system with fixed rank

Nicolas Gillis, Michael Karow, Punit Sharma

Consider a discrete-time linear time-invariant descriptor system $Ex(k+1)=Ax(k)$ for $k \in \mathbb Z_{+}$. In this paper, we tackle for the first time the problem of stabilizing such systems by computing a nearby regular index one stable system $\hat E x(k+1)= \hat A x(k)$ with $\text{rank}(\hat E)=r$. We reformulate this highly nonconvex problem into an equivalent optimization problem with a relatively simple feasible set onto which it is easy to project. This allows us to employ a block coordinate descent method to obtain a nearby regular index one stable system. We illustrate the effectiveness of the algorithm on several examples.

LG Oct 1, 2023

Subtractive Mixture Models via Squaring: Representation and Learning

Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel et al.

Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.

LG Sep 26, 2022

Bounded Simplex-Structured Matrix Factorization: Algorithms, Identifiability and Applications

Olivier Vu Thanh, Nicolas Gillis, Fabian Lecron

In this paper, we propose a new low-rank matrix factorization model dubbed bounded simplex-structured matrix factorization (BSSMF). Given an input matrix $X$ and a factorization rank $r$, BSSMF looks for a matrix $W$ with $r$ columns and a matrix $H$ with $r$ rows such that $X \approx WH$ where the entries in each column of $W$ are bounded, that is, they belong to given intervals, and the columns of $H$ belong to the probability simplex, that is, $H$ is column stochastic. BSSMF generalizes nonnegative matrix factorization (NMF), and simplex-structured matrix factorization (SSMF). BSSMF is particularly well suited when the entries of the input matrix $X$ belong to a given interval; for example when the rows of $X$ represent images, or $X$ is a rating matrix such as in the Netflix and MovieLens datasets where the entries of $X$ belong to the interval $[1,5]$. The simplex-structured matrix $H$ not only leads to an easily understandable decomposition providing a soft clustering of the columns of $X$, but implies that the entries of each column of $WH$ belong to the same intervals as the columns of $W$. In this paper, we first propose a fast algorithm for BSSMF, even in the presence of missing data in $X$. Then we provide identifiability conditions for BSSMF, that is, we provide conditions under which BSSMF admits a unique decomposition, up to trivial ambiguities. Finally, we illustrate the effectiveness of BSSMF on two applications: extraction of features in a set of images, and the matrix completion problem for recommender systems.
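The property that each column of $WH$ stays within the same intervals as the columns of $W$ follows because each column of $WH$ is a convex combination of the columns of $W$. A small numerical illustration, assuming for simplicity a single interval $[1,5]$ for all entries of $W$ and randomly generated factors (this is not the BSSMF fitting algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 12, 3
lo, hi = 1.0, 5.0  # assumed common interval, e.g. a 1-to-5 rating scale

# W has entries bounded in [lo, hi].
W = rng.uniform(lo, hi, size=(m, r))

# H is column stochastic: each column is sampled from the probability simplex.
H = rng.dirichlet(np.ones(r), size=n).T  # shape (r, n)

# Each column of X is a convex combination of the columns of W.
X = W @ H
print(X.min(), X.max())
```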

SP Sep 26, 2022

Least-squares methods for nonnegative matrix factorization over rational functions

Cécile Hautecoeur, Lieven De Lathauwer, Nicolas Gillis et al.

Nonnegative Matrix Factorization (NMF) models are widely used to recover linearly mixed nonnegative data. When the data is made of samplings of continuous signals, the factors in NMF can be constrained to be samples of nonnegative rational functions, which allow fairly general models; this is referred to as NMF using rational functions (R-NMF). We first show that, under mild assumptions, R-NMF has an essentially unique factorization unlike NMF, which is crucial in applications where ground-truth factors need to be recovered such as blind source separation problems. Then we present different approaches to solve R-NMF: the R-HANLS, R-ANLS and R-NLS methods. From our tests, no method significantly outperforms the others, and a trade-off must be made between time and accuracy. Indeed, R-HANLS is fast and accurate for large problems, while R-ANLS is more accurate, but also more resource-demanding, both in time and memory. R-NLS is very accurate but only for small problems. Moreover, we show that R-NMF outperforms NMF in various tasks including the recovery of semi-synthetic continuous signals, and a classification problem of real hyperspectral signals.

NA Jun 16, 2022

Partial Identifiability for Nonnegative Matrix Factorization

Nicolas Gillis, Róbert Rajkó

Given a nonnegative matrix, $R$, and a factorization rank, $r$, Exact nonnegative matrix factorization (Exact NMF) decomposes $R$ as the product of two nonnegative matrices, $C$ and $S$ with $r$ columns, such that $R = CS^\top$. A central research topic in the literature is the conditions under which such a decomposition is unique/identifiable, up to trivial ambiguities. In this paper, we focus on partial identifiability, that is, the uniqueness of a subset of columns of $C$ and $S$. We start our investigations with the data-based uniqueness (DBU) theorem from the chemometrics literature. The DBU theorem analyzes all feasible solutions of Exact NMF, and relies on sparsity conditions on $C$ and $S$. We provide a mathematically rigorous theorem of a recently published restricted version of the DBU theorem, relying only on simple sparsity and algebraic conditions: it applies to a particular solution of Exact NMF (as opposed to all feasible solutions) and allows us to guarantee the partial uniqueness of a single column of $C$ or $S$. Second, based on a geometric interpretation of the restricted DBU theorem, we obtain a new partial identifiability result. This geometric interpretation also leads us to another partial identifiability result in the case $r=3$. Third, we show how partial identifiability results can be used sequentially to guarantee the identifiability of more columns of $C$ and $S$. We illustrate these results on several examples, including one from the chemometrics literature.

61.1 LG May 13 Code

Supervised Deep Multimodal Matrix Factorization for Interpretable Brain Network Analysis

Amjad Seyedi, Lifang He, Songlin Zhao et al.

We present Supervised Deep Multimodal Matrix Factorization (SD3MF), an interpretable framework for integrative brain network analysis that generalizes Symmetric Nonnegative Matrix Tri-Factorization (SNMTF) from unsupervised single-graph clustering to supervised prediction over populations of multimodal graphs. SD3MF learns deep hierarchical factorizations for each modality together with a shared latent representation that aligns subjects across views. An encoder-decoder formulation jointly optimizes graph reconstruction and supervised prediction, while adaptive weights enable data-driven multimodal fusion. By representing each subject through community-level interaction matrices, the model yields interpretable and discriminative features. Experiments on multimodal connectome datasets show that SD3MF consistently outperforms strong deep learning baselines such as CNNs and GNNs, while enabling biologically interpretable insights. Code for reproducibility is available at: https://github.com/amjadseyedi/SD3MF.

LG Jul 20, 2022

Revisiting data augmentation for subspace clustering

Maryam Abdolali, Nicolas Gillis

Subspace clustering is the classical problem of clustering a collection of data samples that approximately lie around several low-dimensional subspaces. The current state-of-the-art approaches for this problem are based on the self-expressive model which represents the samples as linear combinations of other samples. However, these approaches require sufficiently well-spread samples for accurate representation, which might not be available in many applications. In this paper, we shed light on this commonly neglected issue and argue that data distribution within each subspace plays a critical role in the success of self-expressive models. Our proposed solution to tackle this issue is motivated by the central role of data augmentation in the generalization power of deep neural networks. We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings that use augmented samples as an enlarged dictionary to improve the quality of the self-expressive representation. We present an automatic augmentation strategy using a few labeled samples for the semi-supervised problem relying on the fact that the data samples lie in the union of multiple linear subspaces. Experimental results confirm the effectiveness of data augmentation, as it significantly improves the performance of general self-expressive models.

NA Nov 22, 2017

Multiplicative Updates for Polynomial Root Finding

Nicolas Gillis

Let $f(x)=p(x)-q(x)$ be a polynomial with real coefficients whose roots have nonnegative real part, where $p$ and $q$ are polynomials with nonnegative coefficients. In this paper, we prove the following: Given an initial point $x_0 > 0$, the multiplicative update $x_{t+1} = x_t \, p(x_t)/q(x_t)$ ($t=0,1,\dots$) monotonically and linearly converges to the largest (resp. smallest) real root of $f$ smaller (resp. larger) than $x_0$ if $p(x_0) < q(x_0)$ (resp. $q(x_0) < p(x_0)$). The motivation to study this algorithm comes from the multiplicative updates proposed in the literature to solve optimization problems with nonnegativity constraints; in particular many variants of nonnegative matrix factorization.
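A small worked example of the update, with a polynomial chosen for this illustration (it does not come from the paper): $f(x) = x^2 - 3x + 2 = (x-1)(x-2)$, split as $p(x) = x^2 + 2$ and $q(x) = 3x$. Both roots are real and positive, and $p$ and $q$ have nonnegative coefficients. Starting from $x_0 = 1.5$, where $p(x_0) < q(x_0)$, the iterates should decrease toward the root $1$; starting from $x_0 = 0.5$, where $q(x_0) < p(x_0)$, they should increase toward the same root.

```python
def multiplicative_update(p, q, x0, n_iter=200):
    """Iterate x_{t+1} = x_t * p(x_t) / q(x_t) and return all iterates."""
    xs = [x0]
    for _ in range(n_iter):
        x = xs[-1]
        xs.append(x * p(x) / q(x))
    return xs


# f(x) = p(x) - q(x) = x^2 - 3x + 2, with roots 1 and 2 (illustrative choice).
def p(x):
    return x**2 + 2.0


def q(x):
    return 3.0 * x


from_above = multiplicative_update(p, q, x0=1.5)  # p(x0) < q(x0)
from_below = multiplicative_update(p, q, x0=0.5)  # q(x0) < p(x0)
print(from_above[-1], from_below[-1])
```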

LG Sep 15, 2023

Deep Nonnegative Matrix Factorization with Beta Divergences

Valentin Leplat, Le Thi Khanh Hien, Akwum Onwunta et al.

Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $β$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $β$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.

25.6 SI Apr 28

Matrix Factorization Framework for Community Detection under the Degree-Corrected Block Model

Alexandra Dache, Arnaud Vandaele, Nicolas Gillis

Community detection is a fundamental task in data analysis, and block models provide an approach for identifying a wide variety of community structures while offering high interpretability. The degree-corrected block model (DCBM) is an established model that accounts for the heterogeneity of node degrees. However, inference methods are computationally costly and highly sensitive to initialization, while cheaper alternatives, such as spectral or modularity-based approaches, are restricted to detecting specific structures, typically assortative. In this work, we show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem. Leveraging this insight, we propose a novel method for community detection and a theoretically well-grounded initialization strategy that provides an initial estimate of communities for inference algorithms. Our approach is agnostic to any specific network structure and applies to graphs with any structure representable by a DCBM. Experiments on synthetic and real benchmark networks show that our method detects communities comparable to those found by DCBM inference while being faster; for instance, it processes a graph with 100,000 nodes and 1,000,000 edges in approximately 4 minutes. Moreover, the proposed initialization strategy significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms. Overall, this work provides a scalable and robust framework for community detection and highlights the benefits of a matrix-factorization perspective for the DCBM.

SP Dec 19, 2025

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

Atharva Awari, Nicolas Gillis, Arnaud Vandaele

We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.
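To illustrate why the model $X \approx f(WH)$ is more expressive than a plain low-rank approximation, the sketch below applies the ReLU nonlinearity to a random rank-$r$ product and compares ranks. It only demonstrates the model, not the ADMM algorithm; the dimensions and random factors are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 40, 3

W = rng.standard_normal((m, r))
H = rng.standard_normal((r, n))

Z = W @ H               # low-rank latent matrix
X = np.maximum(0.0, Z)  # ReLU model: X = max(0, WH), nonnegative and sparse

rank_Z = np.linalg.matrix_rank(Z)
rank_X = np.linalg.matrix_rank(X)
sparsity = float(np.mean(X == 0))  # fraction of exact zeros in X
print(rank_Z, rank_X, sparsity)
```

The same pattern applies to the other nonlinearities mentioned in the abstract, for example `Z**2` for the component-wise square or `np.clip(Z, a, b)` for the MinMax transform.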

6.9 OC May 13

Computing Lower Bounds on the Nonnegative Rank via Non-Convex Optimization Solvers

Timothy Baeckelant, Arnaud Vandaele, Nicolas Gillis

The nonnegative rank of a nonnegative matrix $X$ is the smallest number of nonnegative rank-one factors that sum to $X$. Since computing the nonnegative rank is NP-hard, it is common to circumvent this issue by computing lower and upper bounds. In this paper, we propose non-convex formulations and practical implementations for four important lower bounds for the nonnegative rank, namely the fooling set bound (FSB), the rectangle covering bound (RCB), the hyperplane separation bound (HSB), and the self-scaled bound (SSB). In particular, our algorithm for computing the SSB is the first available in the literature, to the best of our knowledge. It allows us to improve the best known lower bound on the nonnegative rank for some matrices. In some cases, the improved lower bounds coincide with the best known upper bounds, thereby establishing the exact nonnegative rank of these matrices for the first time. Moreover, on canonical benchmarks, we show that our non-convex approaches provide a meaningful and often competitive alternative to standard methods. The paper also provides a consolidated reference for the current state of several classical lower bounds on a large number of benchmark matrices.

LG Feb 4

Maximum-Volume Nonnegative Matrix Factorization

Olivier Vu Thanh, Nicolas Gillis

Nonnegative matrix factorization (NMF) is a popular data embedding technique. Given a nonnegative data matrix $X$, it aims at finding two lower dimensional matrices, $W$ and $H$, such that $X\approx WH$, where the factors $W$ and $H$ are constrained to be element-wise nonnegative. The factor $W$ serves as a basis for the columns of $X$. In order to obtain more interpretable and unique solutions, minimum-volume NMF (MinVol NMF) minimizes the volume of $W$. In this paper, we consider the dual approach, where the volume of $H$ is maximized instead; this is referred to as maximum-volume NMF (MaxVol NMF). MaxVol NMF is identifiable under the same conditions as MinVol NMF in the noiseless case, but it behaves rather differently in the presence of noise. In practice, MaxVol NMF is much more effective at extracting a sparse decomposition and does not generate rank-deficient solutions. In fact, we prove that the solutions of MaxVol NMF with the largest volume correspond to clustering the columns of $X$ in disjoint clusters, while the solutions of MinVol NMF with smallest volume are rank deficient. We propose two algorithms to solve MaxVol NMF. We also present a normalized variant of MaxVol NMF that exhibits better performance than MinVol NMF and MaxVol NMF, and can be interpreted as a continuum between standard NMF and orthogonal NMF. We illustrate our results in the context of hyperspectral unmixing.

NA Nov 10, 2025

A Provably-Correct and Robust Convex Model for Smooth Separable NMF

Junjun Pan, Valentin Leplat, Michael Ng et al.

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data, with applications such as hyperspectral unmixing and topic modeling. NMF is a difficult problem in general (NP-hard), and its solutions are typically not unique. To address these two issues, additional constraints or assumptions are often used. In particular, separability assumes that the basis vectors in the NMF are equal to some columns of the input matrix. In that case, the problem is referred to as separable NMF (SNMF) and can be solved in polynomial-time with robustness guarantees, while identifying a unique solution. However, in real-world scenarios, due to noise or variability, multiple data points may lie near the basis vectors, which SNMF does not leverage. In this work, we rely on the smooth separability assumption, which assumes that each basis vector is close to multiple data points. We explore the properties of the corresponding problem, referred to as smooth SNMF (SSNMF), and examine how it relates to SNMF and orthogonal NMF. We then propose a convex model for SSNMF and show that it provably recovers the sought-after factors, even in the presence of noise. We finally adapt an existing fast gradient method to solve this convex model for SSNMF, and show that it compares favorably with state-of-the-art methods on both synthetic and hyperspectral datasets.

ML Nov 6, 2025

Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition

Giovanni Barbarino, Nicolas Gillis, Subhayan Saha

Minimum-volume nonnegative matrix factorization (min-vol NMF) has been used successfully in many applications, such as hyperspectral imaging, chemical kinetics, spectroscopy, topic modeling, and audio source separation. However, its robustness to noise has been a long-standing open problem. In this paper, we prove that min-vol NMF identifies the groundtruth factors in the presence of noise under a condition referred to as the expanded sufficiently scattered condition which requires the data points to be sufficiently well scattered in the latent simplex generated by the basis vectors.

NA Nov 25, 2024

On the Robustness of the Successive Projection Algorithm

Giovanni Barbarino, Nicolas Gillis

The successive projection algorithm (SPA) is a workhorse algorithm to learn the $r$ vertices of the convex hull of a set of $(r-1)$-dimensional data points, a.k.a. a latent simplex, which has numerous applications in data science. In this paper, we revisit the robustness to noise of SPA and several of its variants. In particular, when $r \geq 3$, we prove the tightness of the existing error bounds for SPA and for two more robust preconditioned variants of SPA. We also provide significantly improved error bounds for SPA, by a factor proportional to the conditioning of the $r$ vertices, in two special cases: for the first extracted vertex, and when $r \leq 2$. We then provide further improvements for the error bounds of a translated version of SPA proposed by Arora et al. (''A practical algorithm for topic modeling with provable guarantees'', ICML, 2013) in two special cases: for the first two extracted vertices, and when $r \leq 3$. Finally, we propose a new more robust variant of SPA that first shifts and lifts the data points in order to minimize the conditioning of the problem. We illustrate our results on synthetic data.
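For context, the basic SPA loop can be sketched in a few lines: repeatedly select the column with the largest Euclidean norm, then project all columns onto the orthogonal complement of the selected one. The example below runs it on noiseless synthetic data in which the first $r$ columns of $WH$ are the vertices; the data generation, the function name `spa`, and the absence of noise are assumptions of this illustration, and none of the preconditioned, translated, or shifted-and-lifted variants discussed in the abstract are shown.

```python
import numpy as np


def spa(X, r):
    """Basic successive projection algorithm; returns r selected column indices."""
    R = np.array(X, dtype=float)  # residual matrix, modified in place
    selected = []
    for _ in range(r):
        # Pick the column of the residual with the largest Euclidean norm.
        j = int(np.argmax(np.sum(R**2, axis=0)))
        selected.append(j)
        # Project every column onto the orthogonal complement of that column.
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)
    return selected


rng = np.random.default_rng(0)
m, r, n_mixed = 10, 4, 50

W = rng.random((m, r))                               # vertices (full column rank)
H_mixed = rng.dirichlet(np.ones(r), size=n_mixed).T  # convex weights for mixed points
H = np.hstack([np.eye(r), H_mixed])                  # first r columns are pure

perm = rng.permutation(r + n_mixed)                  # hide the vertex positions
X = (W @ H)[:, perm]

vertex_positions = sorted(int(np.where(perm == k)[0][0]) for k in range(r))
found = sorted(spa(X, r))
print(found, vertex_positions)
```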

LG Jan 12, 2024

Block Majorization Minimization with Extrapolation and Application to $β$-NMF

Le Thi Khanh Hien, Valentin Leplat, Nicolas Gillis

We propose a Block Majorization Minimization method with Extrapolation (BMMe) for solving a class of multi-convex optimization problems. The extrapolation parameters of BMMe are updated using a novel adaptive update rule. By showing that block majorization minimization can be reformulated as a block mirror descent method, with the Bregman divergence adaptively updated at each iteration, we establish subsequential convergence for BMMe. We use this method to design efficient algorithms to tackle nonnegative matrix factorization problems with the $β$-divergences ($β$-NMF) for $β\in [1,2]$. These algorithms, which are multiplicative updates with extrapolation, benefit from our novel results that offer convergence guarantees. We also empirically illustrate the significant acceleration of BMMe for $β$-NMF through extensive experiments.
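For reference, the $β$-divergence between two positive scalars $x$ and $y$ is commonly defined as follows (the standard definition from the NMF literature, stated here as background rather than quoted from the paper); the $β$-NMF objective sums it over all entries of $X$ and $WH$:

```latex
d_\beta(x \mid y) =
\begin{cases}
\dfrac{1}{\beta(\beta-1)}\left( x^{\beta} + (\beta-1)\, y^{\beta} - \beta\, x\, y^{\beta-1} \right)
  & \text{for } \beta \notin \{0, 1\}, \\[2ex]
x \log\dfrac{x}{y} - x + y
  & \text{for } \beta = 1 \text{ (Kullback-Leibler)}, \\[2ex]
\dfrac{x}{y} - \log\dfrac{x}{y} - 1
  & \text{for } \beta = 0 \text{ (Itakura-Saito)}.
\end{cases}
```

For $β = 2$ this reduces to the least squares error $\frac{1}{2}(x-y)^2$, so the range $β \in [1,2]$ interpolates between the Kullback-Leibler divergence and the Frobenius norm.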

LGApr 18, 2025

Efficient algorithms for the Hadamard decomposition

Samuel Wertz, Arnaud Vandaele, Nicolas Gillis

The Hadamard decomposition is a powerful technique for data analysis and matrix compression, which decomposes a given matrix into the element-wise product of two or more low-rank matrices. In this paper, we develop an efficient algorithm to solve this problem, leveraging an alternating optimization approach that decomposes the global non-convex problem into a series of convex sub-problems. To improve performance, we explore advanced initialization strategies inspired by the singular value decomposition (SVD) and incorporate acceleration techniques by introducing momentum-based updates. Beyond optimizing the two-matrix case, we also extend the Hadamard decomposition framework to support more than two low-rank matrices, enabling approximations with higher effective ranks while preserving computational efficiency. Finally, we conduct extensive experiments to compare our method with the existing gradient descent-based approaches for the Hadamard decomposition and with traditional low-rank approximation techniques. The results highlight the effectiveness of our proposed method across diverse datasets.
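To make the "higher effective rank" point concrete, here is a small sketch with two hand-picked rank-2 matrices whose element-wise product has rank 4. The matrices, the helper `rank`, and the variable names are assumptions chosen for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def rank(M):
    """Exact rank of an integer matrix via Gaussian elimination on fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    rk = 0
    for c in range(cols):
        pivot = next((r for r in range(rk, rows) if A[r][c] != 0), None)
        if pivot is None:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for r in range(rk + 1, rows):
            f = A[r][c] / A[rk][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[rk])]
        rk += 1
    return rk


n = 4
# X1[i][j] = 1 + i*j and X2[i][j] = 1 + (i*j)^2 are each a sum of two
# rank-one terms, so both have rank 2.
X1 = [[1 + i * j for j in range(n)] for i in range(n)]
X2 = [[1 + (i * j) ** 2 for j in range(n)] for i in range(n)]
# Element-wise (Hadamard) product of the two low-rank matrices.
X = [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(X1, X2)]

ranks = (rank(X1), rank(X2), rank(X))
```

Here the product expands to $1 + ij + (ij)^2 + (ij)^3$, a sum of four rank-one terms built from a Vandermonde matrix with distinct nodes, which is why two rank-2 factors can represent a rank-4 matrix.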

NAMar 29, 2024

Dual Simplex Volume Maximization for Simplex-Structured Matrix Factorization

Maryam Abdolali, Giovanni Barbarino, Nicolas Gillis

Simplex-structured matrix factorization (SSMF) is a generalization of nonnegative matrix factorization, a fundamental interpretable data analysis model, and has applications in hyperspectral unmixing and topic modeling. To obtain identifiable solutions, a standard approach is to find minimum-volume solutions. By taking advantage of the duality/polarity concept for polytopes, we convert minimum-volume SSMF in the primal space to a maximum-volume problem in the dual space. We first prove the identifiability of this maximum-volume dual problem. Then, we use this dual formulation to provide a novel optimization approach which bridges the gap between two existing families of algorithms for SSMF, namely volume minimization and facet identification. Numerical experiments show that the proposed approach performs favorably compared to the state-of-the-art SSMF algorithms.

LGFeb 8, 2024

Checking the Sufficiently Scattered Condition using a Global Non-Convex Optimization Software

Nicolas Gillis, Robert Luce

The sufficiently scattered condition (SSC) is a key condition in the study of identifiability of various matrix factorization problems, including nonnegative, minimum-volume, symmetric, simplex-structured, and polytopic matrix factorizations. The SSC allows one to guarantee that the computed matrix factorization is unique/identifiable, up to trivial ambiguities. However, this condition is NP-hard to check in general. In this paper, we show that it can nevertheless be checked in a reasonable amount of time in realistic scenarios, when the factorization rank is not too large. This is achieved by formulating the problem as a non-convex quadratic optimization problem over a bounded set. We use the global non-convex optimization software Gurobi, and showcase the usefulness of this code on synthetic data sets and on real-world hyperspectral images.

NAMay 19, 2025

Identifiability of Nonnegative Tucker Decompositions -- Part I: Theory

Subhayan Saha, Giovanni Barbarino, Nicolas Gillis

Tensor decompositions have become a central tool in data science, with applications in areas such as data analysis, signal processing, and machine learning. A key property of many tensor decompositions, such as the canonical polyadic decomposition, is identifiability: the factors are unique, up to trivial scaling and permutation ambiguities. This allows one to recover the ground-truth sources that generated the data. The Tucker decomposition (TD) is a central and widely used tensor decomposition model. However, it is in general not identifiable. In this paper, we study the identifiability of the nonnegative TD (nTD). By adapting and extending identifiability results of nonnegative matrix factorization (NMF), we provide uniqueness results for nTD. Our results require the nonnegative matrix factors to have some degree of sparsity (namely, satisfy the separability condition, or the sufficiently scattered condition), while the core tensor only needs to have some slices (or linear combinations of them) or unfoldings with full column rank (but does not need to be nonnegative). Under such conditions, we derive several procedures, using either unfoldings or slices of the input tensor, to obtain identifiable nTDs by minimizing the volume of unfoldings or slices of the core tensor.

LGMar 31, 2025

An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function

Nicolas Gillis, Margherita Porcelli, Giovanni Seraghiti

Nonlinear matrix decomposition (NMD) with the ReLU function, denoted ReLU-NMD, is the following problem: given a sparse, nonnegative matrix $X$ and a factorization rank $r$, identify a rank-$r$ matrix $Θ$ such that $X\approx \max(0,Θ)$. This decomposition finds application in data compression, matrix completion with entries missing not at random, and manifold learning. The standard ReLU-NMD model minimizes the least squares error, that is, $\|X - \max(0,Θ)\|_F^2$. The corresponding optimization problem is nondifferentiable and highly nonconvex. This motivated Saul to propose an alternative model, Latent-ReLU-NMD, where a latent variable $Z$ is introduced and satisfies $\max(0,Z)=X$ while minimizing $\|Z - Θ\|_F^2$ (``A nonlinear matrix decomposition for mining the zeros of sparse data'', SIAM J. Math. Data Sci., 2022). Our first contribution is to show that the two formulations may yield different low-rank solutions $Θ$; in particular, we show that Latent-ReLU-NMD can be ill-posed when ReLU-NMD is not, meaning that there are instances in which the infimum of Latent-ReLU-NMD is not attained while that of ReLU-NMD is. We also consider another alternative model, called 3B-ReLU-NMD, which parameterizes $Θ=WH$, where $W$ has $r$ columns and $H$ has $r$ rows, allowing one to get rid of the rank constraint in Latent-ReLU-NMD. Our second contribution is to prove the convergence of a block coordinate descent (BCD) applied to 3B-ReLU-NMD and referred to as BCD-NMD. Our third contribution is a novel extrapolated variant of BCD-NMD, dubbed eBCD-NMD, which we prove is also convergent under mild assumptions. We illustrate the significant acceleration effect of eBCD-NMD compared to BCD-NMD, and also show that eBCD-NMD performs well against the state of the art on synthetic and real-world data sets.
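The latent-variable model described above has a simple closed-form update for $Z$ when $Θ$ is fixed: entries where $X$ is positive are forced to equal $X$, and entries where $X$ is zero take the closest nonpositive value to $Θ$. The sketch below shows only this one step, under the assumption that matrices are stored as lists of rows; the function names are hypothetical and this is not the BCD-NMD or eBCD-NMD algorithm from the paper.

```python
def relu(M):
    """Element-wise max(0, .) of a matrix stored as a list of rows."""
    return [[max(0.0, v) for v in row] for row in M]


def latent_update(X, Theta):
    """Minimize ||Z - Theta||_F^2 subject to max(0, Z) = X, entry by entry.

    Where X > 0 the constraint fixes Z = X; where X == 0 it only requires
    Z <= 0, so the closest feasible value to Theta is min(0, Theta).
    """
    return [
        [x if x > 0 else min(0.0, t) for x, t in zip(row_x, row_t)]
        for row_x, row_t in zip(X, Theta)
    ]


# Toy sparse nonnegative data and an arbitrary current estimate Theta.
X = [[0.0, 2.0], [3.0, 0.0]]
Theta = [[-1.0, 1.0], [2.0, 0.5]]
Z = latent_update(X, Theta)
```

A full method would alternate this step with an update of the low-rank part, for instance of the factors $W$ and $H$ in the parameterization $Θ=WH$.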

6.4LGMar 31

Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

Giovanni Seraghiti, Kévin Dubrulle, Arnaud Vandaele et al.

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, overly sparse solutions might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our newly proposed model (wL1-NMF) and algorithm (sCD).
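The link between L1 fitting and weighted medians can be seen on the classical scalar problem $\min_h \sum_i |x_i - w_i h|$, whose solution is a weighted median of the ratios $x_i / w_i$ with weights $|w_i|$. The sketch below covers only this textbook case; it ignores the nonnegativity constraint and the penalization parameter of wL1-NMF, it is not the sCD algorithm, and all function names are assumptions.

```python
def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - h| over h."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:  # first point where cumulative weight reaches half
            return v


def l1_error(x, w, h):
    """Component-wise L1 error of approximating x by w * h."""
    return sum(abs(xi - wi * h) for xi, wi in zip(x, w))


def l1_scalar_fit(x, w):
    """Minimize l1_error(x, w, h) over real h; entries with w[i] == 0 are constant."""
    ratios = [xi / wi for xi, wi in zip(x, w) if wi != 0]
    weights = [abs(wi) for wi in w if wi != 0]
    return weighted_median(ratios, weights)


x = [2.0, 4.0, 9.0]
w = [1.0, 2.0, 1.0]
h_best = l1_scalar_fit(x, w)
```

On this toy input the third entry behaves like an outlier and does not pull the fit away from the value supported by the first two entries, which is the robustness that motivates the L1 norm.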

20.3NAMar 13

Computing the Nonnegative Low-Rank Leading Eigenmatrix and its Applications to Markov Grids and Metzler Operators

Nicolas Gillis, Carmela Scalone

We consider in this paper the problem of computing a nonnegative low-rank approximation of the rightmost eigenpair of a linear matrix-valued real operator. We propose an algorithm based on the time integration of a suitable differential system, whose solution is parametrized according to a nonnegative factorization. The conservation of the nonnegativity is theoretically motivated by the Perron-Frobenius theorem, while the computation of the rightmost eigenpair is motivated by two applications: (1) a new class of Markov chains, which we called Markov grids, whose transition matrices can be decomposed as the sum of Kronecker products, and (2) spatially structured systems in growth-diffusion operators arising for example in population and epidemic dynamics. Theoretical analysis and computational experiments show the effectiveness of the algorithm compared to standard approaches.

AISep 23, 2025

From latent factors to language: a user study on LLM-generated explanations for an inherently interpretable matrix-based recommender system

Maxime Manderlier, Fabian Lecron, Olivier Vu Thanh et al.

We investigate whether large language models (LLMs) can generate effective, user-facing explanations from a mathematically interpretable recommendation model. The model is based on constrained matrix factorization, where user types are explicitly represented and predicted item scores share the same scale as observed ratings, making the model's internal representations and predicted scores directly interpretable. This structure is translated into natural language explanations using carefully designed LLM prompts. Many works in explainable AI rely on automatic evaluation metrics, which often fail to capture users' actual needs and perceptions. In contrast, we adopt a user-centered approach: we conduct a study with 326 participants who assessed the quality of the explanations across five key dimensions (transparency, effectiveness, persuasion, trust, and satisfaction), as well as the recommendations themselves. To evaluate how different explanation strategies are perceived, we generate multiple explanation types from the same underlying model, varying the input information provided to the LLM. Our analysis reveals that all explanation types are generally well received, with moderate statistical differences between strategies. User comments further underscore how participants react to each type of explanation, offering complementary insights beyond the quantitative results.

OCMay 17, 2023

Algorithms for Boolean Matrix Factorization using Integer Programming

Christos Kolomvakis, Arnaud Vandaele, Nicolas Gillis

Boolean matrix factorization (BMF) approximates a given binary input matrix as the product of two smaller binary factors. As opposed to binary matrix factorization which uses standard arithmetic, BMF uses the Boolean OR and Boolean AND operations to perform matrix products, which leads to lower reconstruction errors. BMF is an NP-hard problem. In this paper, we first propose an alternating optimization (AO) strategy that solves the subproblem in one factor matrix in BMF using an integer program (IP). We also provide two ways to initialize the factors within AO. Then, we show how several solutions of BMF can be combined optimally using another IP. This allows us to come up with a new algorithm: it generates several solutions using AO and then combines them in an optimal way. Experiments show that our algorithms (available on gitlab) outperform the state of the art on medium-scale problems.
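The difference between the two products can be seen on a tiny example. The sketch below compares the standard arithmetic product with the Boolean (OR of ANDs) product for one pair of binary factors; the factors and function names are assumptions chosen for illustration, and no integer program is involved.

```python
def standard_product(W, H):
    """Usual matrix product: sum of products."""
    return [[sum(w * h for w, h in zip(row, col)) for col in zip(*H)] for row in W]


def boolean_product(W, H):
    """Boolean matrix product: OR over k of (W[i][k] AND H[k][j])."""
    return [
        [int(any(w and h for w, h in zip(row, col))) for col in zip(*H)]
        for row in W
    ]


# Binary factors whose supports overlap in the top-left entry.
W = [[1, 1], [0, 1]]
H = [[1, 0], [1, 1]]

arith = standard_product(W, H)
boole = boolean_product(W, H)
```

With these factors the Boolean product reproduces the all-ones 2-by-2 matrix exactly, while the arithmetic product overshoots where the two rank-one terms overlap.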

LGMay 15, 2023

Accelerated Algorithms for Nonlinear Matrix Decomposition with the ReLU function

Giovanni Seraghiti, Atharva Awari, Arnaud Vandaele et al.

In this paper, we study the following nonlinear matrix decomposition (NMD) problem: given a sparse nonnegative matrix $X$, find a low-rank matrix $Θ$ such that $X \approx f(Θ)$, where $f$ is an element-wise nonlinear function. We focus on the case where $f(\cdot) = \max(0, \cdot)$, the rectified linear unit (ReLU) non-linear activation. We refer to the corresponding problem as ReLU-NMD. We first provide a brief overview of the existing approaches that were developed to tackle ReLU-NMD. Then we introduce two new algorithms: (1) aggressive accelerated NMD (A-NMD) which uses an adaptive Nesterov extrapolation to accelerate an existing algorithm, and (2) three-block NMD (3B-NMD) which parametrizes $Θ= WH$ and leads to a significant reduction in the computational cost. We also propose an effective initialization strategy based on the nuclear norm as a proxy for the rank function. We illustrate the effectiveness of the proposed algorithms (available on gitlab) on synthetic and real-world data sets.

SPOct 11, 2021

Smoothed Separable Nonnegative Matrix Factorization

Nicolas Nadisic, Nicolas Gillis, Christophe Kervazo

Given a set of data points belonging to the convex hull of a set of vertices, a key problem in linear algebra, signal processing, data analysis and machine learning is to estimate these vertices in the presence of noise. Many algorithms have been developed under the assumption that there is at least one nearby data point to each vertex; two of the most widely used ones are vertex component analysis (VCA) and the successive projection algorithm (SPA). This assumption is known as the pure-pixel assumption in blind hyperspectral unmixing, and as the separability assumption in nonnegative matrix factorization. More recently, Bhattacharyya and Kannan (ACM-SIAM Symposium on Discrete Algorithms, 2020) proposed an algorithm for learning a latent simplex (ALLS) that relies on the assumption that there is more than one nearby data point to each vertex. In that scenario, ALLS is probabilistically more robust to noise than algorithms based on the separability assumption. In this paper, inspired by ALLS, we propose smoothed VCA (SVCA) and smoothed SPA (SSPA) that generalize VCA and SPA by assuming the presence of several nearby data points to each vertex. We illustrate the effectiveness of SVCA and SSPA over VCA, SPA and ALLS on synthetic data sets, on the unmixing of hyperspectral images, and on feature extraction on facial image data sets. In addition, our study highlights new theoretical results for VCA.

OCJul 9, 2021

Block Alternating Bregman Majorization Minimization with Extrapolation

Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis et al.

In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.

LGMar 19, 2021

Beyond Linear Subspace Clustering: A Comparative Study of Nonlinear Manifold Clustering Algorithms

Maryam Abdolali, Nicolas Gillis

Subspace clustering is an important unsupervised clustering approach. It is based on the assumption that the high-dimensional data points are approximately distributed around several low-dimensional linear subspaces. The majority of the prominent subspace clustering algorithms rely on the representation of the data points as linear combinations of other data points, which is known as a self-expressive representation. To overcome the restrictive linearity assumption, numerous nonlinear approaches were proposed to extend successful subspace clustering approaches to data on a union of nonlinear manifolds. In this comparative study, we provide a comprehensive overview of nonlinear subspace clustering approaches proposed in the last decade. We introduce a new taxonomy to classify the state-of-the-art approaches into three categories, namely locality preserving, kernel based, and neural network based. The major representative algorithms within each category are extensively compared on carefully designed synthetic and real-world data sets. The detailed analysis of these approaches unfolds potential research directions and unsolved challenges in this field.

OCFeb 10, 2021

A Framework of Inertial Alternating Direction Method of Multipliers for Non-Convex Non-Smooth Optimization

Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis

In this paper, we propose an algorithmic framework, dubbed inertial alternating direction methods of multipliers (iADMM), for solving a class of nonconvex nonsmooth multiblock composite optimization problems with linear constraints. Our framework employs the general minimization-majorization (MM) principle to update each block of variables so as to not only unify the convergence analysis of previous ADMM that use specific surrogate functions in the MM step, but also lead to new efficient ADMM schemes. To the best of our knowledge, in the nonconvex nonsmooth setting, ADMM used in combination with the MM principle to update each block of variables, and ADMM combined with \emph{inertial terms for the primal variables} have not been studied in the literature. Under standard assumptions, we prove the subsequential convergence and global convergence for the generated sequence of iterates. We illustrate the effectiveness of iADMM on a class of nonconvex low-rank representation problems.

SPNov 24, 2020

Provably robust blind source separation of linear-quadratic near-separable mixtures

Christophe Kervazo, Nicolas Gillis, Nicolas Dobigeon

In this work, we consider the problem of blind source separation (BSS) by departing from the usual linear model and focusing on the linear-quadratic (LQ) model. We propose two provably robust and computationally tractable algorithms to tackle this problem under separability assumptions which require the sources to appear as samples in the data set. The first algorithm generalizes the successive nonnegative projection algorithm (SNPA), designed for linear BSS, and is referred to as SNPALQ. By explicitly modeling the product terms inherent to the LQ model along the iterations of the SNPA scheme, the nonlinear contributions of the mixing are mitigated, thus improving the separation quality. SNPALQ is shown to be able to recover the ground truth factors that generated the data, even in the presence of noise. The second algorithm is a brute-force (BF) algorithm, which is used as a post-processing step for SNPALQ. It makes it possible to discard the spurious (mixed) samples extracted by SNPALQ, thus broadening its applicability. The BF is in turn shown to be robust to noise under easier-to-check and milder conditions than SNPALQ. We show that SNPALQ with and without the BF postprocessing is relevant in realistic numerical experiments.

LGNov 22, 2020

Matrix-wise $\ell_0$-constrained Sparse Nonnegative Least Squares

Nicolas Nadisic, Jeremy E Cohen, Arnaud Vandaele et al.

Nonnegative least squares problems with multiple right-hand sides (MNNLS) arise in models that rely on additive linear combinations. In particular, they are at the core of most nonnegative matrix factorization algorithms and have many applications. The nonnegativity constraint is known to naturally favor sparsity, that is, solutions with few non-zero entries. However, it is often useful to further enhance this sparsity, as it improves the interpretability of the results and helps reduce noise, which leads to the sparse MNNLS problem. In this paper, as opposed to most previous works that enforce sparsity column- or row-wise, we first introduce a novel formulation for sparse MNNLS, with a matrix-wise sparsity constraint. Then, we present a two-step algorithm to tackle this problem. The first step divides sparse MNNLS into subproblems, one per column of the original problem. It then uses different algorithms to produce, either exactly or approximately, a Pareto front for each subproblem, that is, to produce a set of solutions representing different tradeoffs between reconstruction error and sparsity. The second step selects solutions among these Pareto fronts in order to build a sparsity-constrained matrix that minimizes the reconstruction error. We perform experiments on facial and hyperspectral images, and we show that our proposed two-step approach provides more accurate results than state-of-the-art sparse coding heuristics applied both column-wise and globally.

LGOct 30, 2020

Multiplicative Updates for NMF with $β$-Divergences under Disjoint Equality Constraints

Valentin Leplat, Nicolas Gillis, Jérôme Idier

Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $β$-divergences ($β$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably with the state of the art: (1) $β$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $β$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $β$-NMF with $\ell_2$-norm constraints on the columns of $W$.

OCOct 23, 2020

An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization

Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis

In this paper, we introduce TITAN, a novel inerTIal block majorizaTion minimizAtioN framework for non-smooth non-convex optimization problems. To the best of our knowledge, TITAN is the first framework of block-coordinate update method that relies on the majorization-minimization framework while embedding inertial force to each step of the block updates. The inertial force is obtained via an extrapolation operator that subsumes heavy-ball and Nesterov-type accelerations for block proximal gradient methods as special cases. By choosing various surrogate functions, such as proximal, Lipschitz gradient, Bregman, quadratic, and composite surrogate functions, and by varying the extrapolation operator, TITAN produces a rich set of inertial block-coordinate update methods. We study sub-sequential convergence as well as global convergence for the generated sequence of TITAN. We illustrate the effectiveness of TITAN on two important machine learning problems, namely sparse non-negative matrix factorization and matrix completion.

OCOct 5, 2020

Algorithms for Nonnegative Matrix Factorization with the Kullback-Leibler Divergence

Le Thi Khanh Hien, Nicolas Gillis

Nonnegative matrix factorization (NMF) is a standard linear dimensionality reduction technique for nonnegative data sets. In order to measure the discrepancy between the input data and the low-rank approximation, the Kullback-Leibler (KL) divergence is one of the most widely used objective functions for NMF. It corresponds to the maximum likelihood estimator when the underlying statistics of the observed data sample follow a Poisson distribution, and KL NMF is particularly meaningful for count data sets, such as documents or images. In this paper, we first collect important properties of the KL objective function that are essential to study the convergence of KL NMF algorithms. Second, together with reviewing existing algorithms for solving KL NMF, we propose three new algorithms that guarantee the non-increasingness of the objective function. We also provide a global convergence guarantee for one of our proposed algorithms. Finally, we conduct extensive numerical experiments to provide a comprehensive picture of the performance of the KL NMF algorithms.
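For context, the best-known existing algorithm for KL NMF is the classical multiplicative update scheme of Lee and Seung, under which the KL objective does not increase. The sketch below implements that classical scheme on a tiny dense matrix with strictly positive entries; it is not one of the three new algorithms proposed in the paper, and the helper names and toy data are assumptions.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kl_div(X, Y):
    """Generalized KL divergence D(X || Y) for strictly positive entries."""
    return sum(
        x * math.log(x / y) - x + y
        for rx, ry in zip(X, Y)
        for x, y in zip(rx, ry)
    )


def ratio(X, Y):
    return [[x / y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mu_step(X, W, H):
    """One multiplicative update of H, then of W, for KL NMF."""
    num = matmul(transpose(W), ratio(X, matmul(W, H)))   # r x n
    col_sums = [sum(col) for col in zip(*W)]              # sum_i W[i][k]
    H = [[h * v / col_sums[k] for h, v in zip(H[k], num[k])] for k in range(len(H))]

    num = matmul(ratio(X, matmul(W, H)), transpose(H))   # m x r
    row_sums = [sum(row) for row in H]                    # sum_j H[k][j]
    W = [
        [w * v / row_sums[k] for k, (w, v) in enumerate(zip(W[i], num[i]))]
        for i in range(len(W))
    ]
    return W, H


X = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0], [3.0, 5.0, 1.0]]
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
H = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

errors = [kl_div(X, matmul(W, H))]
for _ in range(20):
    W, H = mu_step(X, W, H)
    errors.append(kl_div(X, matmul(W, H)))
```

The list `errors` records the objective after each sweep, which gives a quick empirical check of the monotonicity property that the abstract refers to as non-increasingness.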

LGOct 1, 2020

Deep matrix factorizations

Pierre De Handschutter, Nicolas Gillis, Xavier Siebert

Constrained low-rank matrix approximations have been known for decades as powerful linear dimensionality reduction techniques able to extract the information contained in large data sets in a relevant way. However, such low-rank approaches are unable to mine complex, interleaved features that underlie hierarchical semantics. Recently, deep matrix factorization (deep MF) was introduced to deal with the extraction of several layers of features and has been shown to reach outstanding performance on unsupervised tasks. Deep MF was motivated by the success of deep learning, as it is conceptually close to some neural network paradigms. In this paper, we present the main models, algorithms, and applications of deep MF through a comprehensive literature review. We also discuss theoretical questions and perspectives of research.

LGJul 22, 2020

Simplex-Structured Matrix Factorization: Sparsity-based Identifiability and Provably Correct Algorithms

Maryam Abdolali, Nicolas Gillis

In this paper, we provide novel algorithms with identifiability guarantees for simplex-structured matrix factorization (SSMF), a generalization of nonnegative matrix factorization. Current state-of-the-art algorithms that provide identifiability results for SSMF rely on the sufficiently scattered condition (SSC) which requires the data points to be well spread within the convex hull of the basis vectors. The conditions under which our proposed algorithms recover the unique decomposition are in most cases much weaker than the SSC. We only require $d$ points on each facet of the convex hull of the basis vectors whose dimension is $d-1$. The key idea is based on extracting facets containing the largest number of points. We illustrate the effectiveness of our approach on synthetic data sets and hyperspectral images, showing that it outperforms state-of-the-art SSMF algorithms as it is able to handle higher noise levels, rank deficient matrices, outliers, and input data that highly violates the SSC.

SPJul 8, 2020

Multi-Resolution Beta-Divergence NMF for Blind Spectral Unmixing

Valentin Leplat, Nicolas Gillis, Cédric Févotte

Many datasets are obtained as a resolution trade-off between two adversarial dimensions; for example between the frequency and the temporal resolutions for the spectrogram of an audio signal, and between the number of wavelengths and the spatial resolution for a hyper/multi-spectral image. To perform blind source separation using observations with different resolutions, a standard approach is to use coupled nonnegative matrix factorizations (NMF). Most previous works have focused on the least squares error measure, which is the $β$-divergence for $β= 2$. In this paper, we formulate this multi-resolution NMF problem for any $β$-divergence, and propose an algorithm based on multiplicative updates (MU). We show on numerical experiments that the MU are able to obtain high resolutions in both dimensions on two applications: (1) blind unmixing of audio spectrograms: to the best of our knowledge, this is the first time a coupled NMF model is used in this context, and (2) the fusion of hyperspectral and multispectral images: we show that the MU compete favorably with state-of-the-art algorithms, in particular in the presence of non-Gaussian noise.

SPJun 15, 2020

Computing Large-Scale Matrix and Tensor Decomposition with Structured Factors: A Unified Nonconvex Optimization Perspective

Xiao Fu, Nico Vervliet, Lieven De Lathauwer et al.

The proposed article aims at offering a comprehensive tutorial for the computational aspects of structured matrix and tensor factorization. Unlike existing tutorials that mainly focus on {\it algorithmic procedures} for a small set of problems, e.g., nonnegativity or sparsity-constrained factorization, we take a {\it top-down} approach: we start with general optimization theory (e.g., inexact and accelerated block coordinate descent, stochastic optimization, and Gauss-Newton methods) that covers a wide range of factorization problems with diverse constraints and regularization terms of engineering interest. Then, we go `under the hood' to showcase specific algorithm design under these introduced principles. We pay particular attention to recent algorithmic developments in structured tensor and matrix factorization (e.g., random sketching and adaptive step size based stochastic optimization and structure-exploiting second-order algorithms), which are the state of the art---yet much less touched upon in the literature compared to {\it block coordinate descent} (BCD)-based methods. We expect the article to have educational value in the field of structured factorization and hope to stimulate more research in this important and exciting direction.

LGJun 13, 2020

Sparse Separable Nonnegative Matrix Factorization

Nicolas Nadisic, Arnaud Vandaele, Jeremy E. Cohen et al.

We propose a new variant of nonnegative matrix factorization (NMF), combining separability and sparsity assumptions. Separability requires that the columns of the first NMF factor are equal to columns of the input matrix, while sparsity requires that the columns of the second NMF factor are sparse. We call this variant sparse separable NMF (SSNMF), which we prove to be NP-complete, as opposed to separable NMF which can be solved in polynomial time. The main motivation to consider this new model is to handle underdetermined blind source separation problems, such as multispectral image unmixing. We introduce an algorithm to solve SSNMF, based on the successive nonnegative projection algorithm (SNPA, an effective algorithm for separable NMF), and an exact sparse nonnegative least squares solver. We prove that, in noiseless settings and under mild assumptions, our algorithm recovers the true underlying sources. This is illustrated by experiments on synthetic data sets and the unmixing of a multispectral image.