Panagiotis Patrinos

OC
h-index36
34papers
405citations
Novelty52%
AI Score56

34 Papers

IVAug 22, 2024Code
Quantization-aware Matrix Factorization for Low Bit Rate Image Compression

Pooya Ashtari, Pourya Behmandpoor, Fateme Nateghi Haredasht et al.

Lossy image compression is essential for efficient transmission and storage. Traditional compression methods mainly rely on discrete cosine transform (DCT) or singular value decomposition (SVD), both of which represent image data in continuous domains and, therefore, necessitate carefully designed quantizers. Notably, these methods consider quantization as a separate step, where quantization errors cannot be incorporated into the compression process. The sensitivity of these methods, especially SVD-based ones, to quantization errors significantly degrades reconstruction quality. To address this issue, we introduce a quantization-aware matrix factorization (QMF) to develop a novel lossy image compression method. QMF provides a low-rank representation of the image data as a product of two smaller factor matrices, with elements constrained to bounded integer values, thereby effectively integrating quantization with low-rank approximation. We propose an efficient, provably convergent iterative algorithm for QMF using a block coordinate descent (BCD) scheme, with subproblems having closed-form solutions. Our experiments on the Kodak and CLIC 2024 datasets demonstrate that our QMF compression method consistently outperforms JPEG at low bit rates below 0.25 bits per pixel (bpp) and remains comparable at higher bit rates. We also assessed our method's capability to preserve visual semantics by evaluating an ImageNet pre-trained classifier on compressed images. Remarkably, our method improved top-1 accuracy by over 5 percentage points compared to JPEG at bit rates under 0.25 bpp. The project is available at https://github.com/pashtari/lrf .

LGFeb 22, 2023
Deep Kernel Principal Component Analysis for Multi-level Feature Learning

Francesco Tonin, Qinghua Tao, Panagiotis Patrinos et al.

Principal Component Analysis (PCA) and its nonlinear extension Kernel PCA (KPCA) are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a deep kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data. Our scheme can effectively identify new hierarchical variables, called deep principal components, capturing the main characteristics of high-dimensional data through a simple and interpretable numerical optimization. We couple the principal components of multiple KPCA levels, theoretically showing that DKPCA creates both forward and backward dependency across levels, which has not been explored in kernel methods and yet is crucial to extract more informative features. Various experimental evaluations on multiple data types show that DKPCA finds more efficient and disentangled representations with higher explained variance in fewer principal components, compared to the shallow KPCA. We demonstrate that our method allows for effective hierarchical data exploration, with the ability to separate the key generative factors of the input data both for large datasets and when few training samples are available. Overall, DKPCA can facilitate the extraction of useful patterns from high-dimensional data by learning more informative features organized in different levels, giving diversified aspects to explore the variation factors in the data, while maintaining a simple mathematical formulation.

OCFeb 20, 2023
Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

Thomas Pethick, Puya Latafat, Panagiotis Patrinos et al.

This paper introduces a new extragradient-type algorithm for a class of nonconvex-nonconcave minimax problems. It is well-known that finding a local solution for general minimax problems is computationally intractable. This observation has recently motivated the study of structures sufficient for convergence of first order methods in the more general setting of variational inequalities when the so-called weak Minty variational inequality (MVI) holds. This problem class captures non-trivial structures as we demonstrate with examples, for which a large family of existing algorithms provably converge to limit cycles. Our results require a less restrictive parameter range in the weak MVI compared to what is previously known, thus extending the applicability of our scheme. The proposed algorithm is applicable to constrained and regularized problems, and involves an adaptive stepsize allowing for potentially larger stepsizes. Our scheme also converges globally even in settings where the underlying operator exhibits limit cycles.

SYFeb 27, 2015
A Convex Feasibility Approach to Anytime Model Predictive Control

Alberto Bemporad, Daniele Bernardini, Panagiotis Patrinos

This paper proposes to decouple performance optimization and enforcement of asymptotic convergence in Model Predictive Control (MPC) so that convergence to a given terminal set is achieved independently of how much performance is optimized at each sampling step. By embedding an explicit decreasing condition in the MPC constraints and thanks to a novel and very easy-to-implement convex feasibility solver proposed in the paper, it is possible to run an outer performance optimization algorithm on top of the feasibility solver and optimize for an amount of time that depends on the available CPU resources within the current sampling step (possibly going open-loop at a given sampling step in the extreme case no resources are available) and still guarantee convergence to the terminal set. While the MPC setup and the solver proposed in the paper can deal with quite general classes of functions, we highlight the synthesis method and show numerical results in case of linear MPC and ellipsoidal and polyhedral terminal sets.

LGJul 23, 2022
Tensor-based Multi-view Spectral Clustering via Shared Latent Space

Qinghua Tao, Francesco Tonin, Panagiotis Patrinos et al.

Multi-view Spectral Clustering (MvSC) attracts increasing attention due to diverse data sources. However, most existing works are prohibited in out-of-sample predictions and overlook model interpretability and exploration of clustering results. In this paper, a new method for MvSC is proposed via a shared latent space from the Restricted Kernel Machine framework. Through the lens of conjugate feature duality, we cast the weighted kernel principal component analysis problem for MvSC and develop a modified weighted conjugate feature duality to formulate dual variables. In our method, the dual variables, playing the role of hidden features, are shared by all views to construct a common latent space, coupling the views by learning projections from view-specific spaces. Such single latent space promotes well-separated clusters and provides straightforward data exploration, facilitating visualization and interpretation. Our method requires only a single eigendecomposition, whose dimension is independent of the number of views. To boost higher-order correlations, tensor-based modelling is introduced without increasing computational complexity. Our method can be flexibly applied with out-of-sample extensions, enabling greatly improved efficiency for large-scale data with fixed-size kernel schemes. Numerical experiments verify that our method is effective regarding accuracy, efficiency, and interpretability, showing a sharp eigenvalue decay and distinct latent variable distributions.

OCFeb 17, 2023
Solving stochastic weak Minty variational inequalities without increasing batch size

Thomas Pethick, Olivier Fercoq, Puya Latafat et al.

This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing batch sizes per iteration which can however be prohibitively expensive. In contrast, our proposed methods involves two stepsizes and only requires one additional oracle evaluation per iteration. We show that it is possible to keep one fixed stepsize while it is only the second stepsize that is taken to be diminishing, making it interesting even in the monotone setting. Almost sure convergence is established and we provide a unified analysis for this family of schemes which contains a nonlinear generalization of the celebrated primal dual hybrid gradient algorithm.

LGJun 12, 2023
Nonlinear SVD with Asymmetric Kernels: feature learning and asymmetric Nyström method

Qinghua Tao, Francesco Tonin, Panagiotis Patrinos et al.

Asymmetric data naturally exist in real life, such as directed graphs. Different from the common kernel methods requiring Mercer kernels, this paper tackles the asymmetric kernel-based learning problem. We describe a nonlinear extension of the matrix Singular Value Decomposition through asymmetric kernels, namely KSVD. First, we construct two nonlinear feature mappings w.r.t. rows and columns of the given data matrix. The proposed optimization problem maximizes the variance of each mapping projected onto the subspace spanned by the other, subject to a mutual orthogonality constraint. Through Lagrangian duality, we show that it can be solved by the left and right singular vectors in the feature space induced by the asymmetric kernel. Moreover, we start from the integral equations with a pair of adjoint eigenfunctions corresponding to the singular vectors on an asymmetrical kernel, and extend the Nyström method to asymmetric cases through the finite sample approximation, which can be applied to speedup the training in KSVD. Experiments show that asymmetric KSVD learns features outperforming Mercer-kernel based methods that resort to symmetrization, and also verify the effectiveness of the asymmetric Nyström method.

LGJan 31, 2023
Unsupervised Neighborhood Propagation Kernel Layers for Semi-supervised Node Classification

Sonny Achten, Francesco Tonin, Panagiotis Patrinos et al.

We present a deep Graph Convolutional Kernel Machine (GCKM) for semi-supervised node classification in graphs. The method is built of two main types of blocks: (i) We introduce unsupervised kernel machine layers propagating the node features in a one-hop neighborhood, using implicit node feature mappings. (ii) We specify a semi-supervised classification kernel machine through the lens of the Fenchel-Young inequality. We derive an effective initialization scheme and efficient end-to-end training algorithm in the dual variables for the full architecture. The main idea underlying GCKM is that, because of the unsupervised core, the final model can achieve higher performance in semi-supervised node classification when few labels are available for training. Experimental results demonstrate the effectiveness of the proposed framework.

OCJan 11, 2023
Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient

Puya Latafat, Andreas Themelis, Lorenzo Stella et al.

Backtracking linesearch is the de facto approach for minimizing continuously differentiable functions with locally Lipschitz gradient. In recent years, it has been shown that in the convex setting it is possible to avoid linesearch altogether, and to allow the stepsize to adapt based on a local smoothness estimate without any backtracks or evaluations of the function value. In this work we propose an adaptive proximal gradient method, adaPG, that uses novel estimates of the local smoothness modulus which leads to less conservative stepsize updates and that can additionally cope with nonsmooth terms. This idea is extended to the primal-dual setting where an adaptive three-term primal-dual algorithm, adaPD, is proposed which can be viewed as an extension of the PDHG method. Moreover, in this setting the "essentially" fully adaptive variant adaPD$^+$ is proposed that avoids evaluating the linear operator norm by invoking a backtracking procedure, that, remarkably, does not require extra gradient evaluations. Numerical simulations demonstrate the effectiveness of the proposed algorithms compared to the state of the art.

LGJun 9, 2023
Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms

Francesco Tonin, Alex Lambert, Panagiotis Patrinos et al.

The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions. This allows to naturally extend KPCA to multiple objective functions and leads to efficient gradient-based algorithms avoiding the expensive SVD of the Gram matrix. Particularly, we consider objective functions that can be written as Moreau envelopes, demonstrating how to promote robustness and sparsity within the same framework. The proposed method is evaluated on synthetic and real-world benchmarks, showing significant speedup in KPCA training time as well as highlighting the benefits in terms of robustness and sparsity.

SYOct 30, 2019
Safe Learning-Based Control of Stochastic Jump Linear Systems: a Distributionally Robust Approach

Mathijs Schuurmans, Pantelis Sopasakis, Panagiotis Patrinos

We consider the problem of designing control laws for stochastic jump linear systems where the disturbances are drawn randomly from a finite sample space according to an unknown distribution, which is estimated from a finite sample of i.i.d. observations. We adopt a distributionally robust approach to compute a mean-square stabilizing feedback gain with a given probability. The larger the sample size, the less conservative the controller, yet our methodology gives stability guarantees with high probability, for any number of samples. Using tools from statistical learning theory, we estimate confidence regions for the unknown probability distributions (ambiguity sets) which have the shape of total variation balls centered around the empirical distribution. We use these confidence regions in the design of appropriate distributionally robust controllers and show that the associated stability conditions can be cast as a tractable linear matrix inequality (LMI) by using conjugate duality. The resulting design procedure scales gracefully with the size of the probability space and the system dimensions. Through a numerical example, we illustrate the superior sample complexity of the proposed methodology over the stochastic approach.

OCNov 30, 2023
On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms

Puya Latafat, Andreas Themelis, Panagiotis Patrinos

Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes adaPG$^{q,r}$, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity, but also expands its applicability beyond standard strongly convex settings.

SYOct 29, 2019
Nonlinear Model Predictive Control for Distributed Motion Planning in Road Intersections Using PANOC

Alexander Katriniok, Pantelis Sopasakis, Mathijs Schuurmans et al.

The coordination of highly automated vehicles (or agents) in road intersections is an inherently nonconvex and challenging problem. In this paper, we propose a distributed motion planning scheme under reasonable vehicle-to-vehicle communication requirements. Each agent solves a nonlinear model predictive control problem in real time and transmits its planned trajectory to other agents, which may have conflicting objectives. The problem formulation is augmented with conditional constraints that enable the agents to decide whether to wait at a stopping line, if safe crossing is not possible. The involved nonconvex problems are solved very efficiently using the proximal averaged Newton method for optimal control (PANOC). We demonstrate the efficiency of the proposed approach in a realistic intersection crossing scenario.

LGJun 12, 2023
Combining Primal and Dual Representations in Deep Restricted Kernel Machines Classifiers

Francesco Tonin, Panagiotis Patrinos, Johan A. K. Suykens

In the context of deep learning with kernel machines, the deep Restricted Kernel Machine (DRKM) framework allows multiple levels of kernel PCA (KPCA) and Least-Squares Support Vector Machines (LSSVM) to be combined into a deep architecture using visible and hidden units. We propose a new method for DRKM classification coupling the objectives of KPCA and classification levels, with the hidden feature matrix lying on the Stiefel manifold. The classification level can be formulated as an LSSVM or as an MLP feature map, combining depth in terms of levels and layers. The classification level is expressed in its primal formulation, as the deep KPCA levels, in their dual formulation, can embed the most informative components of the data in a much lower dimensional space. The dual setting is independent of the dimension of the inputs and the primal setting is parametric, which makes the proposed method computationally efficient for both high-dimensional inputs and large datasets. In the experiments, we show that our developed algorithm can effectively learn from small datasets, while using less memory than the convolutional neural network (CNN) with high-dimensional data. and that models with multiple KPCA levels can outperform models with a single level. On the tested larger-scale datasets, DRKM is more energy efficient than CNN while maintaining comparable performance.

74.3OCApr 7
Parametric Nonconvex Optimization via Convex Surrogates

Renzi Wang, Panagiotis Patrinos, Alberto Bemporad

This paper presents a novel learning-based approach to construct a surrogate problem that approximates a given parametric nonconvex optimization problem. The surrogate function is designed to be the minimum of a finite set of functions, given by the composition of convex and monotonic terms, so that the surrogate problem can be solved directly through parallel convex optimization. As a proof of concept, numerical experiments on a nonconvex path tracking problem confirm the approximation quality of the proposed method.

OCAug 28, 2024
Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems

Ibrahim K. Ozaslan, Panagiotis Patrinos, Mihailo R. Jovanović

We examine stability properties of primal-dual gradient flow dynamics for composite convex optimization problems with multiple, possibly nonsmooth, terms in the objective function under the generalized consensus constraint. The proposed dynamics are based on the proximal augmented Lagrangian and they provide a viable alternative to ADMM which faces significant challenges from both analysis and implementation viewpoints in large-scale multi-block scenarios. In contrast to customized algorithms with individualized convergence guarantees, we develop a systematic approach for solving a broad class of challenging composite optimization problems. We leverage various structural properties to establish global (exponential) convergence guarantees for the proposed dynamics. Our assumptions are much weaker than those required to prove (exponential) stability of primal-dual dynamics as well as (linear) convergence of discrete-time methods such as standard two-block and multi-block ADMM and EXTRA algorithms. Finally, we show necessity of some of our structural assumptions for exponential stability and provide computational experiments to demonstrate the convenience of the proposed approach for parallel and distributed computing applications.

OCJul 17, 2022
SPIRAL: A superlinearly convergent incremental proximal algorithm for nonconvex finite sum minimization

Pourya Behmandpoor, Puya Latafat, Andreas Themelis et al.

We introduce SPIRAL, a SuPerlinearly convergent Incremental pRoximal ALgorithm, for solving nonconvex regularized finite sum problems under a relative smoothness assumption. Each iteration of SPIRAL consists of an inner and an outer loop. It combines incremental gradient updates with a linesearch that has the remarkable property of never being triggered asymptotically, leading to superlinear convergence under mild assumptions at the limit point. Simulation results with L-BFGS directions on different convex, nonconvex, and non-Lipschitz differentiable problems show that our algorithm, as well as its adaptive variant, are competitive to the state of the art.

70.3OCMar 12
Exploiting Parallelism in a QPALM-based Solver for Optimal Control

Pieter Pas, Kristoffer Fink Løwenstein, Daniele Bernardini et al.

We discuss the opportunities for parallelization in the recently proposed QPALM-OCP algorithm, a solver tailored to quadratic programs arising in optimal control. A significant part of the computational work can be carried out independently for the different stages in the optimal control problem. We exploit this specific structure to apply parallelization and vectorization techniques in an optimized C++ implementation of the method. Results for optimal control benchmark problems and comparisons to the original QPALM method are provided.

89.6OCMar 13Code
Cyqlone: A Parallel, High-Performance Linear Solver for Optimal Control

Pieter Pas, Panagiotis Patrinos

We present Cyqlone, a solver for linear systems with a stage-wise optimal control structure that fully exploits the various levels of parallelism available in modern hardware. Cyqlone unifies algorithms based on the sequential Riccati recursion, parallel Schur complement methods, and cyclic reduction methods, thereby minimizing the required number of floating-point operations, while allowing parallelization across a configurable number of processors. Given sufficient parallelism, the solver run time scales with the logarithm of the horizon length (in contrast to the linear scaling of sequential Riccati-based methods), enabling real-time solution of long-horizon problems. Beyond multithreading on multi-core processors, implementations of Cyqlone can also leverage vectorization using batched linear algebra routines. Such batched routines exploit data parallelism using single instruction, multiple data (SIMD) operations, and expose a higher degree of instruction-level parallelism than their non-batched counterparts. This enables them to significantly outperform BLAS and BLASFEO for the small matrices that arise in optimal control. Building on this high-performance linear solver, we develop CyQPALM, a parallel and optimal-control-specific variant of the QPALM quadratic programming solver. It combines the parallel and vectorized linear algebra operations from Cyqlone with a parallel line search and parallel factorization updates, resulting in order-of-magnitude speedups over the state-of-the-art HPIPM solver. Open-source C++ implementations of Cyqlone and CyQPALM are available at https://github.com/kul-optec/cyqlone

SPNov 8, 2023
Asynchronous Message-Passing and Zeroth-Order Optimization Based Distributed Learning with a Use-Case in Resource Allocation in Communication Networks

Pourya Behmandpoor, Marc Moonen, Panagiotis Patrinos

Distributed learning and adaptation have received significant interest and found wide-ranging applications in machine learning and signal processing. While various approaches, such as shared-memory optimization, multi-task learning, and consensus-based learning (e.g., federated learning and learning over graphs), focus on optimizing either local costs or a global cost, there remains a need for further exploration of their interconnections. This paper specifically focuses on a scenario where agents collaborate towards a common task (i.e., optimizing a global cost equal to aggregated local costs) while effectively having distinct individual tasks (i.e., optimizing individual local parameters in a local cost). Each agent's actions can potentially impact other agents' performance through interactions. Notably, each agent has access to only its local zeroth-order oracle (i.e., cost function value) and shares scalar values, rather than gradient vectors, with other agents, leading to communication bandwidth efficiency and agent privacy. Agents employ zeroth-order optimization to update their parameters, and the asynchronous message-passing between them is subject to bounded but possibly random communication delays. This paper presents theoretical convergence analyses and establishes a convergence rate for nonconvex problems. Furthermore, it addresses the relevant use-case of deep learning-based resource allocation in communication networks and conducts numerical experiments in which agents, acting as transmitters, collaboratively train their individual policies to maximize a global reward, e.g., a sum of data rates.

94.8OCMay 12
Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives

Konstantinos Oikonomidis, Jan Quan, Kimon Antonakopoulos et al.

In this work, we develop proximal preconditioned gradient methods with a focus on spectral gradient methods providing a proximal extension to the Muon and Scion optimizers. We introduce a family of stochastic algorithms that can handle a wide variety of convex and nonconvex constraints and study its convergence under heavy-tailed noise, through a novel analysis tailored to the geometry of the proposed methods. We further propose a variance-reduced version, which achieves faster convergence under standard noise assumptions. Finally, we show that the polynomial iterations used in Muon are more accurately captured by a nonlinear preconditioner than by the ideal matrix sign, leading to a convergence analysis that more faithfully reflects practical implementations.

SPNov 8, 2023
A Deep Learning Based Resource Allocator for Communication Networks with Dynamic User Utility Demands

Pourya Behmandpoor, Mark Eisen, Panagiotis Patrinos et al.

Deep learning (DL) based resource allocation (RA) has recently gained significant attention due to its performance efficiency. However, most related studies assume an ideal case where the number of users and their utility demands, e.g., data rate constraints, are fixed, and the designed DL-based RA scheme exploits a policy trained only for these fixed parameters. Consequently, computationally complex policy retraining is required whenever these parameters change. In this paper, we introduce a DL-based resource allocator (ALCOR) that allows users to adjust their utility demands freely, such as based on their application layer requirements. ALCOR employs deep neural networks (DNNs) as the policy in a time-sharing problem. The underlying optimization algorithm iteratively optimizes the on-off status of users to satisfy their utility demands in expectation. The policy performs unconstrained RA (URA) -- RA without considering user utility demands -- among active users to maximize the sum utility (SU) at each time instant. Depending on the chosen URA scheme, ALCOR can perform RA in either a centralized or distributed scenario. The derived convergence analyses provide theoretical guarantees for ALCOR's convergence, and numerical experiments corroborate its effectiveness compared to meta-learning and reinforcement learning approaches.

OCFeb 9, 2024
Adaptive proximal gradient methods are universal without approximation

Konstantinos A. Oikonomidis, Emanuel Laude, Puya Latafat et al.

We show that adaptive proximal gradient methods for convex problems are not restricted to traditional Lipschitzian assumptions. Our analysis reveals that a class of linesearch-free methods is still convergent under mere local Hölder gradient continuity, covering in particular continuously differentiable semi-algebraic functions. To mitigate the lack of local Lipschitz continuity, popular approaches revolve around $\varepsilon$-oracles and/or linesearch procedures. In contrast, we exploit plain Hölder inequalities not entailing any approximation, all while retaining the linesearch-free nature of adaptive schemes. Furthermore, we prove full sequence convergence without prior knowledge of local Hölder constants nor of the order of Hölder continuity. Numerical experiments make comparisons with baseline methods on diverse tasks from machine learning covering both the locally and the globally Hölder setting.

OCDec 11, 2023
Convergence of the Chambolle-Pock Algorithm in the Absence of Monotonicity

Brecht Evens, Puya Latafat, Panagiotis Patrinos

The Chambolle-Pock algorithm (CPA), also known as the primal-dual hybrid gradient method, has gained popularity over the last decade due to its success in solving large-scale convex structured problems. This work extends its convergence analysis for problems with varying degrees of (non)monotonicity, quantified through a so-called oblique weak Minty condition on the associated primal-dual operator. Our results reveal novel stepsize and relaxation parameter ranges which do not only depend on the norm of the linear mapping, but also on its other singular values. In particular, in nonmonotone settings, in addition to the classical stepsize conditions, extra bounds on the stepsizes and relaxation parameters are required. On the other hand, in the strongly monotone setting, the relaxation parameter is allowed to exceed the classical upper bound of two. Moreover, we build upon the recently introduced class of semimonotone operators, providing sufficient convergence conditions for CPA when the individual operators are semimonotone. Since this class of operators encompasses traditional operator classes including (hypo)- and co(hypo)-monotone operators, this analysis recovers and extends existing results for CPA. Tightness of the proposed stepsize ranges is demonstrated through several examples.

OCOct 26, 2024
The inexact power augmented Lagrangian method for constrained nonconvex optimization

Alexander Bodard, Konstantinos Oikonomidis, Emanuel Laude et al.

This work introduces an unconventional inexact augmented Lagrangian method, where the augmenting term is a Euclidean norm raised to a power between one and two. The proposed algorithm is applicable to a broad class of constrained nonconvex minimization problems, that involve nonlinear equality constraints over a convex set under a mild regularity condition. First, we conduct a full complexity analysis of the method, leveraging an accelerated first-order algorithm for solving the Hölder-smooth subproblems. Next, we present an inexact proximal point method to tackle these subproblems, demonstrating that it achieves an improved convergence rate. Notably, this rate reduces to the best-known convergence rate for first-order methods when the augmenting term is a squared Euclidean norm. Our worst-case complexity results further show that using lower powers for the augmenting term leads to faster constraint satisfaction, albeit with a slower decrease in the dual residual. Numerical experiments support our theoretical findings, illustrating that this trade-off between constraint satisfaction and cost minimization is advantageous for certain practical problems.

LGOct 20, 2025
Rethinking PCA Through Duality

Jan Quan, Johan Suykens, Panagiotis Patrinos

Motivated by the recently shown connection between self-attention and (kernel) principal component analysis (PCA), we revisit the fundamentals of PCA. Using the difference-of-convex (DC) framework, we present several novel formulations and provide new theoretical insights. In particular, we show the kernelizability and out-of-sample applicability for a PCA-like family of problems. Moreover, we uncover that simultaneous iteration, which is connected to the classical QR algorithm, is an instance of the difference-of-convex algorithm (DCA), offering an optimization perspective on this longstanding method. Further, we describe new algorithms for PCA and empirically compare them with state-of-the-art methods. Lastly, we introduce a kernelizable dual formulation for a robust variant of PCA that minimizes the $l_1$ deviation of the reconstruction errors.

OCApr 1, 2025
Spingarn's Method and Progressive Decoupling Beyond Elicitable Monotonicity

Brecht Evens, Puya Latafat, Panagiotis Patrinos

Spingarn's method of partial inverses and the progressive decoupling algorithm address inclusion problems involving the sum of an operator and the normal cone of a linear subspace, known as linkage problems. Despite their success, existing convergence results are limited to the so-called elicitable monotone setting, where nonmonotonicity is allowed only on the orthogonal complement of the linkage subspace. In this paper, we introduce progressive decoupling+, a generalized version of standard progressive decoupling that incorporates separate relaxation parameters for the linkage subspace and its orthogonal complement. We prove convergence under conditions that link the relaxation parameters to the nonmonotonicity of their respective subspaces and show that the special cases of Spingarn's method and standard progressive decoupling also extend beyond the elicitable monotone setting. Our analysis hinges upon an equivalence between progressive decoupling+ and the preconditioned proximal point algorithm, for which we develop a general local convergence analysis in a certain nonmonotone setting.

LGNov 12, 2024
Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach

Renzi Wang, Flavia Sofia Acerbo, Tong Duy Son et al.

This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy. The parameters of the model are learned via a two-stage framework. By leveraging the existing dynamics knowledge, the first stage of the framework estimates the control input sequences and hence reduces the problem complexity. At the second stage, the policy is learned by solving a regularized maximum-likelihood estimation problem using the estimated control input sequences. We further extend the learning procedure by incorporating a Lyapunov stability constraint to ensure asymptotic stability of the identified model, for accurate multi-step predictions. The effectiveness of the proposed framework is validated using two autonomous driving datasets collected from human demonstrations, demonstrating its practical applicability in modelling complex nonlinear dynamics.

LGJun 13, 2024
Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method

Qinghua Tao, Francesco Tonin, Alex Lambert et al.

In contrast with Mercer kernel-based approaches as used e.g., in Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation to KSVD cannot work with infinite-dimensional feature mappings, the variational objective can be unbounded, and needs further numerical evaluation and exploration towards machine learning. In this work, i) we introduce a new asymmetric learning paradigm based on coupled covariance eigenproblem (CCE) through covariance operators, allowing infinite-dimensional feature maps. The solution to CCE is ultimately obtained from the SVD of the induced asymmetric kernel matrix, providing links to KSVD. ii) Starting from the integral equations corresponding to a pair of coupled adjoint eigenfunctions, we formalize the asymmetric Nyström method through a finite sample approximation to speed up training. iii) We provide the first empirical evaluations verifying the practical utility and benefits of KSVD and compare with methods resorting to symmetrization or linear SVD across multiple tasks.

OCJul 9, 2021
Block Alternating Bregman Majorization Minimization with Extrapolation

Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis et al.

In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions have been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.

OCMar 15, 2021
Lasry-Lions Envelopes and Nonconvex Optimization: A Homotopy Approach

Miguel Simões, Andreas Themelis, Panagiotis Patrinos

In large-scale optimization, the presence of nonsmooth and nonconvex terms in a given problem typically makes it hard to solve. A popular approach to address nonsmooth terms in convex optimization is to approximate them with their respective Moreau envelopes. In this work, we study the use of Lasry-Lions double envelopes to approximate nonsmooth terms that are also not convex. These envelopes are an extension of the Moreau ones but exhibit an additional smoothness property that makes them amenable to fast optimization algorithms. Lasry-Lions envelopes can also be seen as an "intermediate" between a given function and its convex envelope, and we make use of this property to develop a method that builds a sequence of approximate subproblems that are easier to solve than the original problem. We discuss convergence properties of this method when used to address composite minimization problems; additionally, based on a number of experiments, we discuss settings where it may be more useful than classical alternatives in two domains: signal decoding and spectral unmixing.

LGFeb 16, 2021
Unsupervised Energy-based Out-of-distribution Detection using Stiefel-Restricted Kernel Machine

Francesco Tonin, Arun Pandey, Panagiotis Patrinos et al.

Detecting out-of-distribution (OOD) samples is an essential requirement for the deployment of machine learning systems in the real world. Until now, research on energy-based OOD detectors has focused on the softmax confidence score from a pre-trained neural network classifier with access to class labels. In contrast, we propose an unsupervised energy-based OOD detector leveraging the Stiefel-Restricted Kernel Machine (St-RKM). Training requires minimizing an objective function with an autoencoder loss term and the RKM energy where the interconnection matrix lies on the Stiefel manifold. Further, we outline multiple energy function definitions based on the RKM framework and discuss their utility. In the experiments on standard datasets, the proposed method improves over the existing energy-based OOD detectors and deep generative models. Through several ablation studies, we further illustrate the merit of each proposed energy function on the OOD detection performance.

LGNov 25, 2020
Unsupervised learning of disentangled representations in deep restricted kernel machines with orthogonality constraints

Francesco Tonin, Panagiotis Patrinos, Johan A. K. Suykens

We introduce Constr-DRKM, a deep kernel method for the unsupervised learning of disentangled data representations. We propose augmenting the original deep restricted kernel machine formulation for kernel PCA by orthogonality constraints on the latent variables to promote disentanglement and to make it possible to carry out optimization without first defining a stabilized objective. After illustrating an end-to-end training procedure based on a quadratic penalty optimization algorithm with warm start, we quantitatively evaluate the proposed method's effectiveness in disentangled feature learning. We demonstrate on four benchmark datasets that this approach performs similarly overall to $β$-VAE on a number of disentanglement metrics when few training points are available, while being less sensitive to randomness and hyperparameter selection than $β$-VAE. We also present a deterministic initialization of Constr-DRKM's training algorithm that significantly improves the reproducibility of the results. Finally, we empirically evaluate and discuss the role of the number of layers in the proposed methodology, examining the influence of each principal component in every layer and showing that components in lower layers act as local feature detectors capturing the broad trends of the data distribution, while components in deeper layers use the representation learned by previous layers and more accurately reproduce higher-level features.

OCMar 5, 2019
Inertial Block Proximal Methods for Non-Convex Non-Smooth Optimization

Le Thi Khanh Hien, Nicolas Gillis, Panagiotis Patrinos

We propose inertial versions of block coordinate descent methods for solving non-convex non-smooth composite optimization problems. Our methods possess three main advantages compared to current state-of-the-art accelerated first-order methods: (1) they allow using two different extrapolation points to evaluate the gradients and to add the inertial force (we will empirically show that it is more efficient than using a single extrapolation point), (2) they allow to randomly picking the block of variables to update, and (3) they do not require a restarting step. We prove the subsequential convergence of the generated sequence under mild assumptions, prove the global convergence under some additional assumptions, and provide convergence rates. We deploy the proposed methods to solve non-negative matrix factorization (NMF) and show that they compete favorably with the state-of-the-art NMF algorithms. Additional experiments on non-negative approximate canonical polyadic decomposition, also known as non-negative tensor factorization, are also provided.