Yannis Pantazis

LG
13 papers
148 citations
Novelty 53%
AI Score 26

13 Papers

ML Oct 10, 2022
Function-space regularized Rényi divergences

Jeremiah Birrell, Yannis Pantazis, Paul Dupuis et al.

We propose a new family of regularized Rényi divergences parametrized not only by the order $α$ but also by a variational function space. These new objects are defined by taking the infimal convolution of the standard Rényi divergence with the integral probability metric (IPM) associated with the chosen function space. We derive a novel dual variational representation that can be used to construct numerically tractable divergence estimators. This representation avoids risk-sensitive terms and therefore exhibits lower variance, making it well-behaved when $α>1$; this addresses a notable weakness of prior approaches. We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs. We also study the $α\to\infty$ limit, which leads to a regularized worst-case-regret and a new variational representation in the classical case. Moreover, we show that the proposed regularized Rényi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous, e.g., empirical measures and distributions with low-dimensional support. We present numerical results on both synthetic and real datasets, showing the utility of these new divergences in both estimation and GAN training applications; in particular, we demonstrate significantly reduced variance and improved training performance.
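
As a point of reference for the classical object being regularized, the following minimal sketch computes the standard Rényi divergence between two discrete distributions. It assumes the common 1/(α-1) normalization, which may differ from the convention used in the paper, and the function name `renyi_divergence` is illustrative. It also shows why absolute continuity matters: the classical divergence is infinite as soon as P puts mass where Q has none.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence R_alpha(P||Q) for discrete distributions.

    Assumes the 1/(alpha - 1) normalization (an assumption, not taken from the paper).
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            if alpha > 1:
                return math.inf  # P puts mass where Q has none
            continue  # for alpha < 1 the factor q_i^(1 - alpha) vanishes
        total += pi ** alpha * qi ** (1.0 - alpha)
    if total == 0.0:
        return math.inf  # disjoint supports
    return math.log(total) / (alpha - 1.0)

# Overlapping supports give a finite value; disjoint supports give infinity.
finite_value = renyi_divergence([0.5, 0.5], [0.25, 0.75], 2.0)
disjoint_value = renyi_divergence([1.0, 0.0], [0.0, 1.0], 2.0)
```

The infinite value in the disjoint case is the behaviour that the IPM-based regularization described above is designed to avoid.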

NA Apr 15, 2013
Measuring the Irreversibility of Numerical Schemes for Reversible Stochastic Differential Equations

Markos Katsoulakis, Yannis Pantazis, Luc Rey-Bellet

For a Markov process, the detailed balance condition is equivalent to the time-reversibility of the process. For stochastic differential equations (SDEs), time-discretization numerical schemes usually destroy the property of time-reversibility. Despite an extensive literature on the numerical analysis of SDEs, covering their stability properties, strong and/or weak error estimates, large deviations and infinite-time estimates, no quantitative results are known on the lack of reversibility of the discrete-time approximation process. In this paper we provide such quantitative estimates by using the concept of entropy production rate, inspired by ideas from non-equilibrium statistical mechanics. The entropy production rate for a stochastic process is defined as the relative entropy (per unit time) of the path measure of the process with respect to the path measure of the time-reversed process. By construction the entropy production rate is nonnegative and it vanishes if and only if the process is reversible. Crucially, from a numerical point of view, the entropy production rate is an a posteriori quantity, hence it can be computed in the course of a simulation as the ergodic average of a certain functional of the process (the so-called Gallavotti-Cohen (GC) action functional). We compute the entropy production for various numerical schemes such as explicit Euler-Maruyama and explicit Milstein for reversible SDEs with additive or multiplicative noise. Additionally, we analyze the entropy production for the BBK integrator of Langevin processes. We show that entropy production is an observable that distinguishes between different numerical schemes in terms of their discretization-induced irreversibility. Furthermore, our results show that the type of the noise critically affects the behavior of the entropy production rate.
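
The a posteriori computation described above can be sketched for the simplest case: a one-dimensional SDE with constant (additive) noise discretized by explicit Euler-Maruyama. Under that assumption each step has a Gaussian transition density, and the GC increment is the log-ratio of the forward and backward transition densities. The function names, the scalar setting and the constant `sigma` are illustrative assumptions, not the paper's notation.

```python
import math

def em_log_transition(x, y, drift, sigma, h):
    """Log density of one Euler-Maruyama step x -> y for dX = drift(X) dt + sigma dW."""
    mean = x + drift(x) * h
    var = sigma * sigma * h
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)

def gc_increment(x, y, drift, sigma, h):
    """Gallavotti-Cohen increment: log p(x, y) - log p(y, x)."""
    return (em_log_transition(x, y, drift, sigma, h)
            - em_log_transition(y, x, drift, sigma, h))

def entropy_production_rate(path, drift, sigma, h):
    """Time-averaged GC functional along a simulated path (per unit time)."""
    total = sum(gc_increment(a, b, drift, sigma, h)
                for a, b in zip(path, path[1:]))
    return total / (h * (len(path) - 1))

# For a linear drift the increments telescope, so a closed path sums to zero.
linear_drift = lambda x: -x
closed_path = [0.3, -0.2, 0.5, 0.3]
rate_on_closed_path = entropy_production_rate(closed_path, linear_drift, 1.0, 0.1)
```

In practice `path` would be a long Euler-Maruyama trajectory, and the returned average estimates the entropy production rate of the scheme.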

ML Oct 31, 2022
Lipschitz-regularized gradient flows and generative particle algorithms for high-dimensional scarce data

Hyemin Gu, Panagiota Birmpa, Yannis Pantazis et al.

We build a new class of generative algorithms capable of efficiently learning an arbitrary target distribution from possibly scarce, high-dimensional data and subsequently generating new samples. These generative algorithms are particle-based and are constructed as gradient flows of Lipschitz-regularized Kullback-Leibler or other $f$-divergences, where data from a source distribution can be stably transported, as particles, towards the vicinity of the target distribution. As a highlighted result in data integration, we demonstrate that the proposed algorithms correctly transport gene expression data points with dimension exceeding 54K, while the sample size is typically only in the hundreds.

PR Feb 18, 2015
Pathwise Sensitivity Analysis in Transient Regimes

Georgios Arampatzis, Markos A. Katsoulakis, Yannis Pantazis

The instantaneous relative entropy (IRE) and the corresponding instantaneous Fisher information matrix (IFIM) for transient stochastic processes are presented in this paper. These novel tools for sensitivity analysis of stochastic models serve as an extension of the well known relative entropy rate (RER) and the corresponding Fisher information matrix (FIM) that apply to stationary processes. Three cases are studied here: discrete-time Markov chains, continuous-time Markov chains and stochastic differential equations. A biological reaction network is presented as a numerical demonstration example.

GT Jun 7, 2021
Forward Looking Best-Response Multiplicative Weights Update Methods for Bilinear Zero-sum Games

Michail Fasoulakis, Evangelos Markakis, Yannis Pantazis et al.

Our work focuses on extra gradient learning algorithms for finding Nash equilibria in bilinear zero-sum games. The proposed method, which can be formally considered as a variant of Optimistic Mirror Descent (Mertikopoulos et al., 2019), uses a large learning rate for the intermediate gradient step, which essentially leads to computing (approximate) best response strategies against the profile of the previous iteration. Although such a large intermediate learning step seems counter-intuitive for an iterative algorithm, we prove that the method guarantees last-iterate convergence to an equilibrium. In particular, we show that the algorithm first reaches an $η^{1/ρ}$-approximate Nash equilibrium, with $ρ> 1$, by decreasing the Kullback-Leibler divergence of each iterate by at least $Ω(η^{1+\frac{1}{ρ}})$, for sufficiently small learning rate $η$, until the method becomes a contracting map and converges to the exact equilibrium. Furthermore, we perform experimental comparisons with the optimistic variant of the multiplicative weights update method of Daskalakis and Panageas (2019) and show that our algorithm has significant practical potential since it offers substantial gains in terms of accelerated convergence.
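
The two-rate structure described above can be illustrated with a small sketch: an intermediate multiplicative-weights step with a large rate `xi` (approximating a best response), followed by the actual update with a small rate `eta` that uses the gradient evaluated at the intermediate point. This is an extra-gradient style illustration under assumed conventions (row player minimizes x^T A y, column player maximizes); the names `mwu_step`, `forward_looking_step`, `eta` and `xi` are hypothetical, and the sketch is not claimed to reproduce the authors' algorithm exactly.

```python
import math

def mwu_step(weights, gradient, rate):
    """One multiplicative-weights step: reweight by exp(rate * gradient), renormalize."""
    shift = max(rate * g for g in gradient)  # subtract the max exponent for stability
    scaled = [w * math.exp(rate * g - shift) for w, g in zip(weights, gradient)]
    total = sum(scaled)
    return [s / total for s in scaled]

def matvec(A, y):
    """Compute A y for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]

def vecmat(x, A):
    """Compute A^T x for a matrix given as a list of rows."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]

def forward_looking_step(x, y, A, eta, xi):
    """One iteration: large-rate intermediate step, then small-rate update."""
    # Intermediate (approximate best-response) strategies against the current profile.
    x_mid = mwu_step(x, [-g for g in matvec(A, y)], xi)
    y_mid = mwu_step(y, vecmat(x, A), xi)
    # Actual update from the current point, using gradients at the intermediate point.
    x_new = mwu_step(x, [-g for g in matvec(A, y_mid)], eta)
    y_new = mwu_step(y, vecmat(x_mid, A), eta)
    return x_new, y_new

# Matching pennies: the uniform profile is the equilibrium and is left unchanged.
A = [[1.0, -1.0], [-1.0, 1.0]]
x_eq, y_eq = forward_looking_step([0.5, 0.5], [0.5, 0.5], A, eta=0.1, xi=10.0)
```

Iterating `forward_looking_step` from an interior starting point keeps both strategies on the probability simplex; the convergence guarantees quoted in the abstract apply to the authors' method, not necessarily to this simplified sketch.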

LG Dec 9, 2020
Inference of Stochastic Dynamical Systems from Cross-Sectional Population Data

Anastasios Tsourtis, Yannis Pantazis, Ioannis Tsamardinos

Inferring the driving equations of a dynamical system from population or time-course data is important in several scientific fields such as biochemistry, epidemiology, financial mathematics and many others. Despite the existence of algorithms that learn the dynamics from trajectorial measurements, there are few attempts to infer the dynamical system directly from population data. In this work, we derive and then computationally estimate the Fokker-Planck equation, which describes the evolution of the population's probability density, based on stochastic differential equations. Then, following the USDL approach, we project the Fokker-Planck equation onto a proper set of test functions, transforming it into a linear system of equations. Finally, we apply sparse inference methods to solve the latter system and thus infer the driving forces of the dynamical system. Our approach is illustrated on both synthetic and real data including non-linear, multimodal stochastic differential equations, biochemical reaction networks as well as mass cytometry biological measurements.
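
The projection step can be made concrete with a short worked derivation. The sketch below assumes a one-dimensional SDE with constant diffusion coefficient $\sigma$, smooth test functions $\varphi_j$ that decay at infinity, and a drift expanded in a hypothetical dictionary $\psi_k$ with unknown coefficients $\theta_k$; these symbols are illustrative and not taken from the paper.

```latex
\begin{aligned}
dX_t &= b(X_t)\,dt + \sigma\,dW_t, \\
\partial_t p &= -\partial_x\big(b\,p\big) + \tfrac{\sigma^2}{2}\,\partial_{xx} p, \\
\frac{d}{dt}\int \varphi_j\,p\,dx
  &= \int \Big(b\,\varphi_j' + \tfrac{\sigma^2}{2}\,\varphi_j''\Big)\,p\,dx
  \quad\text{(integration by parts)}, \\
\frac{d}{dt}\,\mathbb{E}\big[\varphi_j(X_t)\big]
  &= \sum_k \theta_k\,\mathbb{E}\big[\psi_k(X_t)\,\varphi_j'(X_t)\big]
   + \tfrac{\sigma^2}{2}\,\mathbb{E}\big[\varphi_j''(X_t)\big],
  \qquad b = \sum_k \theta_k\,\psi_k .
\end{aligned}
```

Every expectation on the right-hand side can be estimated as a sample average over a population snapshot, so each test function contributes one equation that is linear in the unknowns $\theta_k$ (and in $\sigma^2$), which is the kind of system a sparse solver can then handle.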

ML Nov 11, 2020
$(f,Γ)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics

Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis et al.

We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We establish the assumptions under which these divergences, hereafter referred to as $(f,Γ)$-divergences, provide a notion of 'distance' between probability measures and show that they can be expressed as a two-stage mass-redistribution/mass-transport process. The $(f,Γ)$-divergences inherit features from IPMs, such as the ability to compare distributions which are not absolutely continuous, as well as from $f$-divergences, namely the strict concavity of their variational representations and the ability to control heavy-tailed distributions for particular choices of $f$. When combined, these features establish a divergence with improved properties for estimation, statistical learning, and uncertainty quantification applications. Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions. We also show improved performance and stability over gradient-penalized Wasserstein GAN in image generation.

SD Aug 13, 2020
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

Dipjyoti Paul, Muhammed PV Shifas, Yannis Pantazis et al.

The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesized quality often leads to worse intelligibility. To overcome such limitations, we propose a novel transfer learning approach using Tacotron and WaveRNN based TTS synthesis. The proposed speech system exploits two modification strategies: (a) Lombard speaking style data and (b) Spectral Shaping and Dynamic Range Compression (SSDRC), which has been shown to provide high intelligibility gains by redistributing the signal energy in the time-frequency domain. We refer to this extension as the Lombard-SSDRC TTS system. Intelligibility enhancement, as quantified by the Intelligibility in Bits (SIIB-Gauss) measure, shows that the proposed Lombard-SSDRC TTS system achieves a significant relative improvement of between 110% and 130% in speech-shaped noise (SSN), and between 47% and 140% in competing-speaker noise (CSN), against the state-of-the-art TTS approach. Additional subjective evaluation shows that Lombard-SSDRC TTS successfully increases speech intelligibility, with a relative improvement of 455% for SSN and 104% for CSN in median keyword correction rate compared to the baseline TTS method.

AS Aug 9, 2020
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions

Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

Recent advancements in deep learning have led to human-level performance in single-speaker speech synthesis. However, there are still limitations in terms of speech quality when generalizing those systems to multiple-speaker models, especially for unseen speakers and unseen recording qualities. For instance, conventional neural vocoders are adjusted to the training speaker and have poor generalization capabilities to unseen speakers. In this work, we propose a variant of WaveRNN, referred to as speaker conditional WaveRNN (SC-WaveRNN). We aim to develop an efficient universal vocoder even for unseen speakers and recording conditions. In contrast to standard WaveRNN, SC-WaveRNN exploits additional information given in the form of speaker embeddings. Using publicly-available data for training, SC-WaveRNN achieves significantly better performance than baseline WaveRNN on both subjective and objective metrics. In MOS, SC-WaveRNN achieves an improvement of about 23% for seen speakers and seen recording conditions and up to 95% for unseen speakers and unseen conditions. Finally, we extend our work by implementing a multi-speaker text-to-speech (TTS) synthesis similar to zero-shot speaker adaptation. In terms of performance, our system was preferred over the baseline TTS system by 60% versus 15.5% and by 60.9% versus 32.6%, for seen and unseen speakers, respectively.

LG Jun 15, 2020
Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation

Jeremiah Birrell, Markos A. Katsoulakis, Yannis Pantazis

Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is currently lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils the local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
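
To make the notion of tightness concrete, the sketch below compares two well-known variational objectives for the Kullback-Leibler divergence on discrete distributions: the Donsker-Varadhan objective and the Legendre-transform (f-divergence) objective. Optimizing the latter over constant shifts of the test function recovers the former, which is a simple instance of improving a representation through an auxiliary optimization. The function names are illustrative, and the example is a standard textbook fact rather than the paper's specific construction.

```python
import math

def kl_divergence(p, q):
    """KL(P||Q) for discrete distributions with q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def dv_objective(g, p, q):
    """Donsker-Varadhan objective: E_P[g] - log E_Q[exp(g)]."""
    mean_p = sum(pi * gi for pi, gi in zip(p, g))
    return mean_p - math.log(sum(qi * math.exp(gi) for qi, gi in zip(q, g)))

def legendre_objective(g, p, q):
    """Legendre-transform objective: E_P[g] - E_Q[exp(g - 1)]."""
    mean_p = sum(pi * gi for pi, gi in zip(p, g))
    return mean_p - sum(qi * math.exp(gi - 1.0) for qi, gi in zip(q, g))

p = [0.2, 0.8]
q = [0.5, 0.5]
g = [0.3, -0.1]  # an arbitrary, suboptimal test function

# For any g: legendre_objective <= dv_objective <= KL(P||Q).
gap_legendre = kl_divergence(p, q) - legendre_objective(g, p, q)
gap_dv = kl_divergence(p, q) - dv_objective(g, p, q)

# Both bounds are attained: DV at g* = log(p/q), Legendre at g* + 1.
g_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
```

For the same test function the Donsker-Varadhan objective sits closer to the true divergence, so an optimizer working with it starts from a smaller gap.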

COMP-PH Jun 13, 2020
Predictive modeling approaches in laser-based material processing

Maria Christina Velli, George D. Tsibidis, Alexandros Mimidis et al.

Predictive modelling represents an emerging field that combines existing and novel methodologies aimed at rapidly understanding physical mechanisms and concurrently developing new materials, processes and structures. In the current study, previously unexplored predictive modelling in a key enabling technology, laser-based manufacturing, aims to automate and forecast the effect of laser processing on material structures. The focus is centred on the performance of representative statistical and machine learning algorithms in predicting the outcome of laser processing on a range of materials. Results on experimental data showed that predictive models were able to satisfactorily learn the mapping between the laser input variables and the observed material structure. These results are further integrated with simulation data aiming to elucidate the multiscale physical processes upon laser-material interaction. As a consequence, we augmented the experimental data with the adjusted simulated data and substantially improved the predictive performance, owing to the increased number of sampling points. In parallel, a metric to identify and quantify the regions with high predictive uncertainty is presented, revealing that high uncertainty occurs around the transition boundaries. Our results can set the basis for a systematic methodology towards reducing material design, testing and production cost by replacing expensive trial-and-error based manufacturing procedures with a precise pre-fabrication predictive tool.

LG Jun 11, 2020
Cumulant GAN

Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis et al.

In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions, giving rise to Cumulant GAN. Relying on a recently-derived variational formula, we show that the corresponding optimization problem is equivalent to Rényi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the Rényi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance and $χ^2$-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score and Fréchet inception distance when both weaker and stronger discriminators are considered.
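
A small numerical sketch can show how a loss built from cumulant generating functions contains the Wasserstein-type objective as a limit. The form used below, with hypothetical parameters `beta` and `gamma` and the helper names `scaled_cgf` and `cumulant_loss`, is one plausible way to write such a loss and is an assumption rather than the paper's exact definition. The key fact it relies on is that (1/β) log E[exp(βX)] tends to E[X] as β tends to 0.

```python
import math

def scaled_cgf(values, beta):
    """(1/beta) * log(sample mean of exp(beta * v)); equals the sample mean at beta = 0."""
    if beta == 0.0:
        return sum(values) / len(values)
    shift = max(beta * v for v in values)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(beta * v - shift) for v in values) / len(values)
    return (shift + math.log(mean_exp)) / beta

def cumulant_loss(d_real, d_fake, beta, gamma):
    """Assumed form: -(1/beta) log E_real[exp(-beta D)] - (1/gamma) log E_fake[exp(gamma D)]."""
    real_term = -scaled_cgf([-d for d in d_real], beta)
    fake_term = -scaled_cgf(d_fake, gamma)
    return real_term + fake_term

# Discriminator outputs on a few real and generated samples (made-up numbers).
d_real = [1.0, 2.0, 3.0]
d_fake = [0.0, 1.0]

# At beta = gamma = 0 the loss reduces to E_real[D] - E_fake[D].
wasserstein_like = cumulant_loss(d_real, d_fake, 0.0, 0.0)
nearby = cumulant_loss(d_real, d_fake, 1e-6, 1e-6)
```

With positive `beta` and `gamma`, Jensen's inequality makes this objective no larger than the mean-difference objective, so the two parameters control how strongly extreme discriminator outputs are weighted.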

LG Nov 6, 2018
Training Generative Adversarial Networks with Weights

Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis et al.

The impressive success of Generative Adversarial Networks (GANs) is often overshadowed by the difficulties in their training. Despite continuous efforts and improvements, there are still open issues regarding their convergence properties. In this paper, we propose a simple training variation in which suitable weights are defined to assist the training of the Generator. We provide theoretical arguments for why the proposed algorithm is better than the baseline training, in the sense of speeding up the training process and of creating a stronger Generator. Performance results show that the new algorithm is more accurate on both synthetic and image datasets, resulting in improvements ranging between 5% and 50%.