Mehmet Süzen

LG
5 papers
50 citations
Novelty 46%
AI Score 22

5 Papers

LG Jun 14, 2020
Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles

Mehmet Süzen

A numerical approach is developed for detecting the equivalence of deep learning architectures. The method is based on generating Mixed Matrix Ensembles (MMEs) out of deep neural network weight matrices and a conjugate circular ensemble matching the neural architecture topology. The empirical evidence supports the phenomenon that the difference between the spectral densities of a neural architecture and its corresponding conjugate circular ensemble vanishes, with different decay rates, at the long positive tail of the spectrum, i.e., the cumulative Circular Spectral Difference (CSD). This finding can be used to establish equivalences among different neural architectures via analysis of fluctuations in CSD. We investigated this phenomenon for a wide range of deep learning vision architectures, with circular ensembles originating from statistical quantum mechanics. Practical implications of the proposed method for artificial and natural neural architectures are discussed, such as the possibility of using the approach in Neural Architecture Search (NAS) and in the classification of biological neural networks.

LG Nov 10, 2019
Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search

Mehmet Süzen, J. J. Cerdà, Cornelius Weber

Establishing associations between the structure and the generalisation ability of deep neural networks (DNNs) is a challenging task in modern machine learning. Producing solutions to this challenge will bring progress both in the theoretical understanding of DNNs and in building new architectures efficiently. In this work, we address this challenge by developing a new complexity measure based on the concept of Periodic Spectral Ergodicity (PSE), originating from quantum statistical mechanics. Based on this measure, a technique is devised to quantify the complexity of deep neural networks from the learned weights by traversing the network connectivity sequentially, hence the term cascading PSE (cPSE), as an empirical complexity measure. This measure captures both topological and internal neural processing complexity simultaneously. Because of this cascading approach, i.e., a symmetric divergence of PSE on consecutive layers, it is possible to use this measure for Neural Architecture Search (NAS). We demonstrate the usefulness of this measure in practice on two sets of vision models, ResNet and VGG, and sketch the computation of cPSE for more complex network structures.

ML Oct 21, 2019
Generalised learning of time-series: Ornstein-Uhlenbeck processes

Mehmet Süzen, Alper Yegenoglu

In machine learning, statistics, econometrics and statistical physics, cross-validation (CV) is used as a standard approach in quantifying the generalisation performance of a statistical model. A direct application of CV to time-series raises three issues: the loss of serial correlations, the requirement of preserving any non-stationarity, and the prediction of past data using future data. In this work, we propose a meta-algorithm called reconstructive cross-validation (rCV) that avoids all these issues. First, k folds are formed with non-overlapping, randomly selected subsets of the original time-series. Then, we generate k new partial time-series by removing data points from a given fold: every new partial time-series has points missing at random from a different entire fold. A suitable imputation or smoothing technique is used to reconstruct the k time-series. We call these reconstructions secondary models. Thereafter, we build the k primary time-series models using the new time-series coming from the secondary models. The performance of the primary models is evaluated simultaneously by computing the deviations from the originally removed data points and from out-of-sample (OSS) data. Full cross-validation in time-series models can be practiced with rCV, along with generating learning curves.
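The steps described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: linear interpolation stands in for the "suitable imputation or smoothing technique", a least-squares AR(1) fit stands in for the primary model, only the deviation at the removed points is scored (the out-of-sample part is omitted), and all function names (`make_folds`, `reconstruct`, `fit_ar1`, `rcv`, `simulate_ou`) are hypothetical.

```python
import math
import random


def simulate_ou(n, theta=0.5, sigma=0.3, dt=0.1, seed=1):
    """Zero-mean Ornstein-Uhlenbeck path using the exact one-step discretisation."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    noise_sd = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    x = [0.0]
    for _ in range(n - 1):
        x.append(decay * x[-1] + noise_sd * rng.gauss(0.0, 1.0))
    return x


def make_folds(n, k, rng):
    """Split the interior indices 1..n-2 into k disjoint random folds.

    Assumption: the two endpoints are never removed, so that
    interpolation always has a kept neighbour on each side.
    """
    idx = list(range(1, n - 1))
    rng.shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def reconstruct(series, missing):
    """Secondary model (assumed): fill removed points by linear interpolation."""
    missing = set(missing)
    kept = [i for i in range(len(series)) if i not in missing]
    out = list(series)
    for i in missing:
        left = max(j for j in kept if j < i)
        right = min(j for j in kept if j > i)
        w = (i - left) / (right - left)
        out[i] = (1.0 - w) * series[left] + w * series[right]
    return out


def fit_ar1(series):
    """Primary model (assumed): least-squares AR(1), x[t] ~ phi * x[t-1]."""
    num = sum(a * b for a, b in zip(series[1:], series[:-1]))
    den = sum(a * a for a in series[:-1])
    return num / den


def rcv(series, k, seed=0):
    """Return one RMSE per fold, measured at the originally removed points."""
    rng = random.Random(seed)
    errors = []
    for fold in make_folds(len(series), k, rng):
        recon = reconstruct(series, fold)   # secondary model
        phi = fit_ar1(recon)                # primary model on the reconstruction
        sq = [(series[t] - phi * recon[t - 1]) ** 2 for t in fold]
        errors.append(math.sqrt(sum(sq) / len(sq)))
    return errors


if __name__ == "__main__":
    path = simulate_ou(500)
    print(rcv(path, k=5))
```

Because each fold is removed from the full series rather than cut out as a contiguous block, every primary model is still fitted on a series of the original length and ordering, which is the property the abstract relies on to keep serial correlations intact.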

LG Apr 16, 2019
HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning

Oguzhan Gencoglu, Mark van Gils, Esin Guldogan et al.

Recent advancements in machine learning research, in particular deep learning, introduced methods that outperform conventional algorithms as well as humans in several complex tasks, ranging from detection of objects in images and speech recognition to playing difficult strategic games. However, the current methodology of machine learning research, and consequently the implementations of real-world applications of such algorithms, seem to have a recurring HARKing (Hypothesizing After the Results are Known) issue. In this work, we elaborate on the algorithmic, economic and social reasons and consequences of this phenomenon. We present examples from current common practices of conducting machine learning research (e.g., avoidance of reporting negative results) and of the failure of the proposed algorithms and datasets to generalize in actual real-life usage. Furthermore, a potential future trajectory of machine learning research and development from the perspective of accountable, unbiased, ethical and privacy-aware algorithmic decision making is discussed. We would like to emphasize that with this discussion we neither claim to provide an exhaustive argumentation nor blame any specific institution or individual for the raised issues. This is simply a discussion put forth by us, insiders of the machine learning field, reflecting on ourselves.

ML Apr 25, 2017
Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices

Mehmet Süzen, Cornelius Weber, Joan J. Cerdà

In this work, a novel method to quantify spectral ergodicity for random matrices is presented. The new methodology combines approaches rooted in the metrics of Thirumalai-Mountain (TM) and Kullback-Leibler (KL) divergence. The method is applied to a general study of deep and recurrent neural networks via the analysis of random matrix ensembles mimicking typical weight matrices of those systems. In particular, we examine circular random matrix ensembles: the circular unitary ensemble (CUE), circular orthogonal ensemble (COE), and circular symplectic ensemble (CSE). Eigenvalue spectra and spectral ergodicity are computed for those ensembles as a function of network size. It is observed that as the matrix size increases, the level of spectral ergodicity of the ensemble rises, i.e., the eigenvalue spectrum obtained for a single realisation drawn at random from the ensemble is closer to the spectrum obtained by averaging over the whole ensemble. Based on previous results, we conjecture that the success of deep learning architectures is strongly bound to the concept of spectral ergodicity. The method to compute spectral ergodicity proposed in this work could be used to optimise the size and architecture of deep as well as recurrent neural networks.
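The qualitative observation in this abstract, that a single realisation looks more like the ensemble average as the matrix grows, can be illustrated with a small NumPy sketch. It is an illustration only and does not reproduce the paper's TM/KL-based measure: it samples CUE matrices with the standard QR-with-phase-correction construction, histograms the eigenvalue phases, and reports the mean KL divergence from each single-realisation histogram to the ensemble-averaged one. The function names, bin count, sample count and the smoothing constant `eps` are assumptions chosen for this example.

```python
import numpy as np


def sample_cue(n, rng):
    """Draw an n x n Haar-distributed unitary (CUE) via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Rescale each column of q by the phase of the matching diagonal entry of r,
    # which removes the phase ambiguity of the QR factorisation.
    return q * (d / np.abs(d))


def phase_density(u, bins):
    """Histogram of eigenvalue phases over [-pi, pi], normalised to sum to one."""
    phases = np.angle(np.linalg.eigvals(u))
    counts, _ = np.histogram(phases, bins=bins, range=(-np.pi, np.pi))
    return counts / counts.sum()


def mean_kl_to_ensemble(n, n_samples=50, bins=20, seed=0, eps=1e-9):
    """Mean KL divergence from single-realisation densities to the ensemble average.

    Assumption: this is a simplified stand-in for a spectral-ergodicity
    measure; eps only guards against log(0) in empty bins.
    """
    rng = np.random.default_rng(seed)
    dens = np.array([phase_density(sample_cue(n, rng), bins) for _ in range(n_samples)])
    avg = dens.mean(axis=0)
    kl = np.sum((dens + eps) * np.log((dens + eps) / (avg + eps)), axis=1)
    return float(kl.mean())


if __name__ == "__main__":
    for size in (8, 64):
        print(size, mean_kl_to_ensemble(size))
```

A smaller value for the larger matrix size would be consistent with the trend the abstract describes, though the numbers depend on the binning and sample count chosen here.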