Pavol Harar

SD
7papers
213citations
Novelty38%
AI Score26

7 Papers

LGApr 4, 2023Code
FakET: Simulating Cryo-Electron Tomograms with Neural Style Transfer

Pavol Harar, Lukas Herrmann, Philipp Grohs et al.

In cryo-electron microscopy, accurate particle localization and classification are imperative. Recent deep learning solutions, though successful, require extensive training data sets. The protracted generation time of physics-based models, often employed to produce these data sets, limits their broad applicability. We introduce FakET, a method based on Neural Style Transfer, capable of simulating the forward operator of any cryo transmission electron microscope. It can be used to adapt a synthetic training data set according to reference data producing high-quality simulated micrographs or tilt-series. To assess the quality of our generated data, we used it to train a state-of-the-art localization and classification architecture and compared its performance with a counterpart trained on benchmark data. Remarkably, our technique matches the performance, boosts data generation speed 750 times, uses 33 times less memory, and scales well to typical transmission electron microscope detector sizes. It leverages GPU acceleration and parallel processing. The source code is available at https://github.com/paloha/faket.

CVOct 25, 2022Code
Redistributor: Transforming Empirical Data Distributions

Pavol Harar, Dennis Elbrächter, Monika Dörfler et al.

We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable $S$ and the continuous cumulative distribution function of some desired target $T$, it provably produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distribution. As the distribution of $S$ or $T$ may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This allows for various interesting use cases in image processing, where Redistributor serves as a remarkably simple and easy-to-use tool that is capable of producing visually appealing results. For color correction it outperforms other model-based methods and excels in achieving photorealistic style transfer, surpassing deep learning methods in content preservation. The package is implemented in Python and is optimized to efficiently handle large datasets, making it also suitable as a preprocessing step in machine learning. The source code is available at https://github.com/paloha/redistributor.

SDJul 13, 2019
Towards Robust Voice Pathology Detection

Pavol Harar, Zoltan Galaz, Jesus B. Alonso-Hernandez et al.

Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC) and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize combination of 4 different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/ unrestricted to a subset of vocal pathologies. Furthermore, to our best knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The following best classification performances measured by F1 score on dedicated test set were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of exploratory character, conducted experiments do show promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.

ASJul 12, 2019
Voice Pathology Detection Using Deep Learning: a Preliminary Study

Pavol Harar, Jesus B. Alonso-Hernandez, Jiri Mekyska et al.

This paper describes a preliminary investigation of Voice Pathology Detection using Deep Neural Networks (DNN). We used voice recordings of sustained vowel /a/ produced at normal pitch from German corpus Saarbruecken Voice Database (SVD). This corpus contains voice recordings and electroglottograph signals of more than 2 000 speakers. The idea behind this experiment is the use of convolutional layers in combination with recurrent Long-Short-Term-Memory (LSTM) layers on raw audio signal. Each recording was split into 64 ms Hamming windowed segments with 30 ms overlap. Our trained model achieved 71.36% accuracy with 65.04% sensitivity and 77.67% specificity on 206 validation files and 68.08% accuracy with 66.75% sensitivity and 77.89% specificity on 874 testing files. This is a promising result in favor of this approach because it is comparable to similar previously published experiment that used different methodology. Further investigation is needed to achieve the state-of-the-art results.

SDMar 21, 2019
Improving Machine Hearing on Limited Data Sets

Pavol Harar, Roswitha Bammer, Anna Breger et al.

Convolutional neural network (CNN) architectures have originated and revolutionized machine learning for images. In order to take advantage of CNNs in predictive modeling with audio data, standard FFT-based signal processing methods are often applied to convert the raw audio waveforms into an image-like representations (e.g. spectrograms). Even though conventional images and spectrograms differ in their feature properties, this kind of pre-processing reduces the amount of training data necessary for successful training. In this contribution we investigate how input and target representations interplay with the amount of available training data in a music information retrieval setting. We compare the standard mel-spectrogram inputs with a newly proposed representation, called Mel scattering. Furthermore, we investigate the impact of additional target data representations by using an augmented target loss function which incorporates unused available information. We observe that all proposed methods outperform the standard mel-transform representation when using a limited data set and discuss their strengths and limitations. The source code for reproducibility of our experiments as well as intermediate results and model checkpoints are available in an online repository.

NAJan 22, 2019
On orthogonal projections for dimension reduction and applications in augmented target loss functions for learning problems

Anna Breger, Jose Ignacio Orlando, Pavol Harar et al.

The use of orthogonal projections on high-dimensional input and target data in learning frameworks is studied. First, we investigate the relations between two standard objectives in dimension reduction, preservation of variance and of pairwise relative distances. Investigations of their asymptotic correlation as well as numerical experiments show that a projection does usually not satisfy both objectives at once. In a standard classification problem we determine projections on the input data that balance the objectives and compare subsequent results. Next, we extend our application of orthogonal projections to deep learning tasks and introduce a general framework of augmented target loss functions. These loss functions integrate additional information via transformations and projections of the target data. In two supervised learning problems, clinical image segmentation and music information classification, the application of our proposed augmented target loss functions increase the accuracy.

SDJun 27, 2017
Gabor frames and deep scattering networks in audio processing

Roswitha Bammer, Monika Dörfler, Pavol Harar

This paper introduces Gabor scattering, a feature extractor based on Gabor frames and Mallat's scattering transform. By using a simple signal model for audio signals specific properties of Gabor scattering are studied. It is shown that for each layer, specific invariances to certain signal characteristics occur. Furthermore, deformation stability of the coefficient vector generated by the feature extractor is derived by using a decoupling technique which exploits the contractivity of general scattering networks. Deformations are introduced as changes in spectral shape and frequency modulation. The theoretical results are illustrated by numerical examples and experiments. Numerical evidence is given by evaluation on a synthetic and a "real" data set, that the invariances encoded by the Gabor scattering transform lead to higher performance in comparison with just using Gabor transform, especially when few training samples are available.