Bálint Gyires-Tóth

h-index: 8

12 papers

145 citations

Novelty: 35%

AI Score: 31

Ranked #129,427 of 194,257 authors (top 67%); #42,808 in CV (top 72%)

12 Papers

1.4 | CV | Jun 7, 2022 | Code

Utility of Equivariant Message Passing in Cortical Mesh Segmentation

Dániel Unyi, Ferdinando Insalata, Petar Veličković et al.

The automated segmentation of cortical areas has been a long-standing challenge in medical image analysis. The complex geometry of the cortex is commonly represented as a polygon mesh, whose segmentation can be addressed by graph-based learning methods. When cortical meshes are misaligned across subjects, current methods produce significantly worse segmentation results, limiting their ability to handle multi-domain data. In this paper, we investigate the utility of E(n)-equivariant graph neural networks (EGNNs), comparing their performance against plain graph neural networks (GNNs). Our evaluation shows that GNNs outperform EGNNs on aligned meshes, due to their ability to leverage the presence of a global coordinate system. On misaligned meshes, the performance of plain GNNs drops considerably, while E(n)-equivariant message passing maintains the same segmentation results. The best results can also be obtained by using plain GNNs on realigned data (co-registered meshes in a global coordinate system).
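To illustrate why misalignment affects coordinate-based models but not E(n)-equivariant ones, the sketch below compares raw vertex coordinates with squared edge lengths, a quantity that is unchanged by rotations, reflections, and translations. It is only a toy illustration of the invariance property, not the architecture evaluated in the paper; the function name, the random "mesh", and the edge list are all hypothetical.

```python
import numpy as np

def squared_edge_lengths(coords, edges):
    """Return the squared length of every edge (an E(n)-invariant feature)."""
    diff = coords[edges[:, 0]] - coords[edges[:, 1]]
    return (diff ** 2).sum(axis=1)

rng = np.random.default_rng(0)
coords = rng.normal(size=(6, 3))  # hypothetical mesh vertices
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]])

# Simulate a misaligned subject: random orthogonal transform plus a shift.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = coords @ q.T + np.array([10.0, -3.0, 2.0])

# Raw coordinates change, but the invariant edge features do not.
coords_changed = not np.allclose(coords, moved)
same_lengths = np.allclose(
    squared_edge_lengths(coords, edges),
    squared_edge_lengths(moved, edges),
)
```

A model that consumes raw coordinates sees a different input after the transform, whereas a model built only on such invariant quantities sees the same one.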

15.6 | CV | Sep 1, 2022 | Code

Self-Supervised Pretraining for 2D Medical Image Segmentation

András Kalapos, Bálint Gyires-Tóth

Supervised machine learning provides state-of-the-art solutions to a wide range of computer vision problems. However, the need for copious labelled training data limits the capabilities of these algorithms in scenarios where such input is scarce or expensive. Self-supervised learning offers a way to lower the need for manually annotated data by pretraining models for a specific domain on unlabelled data. In this approach, labelled data are solely required to fine-tune models for downstream tasks. Medical image segmentation is a field where labelling data requires expert knowledge and collecting large labelled datasets is challenging; therefore, self-supervised learning algorithms promise substantial improvements in this field. Despite this, self-supervised learning algorithms are rarely used to pretrain medical image segmentation networks. In this paper, we elaborate on and analyse the effectiveness of supervised and self-supervised pretraining approaches on downstream medical image segmentation, focusing on convergence and data efficiency. We find that self-supervised pretraining on natural images and target-domain-specific images leads to the fastest and most stable downstream convergence. In our experiments on the ACDC cardiac segmentation dataset, this pretraining approach achieves 4-5 times faster fine-tuning convergence compared to an ImageNet pretrained model. We also show that this approach requires fewer than five epochs of pretraining on domain-specific data to achieve such improvement in the downstream convergence time. Finally, we find that, in low-data scenarios, supervised ImageNet pretraining achieves the best accuracy, requiring fewer than 100 annotated samples to realise close to minimal error.

8.7 | CV | Aug 14, 2024 | Code

Whitening Consistently Improves Self-Supervised Learning

András Kalapos, Bálint Gyires-Tóth

Self-supervised learning (SSL) has been shown to be a powerful approach for learning visual representations. In this study, we propose incorporating ZCA whitening as the final layer of the encoder in self-supervised learning to enhance the quality of learned features by normalizing and decorrelating them. Although whitening has been utilized in SSL in previous works, its potential to universally improve any SSL model has not been explored. We demonstrate that adding whitening as the last layer of SSL pretrained encoders is independent of the self-supervised learning method and encoder architecture, thus it improves performance for a wide range of SSL methods across multiple encoder architectures and datasets. Our experiments show that whitening is capable of improving linear and k-NN probing accuracy by 1-5%. Additionally, we propose metrics that allow for a comprehensive analysis of the learned features, provide insights into the quality of the representations and help identify collapse patterns.
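As a rough illustration of what ZCA whitening does to a batch of features, the sketch below centers the features and multiplies them by the inverse square root of their covariance, so the output has approximately identity covariance. It is a standalone NumPy sketch, not the layer used in the paper; the function name, the `eps` stabiliser, and the synthetic data are assumptions.

```python
import numpy as np

def zca_whiten(features, eps=1e-5):
    """Center features of shape (n_samples, n_dims) and ZCA-whiten them.

    eps is a hypothetical stabiliser added to the eigenvalues.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA transform: W = U diag(1 / sqrt(lambda + eps)) U^T
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitening

rng = np.random.default_rng(0)
# Fixed, well-conditioned mixing matrix that correlates the dimensions.
mixing = np.array([
    [2.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.3, 0.0],
    [0.0, 0.0, 1.5, 0.2],
    [0.0, 0.0, 0.0, 1.0],
])
x = rng.normal(size=(1000, 4)) @ mixing
z = zca_whiten(x)
cov_z = np.cov(z, rowvar=False)  # close to the 4x4 identity
```

Unlike PCA whitening, the final rotation back by `eigvecs.T` keeps the whitened features as close as possible to the original axes.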

6.5 | CV | Aug 14, 2024 | Code

CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture

András Kalapos, Bálint Gyires-Tóth

Self-supervised learning (SSL) has become an important approach in pretraining large neural networks, enabling unprecedented scaling of model and dataset sizes. While recent advances like I-JEPA have shown promising results for Vision Transformers, adapting such methods to Convolutional Neural Networks (CNNs) presents unique challenges. In this paper, we introduce CNN-JEPA, a novel SSL method that successfully applies the joint embedding predictive architecture approach to CNNs. Our method incorporates a sparse CNN encoder to handle masked inputs, a fully convolutional predictor using depthwise separable convolutions, and an improved masking strategy. We demonstrate that CNN-JEPA outperforms I-JEPA with ViT architectures on ImageNet-100, achieving a 73.3% linear top-1 accuracy using a standard ResNet-50 encoder. Compared to other CNN-based SSL methods, CNN-JEPA requires 17-35% less training time for the same number of epochs and approaches the linear and k-NN top-1 accuracies of BYOL, SimCLR, and VICReg. Our approach offers a simpler, more efficient alternative to existing SSL methods for CNNs, requiring minimal augmentations and no separate projector network.

1.0 | LG | Sep 26, 2019 | Code

Stochastic Weight Matrix-based Regularization Methods for Deep Neural Networks

Patrik Reizinger, Bálint Gyires-Tóth

The aim of this paper is to introduce two widely applicable regularization methods based on the direct modification of weight matrices. The first method, Weight Reinitialization, utilizes a simplified Bayesian assumption with partially resetting a sparse subset of the parameters. The second one, Weight Shuffling, introduces an entropy- and weight distribution-invariant non-white noise to the parameters. The latter can also be interpreted as an ensemble approach. The proposed methods are evaluated on benchmark datasets, such as MNIST, CIFAR-10 or the JSB Chorales database, and also on time series modeling tasks. We report gains both regarding performance and entropy of the analyzed networks. We also made our code available as a GitHub repository (https://github.com/rpatrik96/lod-wmm-2019).

1.2 | AS | Apr 23, 2022 | Code

Improving Self-Supervised Learning-based MOS Prediction Networks

Bálint Gyires-Tóth, Csaba Zainkó

MOS (Mean Opinion Score) is a subjective method used for the evaluation of a system's quality. Telecommunications (for voice and video) and speech synthesis systems (for generated speech) are a few of the many applications of the method. While MOS tests are widely accepted, they are time-consuming and costly since human input is required. In addition, since the systems and subjects of the tests differ, the results are not really comparable. On the other hand, a large number of previous tests allow us to train machine learning models that are capable of predicting MOS values. By automatically predicting MOS values, both the aforementioned issues can be resolved. The present work introduces data-, training- and post-training-specific improvements to a previous self-supervised learning-based MOS prediction model. We used a wav2vec 2.0 model pre-trained on LibriSpeech, extended with LSTM and non-linear dense layers. We introduced transfer learning, target data preprocessing, a two- and three-phase training method with different batch formulations, dropout accumulation (for larger batch sizes) and quantization of the predictions. The methods are evaluated using the shared synthetic speech dataset of the first Voice MOS challenge.

2.6 | CV | Nov 16, 2022 | Code

Neurodevelopmental Phenotype Prediction: A State-of-the-Art Deep Learning Model

Dániel Unyi, Bálint Gyires-Tóth

A major challenge in medical image analysis is the automated detection of biomarkers from neuroimaging data. Traditional approaches, often based on image registration, are limited in capturing the high variability of cortical organisation across individuals. Deep learning methods have been shown to be successful in overcoming this difficulty, and some of them have even outperformed medical professionals on certain datasets. In this paper, we apply a deep neural network to analyse the cortical surface data of neonates, derived from the publicly available Developing Human Connectome Project (dHCP). Our goal is to identify neurodevelopmental biomarkers and to predict gestational age at birth based on these biomarkers. Using scans of preterm neonates acquired around the term-equivalent age, we were able to investigate the impact of preterm birth on cortical growth and maturation during late gestation. Besides reaching state-of-the-art prediction accuracy, the proposed model has much fewer parameters than the baselines, and its error stays low on both unregistered and registered cortical surfaces.

1.2 | SY | Dec 9, 2023

Position control of an acoustic cavitation bubble by reinforcement learning

Kálmán Klapcsik, Bálint Gyires-Tóth, Juan Manuel Rosselló et al.

A control technique is developed via Reinforcement Learning that allows arbitrary control of the position of an acoustic cavitation bubble in a dual-frequency standing acoustic wave field. The agent must choose the optimal pressure amplitude values to manipulate the bubble position in the range of $x/\lambda_0 \in [0.05, 0.25]$. To train the agent, an actor-critic off-policy algorithm (Deep Deterministic Policy Gradient) was used that supports continuous action space, which allows setting the pressure amplitude values continuously between $0$ and $1\, \mathrm{bar}$. A shaped reward function is formulated that minimizes the distance between the bubble and the target position and implicitly encourages the agent to perform the position control within the shortest amount of time. In some cases, the optimal control can be 7 times faster than the solution expected from the linear theory.

5.5 | LG | Apr 28, 2021 | Code

Reconstructing nodal pressures in water distribution systems with graph neural networks

Gergely Hajgató, Bálint Gyires-Tóth, György Paál

Knowing the pressure at all times in each node of a water distribution system (WDS) facilitates safe and efficient operation. Yet, complete measurement data cannot be collected due to the limited number of instruments in a real-life WDS. The data-driven methodology of reconstructing all the nodal pressures by observing only a limited number of nodes is presented in the paper. The reconstruction method is based on K-localized spectral graph filters, wherewith graph convolution on water networks is possible. The effect of the number of layers, layer depth and the degree of the Chebyshev polynomial applied in the kernel is discussed, taking into account the peculiarities of the application. In addition, a weighting method is shown, wherewith information on friction loss can be embedded into the spectral graph filters through the adjacency matrix. The performance of the proposed model is presented on 3 WDSs at different ratios of observed nodes to the total number of nodes. The weighted connections show no benefit over the binary connections, but the proposed model reconstructs the nodal pressure with at most 5% relative error on average at an observation ratio of at least 5%. The results are achieved with shallow graph neural networks by following the considerations discussed in the paper.
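The K-localized behaviour of Chebyshev spectral graph filters can be sketched in a few lines of NumPy. The example assumes the common formulation y = sum_k theta_k T_k(L~) x with the rescaled normalized Laplacian L~ = 2L/lambda_max - I; the function names, the 5-node path graph, and the coefficients `theta` are illustrative assumptions rather than the paper's model. A weighted adjacency matrix could be passed in the same way as the binary one used here.

```python
import numpy as np

def scaled_laplacian(adjacency):
    """Normalized Laplacian rescaled so its eigenvalues lie in [-1, 1]."""
    n = len(adjacency)
    degree = adjacency.sum(axis=1)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(np.maximum(degree, 1e-12)), 0.0)
    laplacian = np.eye(n) - inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(laplacian).max()
    return 2.0 * laplacian / lam_max - np.eye(n)

def chebyshev_filter(adjacency, signal, theta):
    """Apply sum_k theta[k] * T_k(L~) to a node signal via the recurrence."""
    l_tilde = scaled_laplacian(adjacency)
    t_prev, t_curr = signal, l_tilde @ signal  # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # T_k = 2 * L~ * T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * l_tilde @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out

# Hypothetical 5-node path graph with an impulse at node 0.
adjacency = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
impulse = np.zeros(5)
impulse[0] = 1.0
theta = [0.5, 0.3, 0.2]  # polynomial degree K = 2
response = chebyshev_filter(adjacency, impulse, theta)
```

With degree 2, the response is nonzero only within two hops of node 0, which is what "K-localized" refers to.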

1.2 | LG | Oct 1, 2020

Predicting the flow field in a U-bend with deep neural networks

Gergely Hajgató, Bálint Gyires-Tóth, György Paál

This paper describes a study based on computational fluid dynamics (CFD) and deep neural networks that focuses on predicting the flow field in differently distorted U-shaped pipes. The main motivation of this work was to gain insight into the justification of the deep learning paradigm in hydrodynamic hull optimisation processes that heavily depend on computing turbulent flow fields and that could be accelerated with models like the one presented. The speed-up can be even several orders of magnitude by surrogating the CFD model with a deep convolutional neural network. An automated geometry creation and evaluation process was set up to generate differently shaped two-dimensional U-bends and to carry out CFD simulation on them. This process resulted in a database with different geometries and the corresponding flow fields (2-dimensional velocity distribution), both represented on 128x128 equidistant grids. This database was used to train an encoder-decoder style deep convolutional neural network to predict the velocity distribution from the geometry. The effect of two different representations of the geometry (binary image and signed distance function) on the predictions was examined; both models gave acceptable predictions with a speed-up of two orders of magnitude.

14.9 | AS | Apr 14, 2020

Transformer based Grapheme-to-Phoneme Conversion

Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

The attention mechanism is one of the most successful techniques in deep learning based Natural Language Processing (NLP). The transformer network architecture is completely based on attention mechanisms, and it outperforms sequence-to-sequence models in neural machine translation without recurrent and convolutional layers. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (grapheme sequence) to their pronunciations (phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and the NetTalk dataset. The results show that transformer based G2P outperforms the convolutional-based approach in terms of word error rate, and our results significantly exceed previous recurrent approaches (without attention) regarding word and phoneme error rates on both datasets. Furthermore, the size of the proposed model is much smaller than the size of the previous approaches.
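For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for G2P: phoneme error rate as the total edit distance divided by the number of reference phonemes, and word error rate as the fraction of words whose predicted pronunciation is not an exact match. The exact evaluation protocol of the paper may differ, and the function names and toy pronunciations are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rates(references, hypotheses):
    """Return (phoneme error rate, word error rate) over paired sequences."""
    pairs = list(zip(references, hypotheses))
    phoneme_errors = sum(edit_distance(r, h) for r, h in pairs)
    phoneme_total = sum(len(r) for r in references)
    word_errors = sum(r != h for r, h in pairs)
    return phoneme_errors / phoneme_total, word_errors / len(references)

# Toy data: the first word is correct, the second has one substitution
# and one insertion.
refs = [["K", "AE", "T"], ["D", "AO", "G"]]
hyps = [["K", "AE", "T"], ["D", "AA", "G", "Z"]]
per, wer = error_rates(refs, hyps)  # per = 2/6, wer = 1/2
```

Because a single wrong phoneme makes the whole word count as an error, word error rate is usually much higher than phoneme error rate for the same system.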

1.2 | ML | Sep 16, 2019

Distance Assessment and Hypothesis Testing of High-Dimensional Samples using Variational Autoencoders

Marco Henrique de Almeida Inácio, Rafael Izbicki, Bálint Gyires-Tóth

Given two distinct datasets, an important question is whether they have arisen from the same data generating function or, alternatively, how their data generating functions diverge from one another. In this paper, we introduce an approach for measuring the distance between two datasets with high dimensionality using variational autoencoders. This approach is augmented by a permutation hypothesis test in order to check the hypothesis that the data generating distributions are the same within a significance level. We evaluate both the distance measurement and hypothesis testing approaches on generated and on public datasets. According to the results, the proposed approach can be used for data exploration (e.g. by quantifying the discrepancy/separability between categories of images), which can be particularly useful in the early phases of the pipeline of most machine learning projects.
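The permutation hypothesis test mentioned above can be sketched independently of the distance measure. In the example below, a simple absolute difference of means stands in for the variational-autoencoder-based distance, which is not reproduced here; the function names, sample sizes, and scalar toy data are all assumptions made for illustration.

```python
import random
from statistics import fmean

def mean_gap(a, b):
    """Stand-in distance between two samples (NOT the VAE-based distance)."""
    return abs(fmean(a) - fmean(b))

def permutation_p_value(a, b, distance=mean_gap, n_permutations=2000, seed=0):
    """Estimate how often a random relabelling gives a distance at least
    as large as the observed one."""
    rng = random.Random(seed)
    observed = distance(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if distance(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)

data_rng = random.Random(1)
same_a = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
same_b = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
shifted = [data_rng.gauss(1.5, 1.0) for _ in range(100)]

p_same = permutation_p_value(same_a, same_b)   # same distribution
p_diff = permutation_p_value(same_a, shifted)  # clearly shifted distribution
```

The null hypothesis of identical data generating distributions is rejected when the p-value falls below the chosen significance level; any distance function with the same two-sample signature can be passed as `distance`.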