CVApr 12, 2023Code
Scale-Equivariant Deep Learning for 3D DataThomas Wimmer, Vladimir Golkov, Hoai Nam Dang et al. · eth-zurich
The ability of convolutional neural networks (CNNs) to recognize objects regardless of their position in the image is due to the translation-equivariance of the convolutional operation. Group-equivariant CNNs transfer this equivariance to other transformations of the input. Dealing appropriately with objects and object parts of different scale is challenging, and scale can vary for multiple reasons such as the underlying object size or the resolution of the imaging modality. In this paper, we propose a scale-equivariant convolutional network layer for three-dimensional data that guarantees scale-equivariance in 3D CNNs. Scale-equivariance lifts the burden of having to learn each possible scale separately, allowing the neural network to focus on higher-level learning goals, which leads to better results and better data-efficiency. We provide an overview of the theoretical foundations and scientific work on scale-equivariant neural networks in the two-dimensional domain. We then transfer the concepts from 2D to the three-dimensional space and create a scale-equivariant convolutional layer for 3D data. Using the proposed scale-equivariant layer, we create a scale-equivariant U-Net for medical image segmentation and compare it with a non-scale-equivariant baseline method. Our experiments demonstrate the effectiveness of the proposed method in achieving scale-equivariance for 3D medical image analysis. We publish our code at https://github.com/wimmerth/scale-equivariant-3d-convnet for further research and application.
LGJul 30, 2024
How to Choose a Reinforcement-Learning AlgorithmFabian Bongratz, Vladimir Golkov, Lukas Mautner et al.
The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we streamline the process of choosing reinforcement-learning algorithms and action-distribution families. We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods. An interactive version of these guidelines is available online at https://rl-picker.github.io/.
CVFeb 13, 2021Code
Rotation-Equivariant Deep Learning for Diffusion MRIPhilip Müller, Vladimir Golkov, Valentina Tomassini et al.
Convolutional networks are successful, but they have recently been outperformed by new neural networks that are equivariant under rotations and translations. These new networks work better because they do not struggle with learning each possible orientation of each image feature separately. So far, they have been proposed for 2D and 3D data. Here we generalize them to 6D diffusion MRI data, ensuring joint equivariance under 3D roto-translations in image space and the matching 3D rotations in $q$-space, as dictated by the image formation. Such equivariant deep learning is appropriate for diffusion MRI, because microstructural and macrostructural features such as neural fibers can appear at many different orientations, and because even non-rotation-equivariant deep learning has so far been the best method for many diffusion MRI tasks. We validate our equivariant method on multiple-sclerosis lesion segmentation. Our proposed neural networks yield better results and require fewer scans for training compared to non-rotation-equivariant deep learning. They also inherit all the advantages of deep learning over classical diffusion MRI methods. Our implementation is available at https://github.com/philip-mueller/equivariant-deep-dmri and can be used off the shelf without understanding the mathematical background.
LGNov 6, 2024
Interactions Across Blocks in Post-Training Quantization of Large Language ModelsKhasmamad Shabanovi, Lukas Wiest, Vladimir Golkov et al.
Post-training quantization is widely employed to reduce the computational demands of neural networks. Typically, individual substructures, such as layers or blocks of layers, are quantized with the objective of minimizing quantization errors in their pre-activations by fine-tuning the corresponding weights. Deriving this local objective from the global objective of minimizing task loss involves two key simplifications: assuming substructures are mutually independent and ignoring the knowledge of subsequent substructures as well as the task loss. In this work, we assess the effects of these simplifications on weight-only quantization of large language models. We introduce two multi-block fine-tuning strategies and compare them against the baseline of fine-tuning single transformer blocks. The first captures correlations of weights across blocks by jointly optimizing multiple quantized blocks. The second incorporates knowledge of subsequent blocks by minimizing the error in downstream pre-activations rather than focusing solely on the quantized block. Our findings indicate that the effectiveness of these methods depends on the specific network model, with no impact on some models but demonstrating significant benefits for others.
MED-PHMay 12, 2023
Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolutionHoai Nam Dang, Vladimir Golkov, Thomas Wimmer et al.
Current MRI super-resolution (SR) methods only use existing contrasts acquired from typical clinical sequences as input for the neural network (NN). In turbo spin echo sequences (TSE) the sequence parameters can have a strong influence on the actual resolution of the acquired image and have consequently a considera-ble impact on the performance of the NN. We propose a known-operator learning approach to perform an end-to-end optimization of MR sequence and neural net-work parameters for SR-TSE. This MR-physics-informed training procedure jointly optimizes the radiofrequency pulse train of a proton density- (PD-) and T2-weighted TSE and a subsequently applied convolutional neural network to predict the corresponding PDw and T2w super-resolution TSE images. The found radiofrequency pulse train designs generate an optimal signal for the NN to perform the SR task. Our method generalizes from the simulation-based optimi-zation to in vivo measurements and the acquired physics-informed SR images show higher correlation with a time-consuming segmented high-resolution TSE sequence compared to a pure network training approach.
CVSep 23, 2021
Scene Graph Generation for Better Image Captioning?Maximilian Mozes, Martin Schmitt, Vladimir Golkov et al.
We investigate the incorporation of visual relationships into the task of supervised image caption generation by proposing a model that leverages detected objects and auto-generated visual relationships to describe images in natural language. To do so, we first generate a scene graph from raw image pixels by identifying individual objects and visual relationships between them. This scene graph then serves as input to our graph-to-text model, which generates the final caption. In contrast to previous approaches, our model thus explicitly models the detection of objects and visual relationships in the image. For our experiments we construct a new dataset from the intersection of Visual Genome and MS COCO, consisting of images with both a corresponding gold scene graph and human-authored caption. Our results show that our methods outperform existing state-of-the-art end-to-end models that generate image descriptions directly from raw input pixels when compared in terms of the BLEU and METEOR evaluation metrics.
ASOct 28, 2020
Speech Synthesis and Control Using Differentiable DSPGiorgio Fabbro, Vladimir Golkov, Thomas Kemp et al.
Modern text-to-speech systems are able to produce natural and high-quality speech, but speech contains factors of variation (e.g. pitch, rhythm, loudness, timbre)\ that text alone cannot contain. In this work we move towards a speech synthesis system that can produce diverse speech renditions of a text by allowing (but not requiring) explicit control over the various factors of variation. We propose a new neural vocoder that offers control of such factors of variation. This is achieved by employing differentiable digital signal processing (DDSP) (previously used only for music rather than speech), which exposes these factors of variation. The results show that the proposed approach can produce natural speech with realistic timbre, and individual factors of variation can be freely controlled.
BMJun 25, 2020
Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost FunctionsVladimir Golkov, Alexander Becker, Daniel T. Plop et al.
Computer-aided drug discovery is an essential component of modern drug development. Therein, deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets. In this work we argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance, its ability to compromise over different decision thresholds, certain freedom to influence the relative weights in this compromise, fidelity to typical benchmarking measures, and equivalence to positive/unlabeled learning. We also propose new training schemes (coherent mini-batch arrangement, and usage of out-of-batch samples) for cost functions based on the ROC, as well as a cost function based on the logAUC metric that facilitates early enrichment (i.e. improves performance at high decision thresholds, as often desired when synthesizing predicted hit compounds). We demonstrate that these approaches outperform standard deep learning approaches on a series of PubChem high-throughput screening datasets that represent realistic and diverse drug discovery campaigns on major drug target families.
LGOct 31, 2019
Deep Learning for 2D and 3D Rotatable Data: An Overview of MethodsLuca Della Libera, Vladimir Golkov, Yue Zhu et al.
Convolutional networks are successful due to their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/processing of rotations is necessary in cases where rotations are important (e.g. motion estimation). There has been recent progress in methods and theory in all these regards. Here we provide an overview of existing methods, both for 2D and 3D rotations (and translations), and identify commonalities and links between them.
NEMay 8, 2019
Learning to EvolveJan Schuchardt, Vladimir Golkov, Daniel Cremers
Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombine better than at random, improves the result of evolution in terms of fitness increase per generation and even in terms of attainable fitness. We use deep reinforcement learning to learn to dynamically adjust the strategy of evolutionary algorithms to varying circumstances. Our methods outperform classical evolutionary algorithms on combinatorial and continuous optimization problems.
MLJun 8, 2018
q-Space Novelty Detection with Variational AutoencodersAleksei Vasilev, Vladimir Golkov, Marc Meissner et al.
In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we define novelty metrics based on the (partially complementary) assumptions that the VAE is less capable of reconstructing abnormal samples well; that abnormal samples more strongly violate the VAE regularizer; and that abnormal samples differ from normal samples not only in input-feature space, but also in the VAE latent space and VAE output. These approaches, combined with various possibilities of using (e.g. sampling) the probabilistic VAE to obtain scalar novelty scores, yield a large family of methods. We apply these methods to magnetic resonance imaging, namely to the detection of diffusion-space (q-space) abnormalities in diffusion MRI scans of multiple sclerosis patients, i.e. to detect multiple sclerosis lesions without using any lesion labels for training. Many of our methods outperform previously proposed q-space novelty detection methods. We also evaluate the proposed methods on the MNIST handwritten digits dataset and show that many of them are able to outperform the state of the art.
LGJan 23, 2018
Clustering with Deep Learning: Taxonomy and New MethodsElie Aljalbout, Vladimir Golkov, Yawar Siddiqui et al.
Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper, we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxonomy enables researchers and practitioners to systematically create new clustering methods by selectively recombining and replacing distinct aspects of previous methods with the goal of overcoming their individual limitations. The experimental evaluation confirms this and shows that the method created for the case study achieves state-of-the-art clustering quality and surpasses it in some cases.
LGOct 29, 2017
Regularization for Deep Learning: A TaxonomyJan Kukačka, Vladimir Golkov, Daniel Cremers
Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization procedures. We do not provide all details about the listed methods; instead, we present an overview of how the methods can be sorted into meaningful categories and sub-categories. This helps revealing links and fundamental similarities between them. Finally, we include practical recommendations both for users and for developers of new regularization methods.
BMApr 13, 2017
3D Deep Learning for Biological Function Prediction from Physical FieldsVladimir Golkov, Marcin J. Skwark, Atanas Mirchev et al.
Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is by spatial interactions that molecules interact with each other, both in terms of steric complementarity, as well as intermolecular forces. Thus, the electron density field and electrostatic potential field of a molecule contain the "raw fingerprint" of how this molecule can fit to binding partners. In this paper, we show that deep learning can predict biological function of molecules directly from their raw 3D approximated electron density and electrostatic potential fields. Protein function based on EC numbers is predicted from the approximated electron density field. In another experiment, the activity of small molecules is predicted with quality comparable to state-of-the-art descriptor-based methods. We propose several alternative computational models for the GPU with different memory and runtime requirements for different sizes of molecules and of databases. We also propose application-specific multi-channel data representations. With future improvements of training datasets and neural network settings in combination with complementary information sources (sequence, genomic context, expression level), deep learning can be expected to show its generalization power and revolutionize the field of molecular function prediction.
CVApr 26, 2015
FlowNet: Learning Optical Flow with Convolutional NetworksPhilipp Fischer, Alexey Dosovitskiy, Eddy Ilg et al.
Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks where CNNs were successful. In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.