Kundan Kumar

24 papers · 4,033 citations

SD · Jun 11, 2023 · Code
High-Fidelity Audio Compression with Improved RVQGAN

Rithesh Kumar, Prem Seetharaman, Alejandro Luebs et al.

Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high-quality neural compression model that can compress high-dimensional natural signals into lower-dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 kHz audio into tokens at a bandwidth of just 8 kbps. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms and find that our method significantly outperforms them. We provide thorough ablations for every design choice, as well as open-source code and trained model weights. We hope our work can lay the foundation for the next generation of high-fidelity audio modeling.
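Two ideas in the abstract are easy to make concrete: residual vector quantization (the "RVQ" in the title), where each quantizer stage encodes the error left by the previous stages, and the "~90x at 8 kbps" figure. The sketch below is only an illustration: it uses tiny hand-made codebooks (hypothetical; the actual model learns its codebooks on encoder latents) and assumes 16-bit mono PCM as the uncompressed reference, which the abstract does not state.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))


def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes the residual left by the previous stages."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """The reconstruction is the sum of the chosen codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Hypothetical 2-D codebooks: a coarse first stage and a finer second stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]
x = [1.2, 0.9]
codes = rvq_encode(x, CODEBOOKS)      # one small integer (token) per stage
x_hat = rvq_decode(codes, CODEBOOKS)

# Bitrate arithmetic, assuming 16-bit mono PCM as the uncompressed reference.
pcm_bps = 44_100 * 16                 # 705,600 bit/s
ratio = pcm_bps / 8_000               # 88.2, consistent with "~90x"
```

Here the second stage shrinks the squared error from 0.05 to 0.0125, and 705.6 kbit/s divided by 8 kbit/s gives 88.2, which matches the quoted ~90x.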

NA · Dec 10, 2018
On the optimization of the fixed-stress splitting for Biot's equations

Erlend Storvik, Jakub Wiktor Both, Kundan Kumar et al.

In this work we are interested in effectively solving the quasi-static, linear Biot model for poromechanics. We consider the fixed-stress splitting scheme, which is a popular method for iteratively solving Biot's equations. It is well known that the convergence of the method strongly depends on the applied stabilization/tuning parameter. In this work, we propose a new approach to optimize this parameter. We show theoretically that the optimal parameter depends not only on the mechanical properties and the coupling coefficient, but also on the fluid flow properties. The analysis presented in this paper is not restricted to a particular spatial discretization; we only require it to be inf-sup stable. The convergence proof also applies to low-compressible or incompressible fluids and low-permeable porous media. Illustrative numerical examples are presented, including random initial data, random boundary conditions or random source terms, and a well-known benchmark problem, namely Mandel's problem. The results are in good agreement with the theoretical findings. Furthermore, we show numerically that there is a connection between the inf-sup stability of discretizations and the performance of the fixed-stress splitting scheme.
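To see why the tuning parameter matters, here is a deliberately tiny 0-D stand-in for Biot's equations: one displacement unknown u, one pressure unknown p, no permeability term, and hypothetical coefficients. Flow is solved first with the stabilization L*(p_new - p), then mechanics. Subtracting the exact solution gives the error recursion e_p <- (L - alpha^2/K) / (inv_M + L) * e_p, so in this toy the classical choice L = alpha^2/K is optimal and L = 0 diverges whenever alpha^2/K > inv_M. The toy cannot reproduce the paper's finding that the optimum also depends on flow properties, because it has no flow operator.

```python
def fixed_stress(K, alpha, inv_M, f, g, L, max_iter=200, tol=1e-12):
    """Fixed-stress splitting for a hypothetical 0-D Biot-like system
        mechanics:  K*u - alpha*p = f
        flow:       inv_M*p + alpha*u = g
    The flow step is stabilized with L*(p_new - p); mechanics then uses p_new.
    Returns (u, p, iterations used)."""
    u, p = 0.0, 0.0
    for k in range(1, max_iter + 1):
        p_new = (g - alpha * u + L * p) / (inv_M + L)   # stabilized flow solve
        u_new = (f + alpha * p_new) / K                  # mechanics solve
        if abs(p_new - p) + abs(u_new - u) < tol:
            return u_new, p_new, k
        u, p = u_new, p_new
    return u, p, max_iter


K, alpha, inv_M, f, g = 1.0, 1.0, 0.5, 1.0, 2.0
# Monolithic reference: det = K*inv_M + alpha**2 = 1.5, so u = 5/3 and p = 2/3.
u1, p1, k1 = fixed_stress(K, alpha, inv_M, f, g, L=alpha**2 / K)  # contraction factor 0
u2, p2, k2 = fixed_stress(K, alpha, inv_M, f, g, L=2.0)           # contraction factor 0.4
```

Both runs reach the monolithic solution, but the second needs many more iterations, which is the parameter sensitivity the abstract refers to.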

NA · May 11, 2018
Anderson accelerated fixed-stress splitting schemes for consolidation of unsaturated porous media

Jakub Wiktor Both, Kundan Kumar, Jan Martin Nordbotten et al.

In this paper, we study the robust linearization of nonlinear poromechanics of unsaturated materials. The model of interest couples the Richards equation with linear elasticity equations, employing the equivalent pore pressure. In practice, a monolithic solver is not always available, which calls for a linearization scheme that allows the use of separate simulators; the classical Newton method does not meet this requirement. We propose three different linearization schemes incorporating the fixed-stress splitting scheme, coupled with an L-scheme, Modified Picard, and Newton linearization of the flow. All schemes allow the efficient and robust decoupling of mechanics and flow equations. In particular, the simplest scheme, the Fixed-Stress-L-scheme, employs solely constant diagonal stabilization, has low cost per iteration, and is very robust. Under mild physical assumptions, it is theoretically shown to be a contraction. Due to possible breakdown or slow convergence of all considered splitting schemes, Anderson acceleration is applied as post-processing. Based on a special case, we justify theoretically the general ability of Anderson acceleration to effectively accelerate convergence and stabilize the underlying scheme, allowing even non-contractive fixed-point iterations to converge. To our knowledge, this is the first theoretical indication of this kind. Theoretical findings are confirmed by numerical results. In particular, Anderson acceleration is demonstrated to be very effective for the considered Picard-type methods. Finally, the Fixed-Stress-Newton scheme combined with Anderson acceleration provides a robust linearization scheme that meets the above criteria.
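The claim that Anderson acceleration can make even non-contractive fixed-point iterations converge can be seen on a one-line example. The sketch implements depth-1 Anderson acceleration (the depth used in the paper and its splitting-scheme fixed-point map are not given in the abstract) and applies it to a hypothetical scalar map with factor -1.5, for which plain iteration diverges.

```python
def anderson1(g_map, x0, max_iter=50, tol=1e-12):
    """Depth-1 Anderson acceleration of the fixed-point iteration x <- g_map(x).
    Mixes the last two images of g_map with the weight theta that minimizes
    ||(1 - theta) * r_k + theta * r_{k-1}||, where r = g_map(x) - x."""
    dot = lambda a, b: sum(s * t for s, t in zip(a, b))
    gx_prev = g_map(x0)
    r_prev = [a - b for a, b in zip(gx_prev, x0)]
    x = gx_prev                                   # first step: plain fixed-point step
    for k in range(1, max_iter + 1):
        gx = g_map(x)
        r = [a - b for a, b in zip(gx, x)]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        dr = [a - b for a, b in zip(r, r_prev)]
        denom = dot(dr, dr)
        theta = dot(r, dr) / denom if denom > 0 else 0.0
        x = [(1 - theta) * a + theta * b for a, b in zip(gx, gx_prev)]
        gx_prev, r_prev = gx, r
    return x, max_iter


# Hypothetical non-contractive map (factor -1.5) with fixed point x* = 1.
g_map = lambda v: [-1.5 * v[0] + 2.5]
x_aa, k_aa = anderson1(g_map, [0.0])

plain = 0.0
for _ in range(20):
    plain = -1.5 * plain + 2.5                    # plain iteration: error grows like 1.5^k
```

For a linear scalar map the optimal mixing weight makes the combined residual exactly zero, so the accelerated iterate lands on the fixed point after one mixing step; for nonlinear or vector problems one can only expect faster, not instant, convergence.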

NA · Feb 1, 2017
Robust iterative schemes for non-linear poromechanics

Manuel Borregales, Florin A. Radu, Kundan Kumar et al.

We consider a non-linear extension of Biot's model for poromechanics, wherein both the fluid flow and mechanical deformation are allowed to be non-linear. We perform an implicit discretization in time (backward Euler) and propose two iterative schemes for solving the non-linear problems appearing within each time step: a splitting algorithm extending the undrained split and fixed stress methods to non-linear problems, and a monolithic L-scheme. The convergence of both schemes is shown rigorously. Illustrative numerical examples are presented to confirm the applicability of the schemes and validate the theoretical results.
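The monolithic L-scheme mentioned above replaces the nonlinear term by a constant-coefficient stabilization, so every iteration is a linear solve. The scalar sketch assumes a hypothetical model problem b(p) + A*p = f with b = arctan (monotone, Lipschitz constant 1) and L = 1; it shows the linearization step only, not the paper's coupled, discretized poromechanics system or its convergence conditions.

```python
import math


def l_scheme(b, A, f, L, p0=0.0, max_iter=200, tol=1e-12):
    """L-scheme for the scalar nonlinear equation b(p) + A*p = f.
    Each iteration solves the linear problem
        L*(p_new - p) + b(p) + A*p_new = f,
    so only b itself is evaluated, never its derivative (unlike Newton)."""
    p = p0
    for k in range(1, max_iter + 1):
        p_new = (f - b(p) + L * p) / (L + A)
        if abs(p_new - p) < tol:
            return p_new, k
        p = p_new
    return p, max_iter


# Hypothetical test problem: b = arctan is monotone and Lipschitz with constant 1.
p_sol, iters = l_scheme(math.atan, A=1.0, f=1.0, L=1.0)
```

With these values the iteration map has derivative (1 - 1/(1 + p^2)) / 2, which lies in [0, 1/2), so each step at least halves the error.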

NA · Oct 30, 2018
Splitting method for elliptic equations with line sources

Ingeborg G. Gjerde, Kundan Kumar, Jan M. Nordbotten et al.

In this paper, we study the mathematical structure and numerical approximation of elliptic problems posed in a (3D) domain $Ω$ when the right-hand side is a (1D) line source $Λ$. The analysis and approximation of such problems are known to be non-standard, as the line source causes the solution to be singular. Our main result is a splitting theorem for the solution; we show that the solution admits a split into an explicit, low-regularity term capturing the singularity, and a high-regularity correction term $w$ being the solution of a suitable elliptic equation. The splitting theorem states the mathematical structure of the solution; in particular, we find that the solution has anisotropic regularity. More precisely, the solution fails to belong to $H^1$ in the neighbourhood of $Λ$, but exhibits piecewise $H^2$-regularity parallel to $Λ$. The splitting theorem can further be used to formulate a numerical method in which the solution is approximated via its correction function $w$. This approach has several benefits. Firstly, it recasts the problem as a 3D elliptic problem with a 3D right-hand side belonging to $L^2$, a problem for which discretizations and solvers are readily available. Secondly, it makes the numerical approximation independent of the discretization of $Λ$. Thirdly, it improves the approximation properties of the numerical method. We consider here the Galerkin finite element method, and show that the singularity subtraction then recovers optimal convergence rates on uniform meshes, i.e., without needing to refine the mesh around each line segment. The numerical method presented in this paper is therefore well suited for applications involving a large number of line segments. We illustrate this by treating a dataset (consisting of $\sim 3000$ line segments) describing the vascular system of the brain.
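Why the solution fails to belong to $H^1$ near $Λ$ can be checked by hand in the simplest case. Assume, for illustration only and not as the paper's segment-wise construction, an infinite straight line $Λ$ along the $z$-axis with constant intensity $f$, and write $r$ for the distance to $Λ$. The first line splits off an explicit logarithmic term; the second and third show that this term is harmonic away from $Λ$ and carries flux $f$ per unit length, so it absorbs the line source; the last shows that its gradient is not square-integrable near $Λ$ (per unit length), even though $\ln r$ itself is, so the $H^1$ failure sits entirely in the explicit term and the correction $w$ can be smooth.

```latex
\begin{aligned}
u &= -\frac{f}{2\pi}\ln r + w, \qquad r = \sqrt{x^2 + y^2},\\
\Delta\Big(-\frac{f}{2\pi}\ln r\Big)
  &= \frac{1}{r}\frac{\partial}{\partial r}\Big(r\cdot\Big(-\frac{f}{2\pi r}\Big)\Big) = 0
  \quad (r > 0),\\
\oint_{r=\varepsilon} -\frac{\partial}{\partial r}\Big(-\frac{f}{2\pi}\ln r\Big)\,ds
  &= 2\pi\varepsilon\cdot\frac{f}{2\pi\varepsilon} = f,\\
\int_0^{R}\!\!\int_0^{2\pi}\Big|\nabla\Big(-\frac{f}{2\pi}\ln r\Big)\Big|^2\,r\,d\theta\,dr
  &= \frac{f^2}{2\pi}\int_0^{R}\frac{dr}{r} = \infty.
\end{aligned}
```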

NA · May 30, 2019
Iterative solvers for Biot model under small and large deformation

Manuel Antonio Borregales, Kundan Kumar, Jan Martin Nordbotten et al.

We consider L-scheme and Newton-based solvers for the Biot model under small or large deformation. The mechanical deformation follows the Saint Venant-Kirchhoff constitutive law. Further, the fluid compressibility is assumed to be nonlinear. A Lagrangian frame of reference is used to keep track of the deformation. We perform an implicit discretization in time (backward Euler) and propose two linearization schemes for solving the nonlinear problems appearing within each time step: Newton's method and the L-scheme. The linearizations are used monolithically or in combination with a splitting algorithm. The resulting schemes can be applied to any spatial discretization. The convergence of all schemes is shown analytically for the small-deformation case. Illustrative numerical examples are presented to confirm the applicability of the schemes, in particular for large deformation.
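The Saint Venant-Kirchhoff law is simple to state in the Lagrangian frame: the Green-Lagrange strain E = (F^T F - I)/2 feeds a linear stress law. The sketch below (hypothetical Lamé parameters, plain nested lists instead of a tensor library) shows why this law suits large deformation: a 90-degree rigid rotation produces zero stress, whereas the small-strain measure (F + F^T)/2 - I would report a spurious strain of diag(-1, -1, 0) for the same F.

```python
def svk_stress(F, lam, mu):
    """Second Piola-Kirchhoff stress of a Saint Venant-Kirchhoff material:
        E = (F^T F - I) / 2            (Green-Lagrange strain)
        S = lam * tr(E) * I + 2 * mu * E
    F is a 3x3 deformation gradient given as nested lists."""
    n = 3
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Right Cauchy-Green tensor C = F^T F.
    C = [[sum(F[k][i] * F[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    E = [[(C[i][j] - eye[i][j]) / 2 for j in range(n)] for i in range(n)]
    trE = sum(E[i][i] for i in range(n))
    return [[lam * trE * eye[i][j] + 2 * mu * E[i][j] for j in range(n)] for i in range(n)]


# A 90-degree rigid rotation about z: large displacement, but no strain.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
S_rot = svk_stress(R, lam=1.0, mu=1.0)
# A 10% stretch along x.
S_str = svk_stress([[1.1, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], lam=1.0, mu=1.0)
```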

NA · Apr 30, 2017
A convergent mass conservative numerical scheme based on mixed finite elements for two-phase flow in porous media

Florin Adrian Radu, Kundan Kumar, Jan Martin Nordbotten et al.

In this work we present a mass-conservative numerical scheme for two-phase flow in porous media. The flow model consists of two fully coupled, non-linear equations: a degenerate parabolic equation and an elliptic equation. The proposed numerical scheme is based on backward Euler for the temporal discretization and the mixed finite element method (MFEM) for the discretization in space. Continuous, semi-discrete (continuous in space) and fully discrete variational formulations are set up, and the existence and uniqueness of solutions are discussed. Error estimates are presented to prove the convergence of the scheme. The non-linear systems within each time step are solved by a robust linearization method. This iterative method does not involve any regularization step. The convergence of the linearization scheme is rigorously proved under the assumption of a Lipschitz continuous saturation. The case of a Hölder continuous saturation is also discussed, and a rigorous convergence proof is given for Richards' equation. Numerical results are presented to support the theoretical findings.

NA · Jan 2, 2018
Linear iterative schemes for doubly degenerate parabolic equations

Jakub W. Both, Kundan Kumar, Jan M. Nordbotten et al.

Mathematical models for flow and reactive transport in porous media often involve non-linear, degenerate parabolic equations. Their solutions have low regularity, and therefore lower-order schemes are used for the numerical approximation. Here the backward Euler method is combined with a mixed finite element method, which results in a stable and locally mass-conservative scheme. At the same time, at each time step one has to solve a non-linear algebraic system, for which linear iterations are needed. Finding robust and convergent ones is particularly challenging here, since both slow and fast diffusion cases are allowed. Commonly used schemes, like Newton and Picard iterations, are defined either for non-degenerate problems or, in the case of degenerate ones, after regularising the problem. Convergence is guaranteed only if the initial guess is sufficiently close to the solution, which translates into severe restrictions on the time step. Here we discuss a linear iterative scheme which builds on the $L$-scheme and does not employ any regularisation. We rigorously prove its convergence, which holds under mild restrictions on the time step. Finally, we give numerical results confirming the theoretical ones, and compare the behaviour of the scheme with other schemes.

NA · Feb 3, 2018
A parallel-in-time fixed-stress splitting method for Biot's consolidation model

Manuel Borregales, Kundan Kumar, Florin Adrian Radu et al.

In this work, we study the parallel-in-time iterative solution of coupled flow and geomechanics in porous media, modelled by a two-field formulation of Biot's equations. In particular, we propose a new version of the fixed-stress splitting method, which has been widely used as a solution method for these problems. This new approach disregards the sequential nature of the temporal variable and treats the time direction as an additional direction for parallelization. We present a rigorous convergence analysis of the method and a numerical experiment demonstrating the robust behaviour of the algorithm.
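The idea of treating time as a further direction for parallelization can be illustrated on a scalar ODE. The sketch below is not the paper's fixed-stress variant; it assumes backward Euler for y' = -lam*y and a Jacobi-type sweep (a hypothetical choice for illustration) in which every time step is updated from the previous sweep, so all updates within one sweep are independent of each other.

```python
def sequential_be(y0, lam, dt, N):
    """Reference: backward Euler for y' = -lam*y, one step after another."""
    ys = [y0]
    for _ in range(N):
        ys.append(ys[-1] / (1 + lam * dt))
    return ys


def jacobi_in_time(y0, lam, dt, N, sweeps):
    """Every time step n is updated from step n-1 of the *previous* sweep,
    so the N updates inside one sweep could run in parallel."""
    ys = [y0] + [0.0] * N            # crude initial guess for all future steps
    for _ in range(sweeps):
        ys = [y0] + [ys[n - 1] / (1 + lam * dt) for n in range(1, N + 1)]
    return ys


ref = sequential_be(1.0, 2.0, 0.1, 8)
full = jacobi_in_time(1.0, 2.0, 0.1, 8, sweeps=8)   # matches the reference
part = jacobi_in_time(1.0, 2.0, 0.1, 8, sweeps=4)   # only steps 0..4 are correct
```

After N sweeps this naive iteration reproduces sequential time stepping, so it gains nothing in wall-clock time; the purpose of a real parallel-in-time method, and of its convergence analysis, is to converge in far fewer sweeps than there are time steps.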

IV · Mar 18, 2022
Application of Top-hat Transformation for Enhanced Blood Vessel Extraction

Tithi Parna Das, Sheetal Praharaj, Sarita Swain et al.

In the medical domain, different computer-aided diagnosis systems have been proposed to extract blood vessels from retinal fundus images for the clinical treatment of vascular diseases. Accurate extraction of blood vessels from fundus images using a computer-generated method can help clinicians produce timely and accurate reports for patients suffering from these diseases. In this article, we integrate a top-hat-based preprocessing approach with a fine-tuned B-COSFIRE filter to achieve more accurate segregation of blood vessel pixels from the background. The use of the top-hat transformation in the preprocessing stage improves the algorithm's ability to extract blood vessels in the presence of structures like the fovea, exudates, haemorrhages, etc. Furthermore, to reduce false positives, small clusters of blood vessel pixels are removed in the postprocessing stage. Finally, we find that the proposed algorithm is more efficient than various modern algorithms reported in the literature.
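The top-hat transformation subtracts a morphological opening from the image, which keeps thin bright structures and removes a slowly varying background. The pure-Python sketch below assumes that vessels appear bright (for example after inverting a fundus channel), a flat square structuring element of radius r, and border pixels that simply ignore out-of-image neighbours; these choices are illustrative rather than taken from the paper, and the B-COSFIRE stage is not shown.

```python
def _morph(img, r, op):
    """Grayscale erosion (op=min) or dilation (op=max) with a flat
    (2r+1)x(2r+1) square structuring element; out-of-image neighbours are ignored."""
    h, w = len(img), len(img[0])
    return [[op(img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]


def white_top_hat(img, r=1):
    """Top-hat = image minus its opening (erosion followed by dilation).
    Bright structures thinner than the structuring element survive;
    the background removed by the opening is subtracted away."""
    opened = _morph(_morph(img, r, min), r, max)
    return [[a - b for a, b in zip(row, orow)] for row, orow in zip(img, opened)]


# Toy 5x5 image: uniform background 10 with a one-pixel-wide bright "vessel" in column 2.
img = [[50.0 if j == 2 else 10.0 for j in range(5)] for _ in range(5)]
th = white_top_hat(img, r=1)
```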

IV · Mar 18, 2022
Parametric Scaling of Preprocessing assisted U-net Architecture for Improvised Retinal Vessel Segmentation

Kundan Kumar, Sumanshu Agarwal

Extracting blood vessels from retinal fundus images plays a decisive role in diagnosing the progression of pertinent diseases. In medical image analysis, vessel extraction is a semantic binary segmentation problem, where the blood vasculature needs to be extracted from the background. Here, we present an image enhancement technique based on morphological preprocessing coupled with a scaled U-net architecture. Despite a relatively small number of trainable network parameters, the scaled version of the U-net architecture performs better than other methods in the domain. We validated the proposed method on retinal fundus images from the DRIVE database. The results show a significant improvement over other algorithms in the domain in terms of the area under the ROC curve (>0.9762) and classification accuracy (>95.47%). Furthermore, the proposed method is resistant to the central vessel reflex while remaining sensitive to blood vessels in the presence of background items such as exudates, the optic disc, and the fovea.

SD · Oct 21, 2021 · Code
Wav2CLIP: Learning Robust Audio Representations From CLIP

Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar et al.

We propose Wav2CLIP, a robust audio representation learning method that distills from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pre-trained audio representation algorithms. Wav2CLIP projects audio into a shared embedding space with images and text, which enables multimodal applications such as zero-shot classification and cross-modal retrieval. Furthermore, Wav2CLIP needs just ~10% of the data to achieve competitive performance on downstream tasks compared with fully supervised models, and is more efficient to pre-train than competing methods as it does not require learning a visual model in concert with an auditory model. Finally, we demonstrate image generation from Wav2CLIP as a qualitative assessment of the shared embedding space. Our code and model weights are open-sourced and made available for further applications.
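Zero-shot classification in a shared embedding space reduces to picking the label whose text embedding has the highest cosine similarity to the audio embedding. The sketch below does not call Wav2CLIP or CLIP; the 3-D vectors are hypothetical stand-ins for real model outputs, which are much higher-dimensional.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def zero_shot(audio_emb, label_embs):
    """Return the label whose text embedding is most similar to the audio embedding.
    This only makes sense because audio and text share one embedding space."""
    return max(label_embs, key=lambda name: cosine(audio_emb, label_embs[name]))


# Hypothetical embeddings standing in for text-encoder outputs of label prompts.
labels = {"dog barking": [0.9, 0.1, 0.0], "rain": [0.0, 0.8, 0.6], "piano": [0.1, 0.0, 1.0]}
pred = zero_shot([0.8, 0.2, 0.1], labels)   # hypothetical audio embedding
```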

CV · Feb 28, 2022
Pattern Based Multivariable Regression using Deep Learning (PBMR-DP)

Jiztom Kavalakkatt Francis, Chandan Kumar, Jansel Herrera-Gerena et al.

We propose a deep learning methodology for multivariate regression that is based on pattern recognition and triggers fast learning over sensor data. We use a sensor-to-image conversion that enables us to take advantage of computer vision architectures and training processes. In addition to this data preparation methodology, we explore the use of state-of-the-art architectures to generate regression outputs that predict continuous agricultural crop yield information. Finally, we compare with some of the top models reported in MLCAS2021. We found that, using a straightforward training process, we were able to achieve an MAE of 4.394, an RMSE of 5.945, and an R² of 0.861.
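The three figures quoted above follow the standard definitions of mean absolute error, root-mean-square error and the coefficient of determination. The sketch computes them on hypothetical toy values; the paper's numbers come from its own crop-yield data and are not reproduced here.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) with R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = (sum(e * e for e in errs) / n) ** 0.5
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errs) / ss_tot
    return mae, rmse, r2


# Hypothetical toy predictions: one prediction off by 1.
mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RMSE is never smaller than MAE, and a large gap between the two points to a few large errors rather than many small ones.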

AS · Oct 19, 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis

Max Morrison, Rithesh Kumar, Kundan Kumar et al.

Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or parallel (non-autoregressive) sampling. Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. However, state-of-the-art GAN-based models produce artifacts when performing mel-spectrogram inversion. In this paper, we demonstrate that these artifacts correspond to the generator's inability to learn accurate pitch and periodicity. We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression. We discuss the inductive bias that autoregression provides for learning the relationship between instantaneous frequency and phase, and show that this inductive bias holds even when autoregressively sampling large chunks of the waveform during each forward pass. Relative to prior state-of-the-art GAN-based models, our proposed model, Chunked Autoregressive GAN (CARGAN), reduces pitch error by 40-60%, reduces training time by 58%, maintains a fast generation speed suitable for real-time or interactive applications, and maintains or improves subjective quality.

SD · Oct 22, 2020
NU-GAN: High resolution neural upsampling with GAN

Rithesh Kumar, Kundan Kumar, Vicki Anand et al.

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an important problem since productionizing generative speech technology requires operating at high sampling rates. Such applications use audio at a resolution of 44.1 kHz or 48 kHz, whereas current speech synthesis methods are equipped to handle a maximum of 24 kHz resolution. NU-GAN takes a leap towards solving audio upsampling as a separate component in the text-to-speech (TTS) pipeline by leveraging techniques for audio generation using GANs. ABX preference tests indicate that our NU-GAN resampler is capable of upsampling 22 kHz audio to 44.1 kHz such that it is distinguishable from the original audio at a rate only 7.4% higher than random chance for a single-speaker dataset, and 10.8% higher than chance for a multi-speaker dataset.

IV · Oct 26, 2019
Blood Vessel Detection using Modified Multiscale MF-FDOG Filters for Diabetic Retinopathy

Debojyoti Mallick, Kundan Kumar, Sumanshu Agarwal

Blindness in diabetic patients caused by retinopathy (characterized by an increase in the diameter and new branches of the blood vessels inside the retina) is a grave concern. Many efforts have been made toward early detection of the disease using various image processing techniques on retinal images. However, most of these methods suffer from false detection of blood vessel pixels. Here, we propose a modified matched filter with the first derivative of Gaussian. The method uses the top-hat transform and contrast-limited histogram equalization. Further, we segment the modified multiscale matched filter response using a binary threshold obtained from the first derivative of Gaussian. The method was assessed on a publicly available database (DRIVE database). As anticipated, the proposed method provides higher accuracy than methods reported in the literature. Moreover, fewer false detections were observed than with the existing matched filters and their variants.

AS · Oct 8, 2019
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Kundan Kumar, Rithesh Kumar, Thibault de Boissiere et al.

Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. A subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high-quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general-purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive and fully convolutional, has significantly fewer parameters than competing models, and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on CPU, without any hardware-specific optimization tricks.
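One discriminator design choice associated with MelGAN is to judge the waveform at several temporal resolutions. The sketch below builds such a multi-scale input pyramid with strided average pooling; the kernel size 4, stride 2, three scales and the no-padding convention are assumptions made for illustration, and the discriminators themselves are omitted.

```python
def avg_pool1d(x, kernel=4, stride=2):
    """Strided average pooling over a list of samples, without padding."""
    return [sum(x[i:i + kernel]) / kernel
            for i in range(0, len(x) - kernel + 1, stride)]


def multiscale_inputs(wave, n_scales=3):
    """Raw waveform plus progressively downsampled copies (about 2x per level),
    one input per discriminator scale."""
    scales = [list(wave)]
    for _ in range(n_scales - 1):
        scales.append(avg_pool1d(scales[-1]))
    return scales


scales = multiscale_inputs([float(v) for v in range(16)])   # toy 16-sample "waveform"
```

Each coarser copy emphasizes longer-range structure, so a discriminator operating on it can penalize errors that a full-rate discriminator with the same receptive field would not see.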

IV · Aug 12, 2019
Automated retinal vessel segmentation based on morphological preprocessing and 2D-Gabor wavelets

Kundan Kumar, Debashisa Samal, Suraj

Automated segmentation of the vascular map in retinal images offers a potential benefit in the diagnosis of different ocular diseases. In this paper, we suggest a new unsupervised retinal blood vessel segmentation approach using top-hat transformation, contrast-limited adaptive histogram equalization (CLAHE), and 2-D Gabor wavelet filters. Initially, the retinal image is preprocessed using the top-hat morphological transformation followed by CLAHE to enhance only the blood vessel pixels in the presence of exudates, optic disc, and fovea. Then, multiscale 2-D Gabor wavelet filters are applied to the preprocessed image for better representation of thick and thin blood vessels located at different orientations. The efficacy of the presented algorithm is assessed on the publicly available DRIVE database with manually labeled images. On the DRIVE database, we achieve an average accuracy of 94.32% with a small standard deviation of 0.004. In comparison with major algorithms, our algorithm performs better in terms of accuracy, sensitivity, and kappa agreement.
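A 2-D Gabor filter bank of the kind described above can be generated from a Gaussian envelope multiplied by a cosine carrier, rotated to several orientations. The sketch uses the real-valued Gabor form with hypothetical size, sigma, wavelength, aspect ratio gamma and six orientations; the paper's exact wavelet family, scales and orientations are not given in the abstract. A multiscale version would build banks for several sigma/wavelength pairs, and a common choice is to keep the maximum response per pixel.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine carrier,
    rotated by theta so that it responds to line-like structures at that angle."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)     # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(env * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_bank(size, sigma, wavelength, n_orient=6):
    """One kernel per orientation, evenly spaced over [0, pi)."""
    return [gabor_kernel(size, sigma, k * math.pi / n_orient, wavelength)
            for k in range(n_orient)]


k = gabor_kernel(7, 2.0, 0.0, 4.0)
bank = gabor_bank(7, 2.0, 4.0)
```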

NE · May 23, 2018
Large-Scale Neuromorphic Spiking Array Processors: A quest to mimic the brain

Chetan Singh Thakur, Jamal Molin, Gert Cauwenberghs et al.

Neuromorphic engineering (NE) encompasses a diverse range of approaches to information processing that are inspired by neurobiological systems, and this feature distinguishes neuromorphic systems from conventional computing systems. The brain has evolved over billions of years to solve difficult engineering problems by using efficient, parallel, low-power computation. The goal of NE is to design systems capable of brain-like computation. Numerous large-scale neuromorphic projects have emerged recently. This interdisciplinary field was listed among the top 10 technology breakthroughs of 2014 by the MIT Technology Review and among the top 10 emerging technologies of 2015 by the World Economic Forum. NE has two goals: first, a scientific goal to understand the computational properties of biological neural systems by using models implemented in integrated circuits (ICs); second, an engineering goal to exploit the known properties of biological systems to design and implement efficient devices for engineering applications. Building hardware neural emulators can be extremely useful for simulating large-scale neural models to explain how intelligent behavior arises in the brain. The principal advantages of neuromorphic emulators are that they are highly energy efficient, parallel and distributed, and require a small silicon area. Thus, compared to conventional CPUs, these neuromorphic emulators are beneficial in many engineering applications, such as porting deep learning algorithms for various recognition tasks. In this review article, we describe some of the most significant neuromorphic spiking emulators, compare the different architectures and approaches used by them, illustrate their advantages and drawbacks, and highlight the capabilities that each can deliver to neural modelers.
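A common neuron model in spiking emulators is the leaky integrate-and-fire (LIF) neuron. As a software illustration only, with arbitrary parameter values and forward-Euler integration rather than any particular chip's circuit, it can be sketched as:

```python
def lif_spike_times(current, steps=1000, dt=1e-4, tau=0.02,
                    v_rest=0.0, v_thresh=1.0, v_reset=0.0, r_m=1.0):
    """Leaky integrate-and-fire neuron integrated with forward Euler:
        tau * dv/dt = -(v - v_rest) + r_m * current
    Records a spike time and resets v whenever v reaches v_thresh."""
    v, spikes = v_rest, []
    for k in range(steps):
        v += dt / tau * (-(v - v_rest) + r_m * current)
        if v >= v_thresh:
            spikes.append((k + 1) * dt)   # time at the end of this step
            v = v_reset
    return spikes


quiet = lif_spike_times(0.5)    # steady state 0.5 < threshold: never fires
active = lif_spike_times(2.0)   # steady state 2.0 > threshold: fires periodically
```

With these parameters an input of 2.0 first crosses threshold after roughly tau*ln(2), about 14 ms, and then fires periodically, while an input of 0.5 settles below threshold and stays silent.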

CV · Dec 6, 2017
ObamaNet: Photo-realistic lip-sync from text

Rithesh Kumar, Jose Sotelo, Kundan Kumar et al.

We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text. Contrary to other published lip-sync approaches, ours is only composed of fully trainable neural modules and does not rely on any traditional computer graphics methods. More precisely, we use three main modules: a text-to-speech network based on Char2Wav, a time-delayed LSTM to generate mouth-keypoints synced to the audio, and a network based on Pix2Pix to generate the video frames conditioned on the keypoints.

SD · Dec 22, 2016
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Soroush Mehri, Kundan Kumar, Ishaan Gulrajani et al.

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation on the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

CV · Dec 19, 2016
On Random Weights for Texture Generation in One Layer Neural Networks

Mihir Mongia, Kundan Kumar, Akram Erraqabi et al.

Recent work in the literature has shown experimentally that one can use the lower layers of a trained convolutional neural network (CNN) to model natural textures. More interestingly, it has also been experimentally shown that a single layer with random filters can also model textures, although with less variability. In this paper we ask why one-layer CNNs with random filters are so effective at generating textures. We theoretically show that one-layer convolutional architectures (without a non-linearity) paired with an energy function used in previous literature can in fact preserve and modulate frequency coefficients in such a manner that random weights and pretrained weights generate the same type of images. Based on the results of this analysis, we ask whether similar properties hold when one uses one convolution layer with a non-linearity. We show that with a ReLU non-linearity there are situations where only one input gives the minimum possible energy, whereas with no non-linearity there are always infinitely many solutions that give the minimum possible energy. Thus we can show that in certain situations adding a ReLU non-linearity generates less variable images.
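The frequency-domain argument can be checked numerically in 1-D. Assuming the energy in question is the Gram-matrix texture statistic G_ij = sum_t (w_i * x)_t (w_j * x)_t from earlier texture-synthesis work, a one-layer linear convolution makes G depend only on the magnitudes of the input's frequency coefficients. A circular shift changes only their phases, so it leaves G unchanged for random filters just as for trained ones, which is one concrete way to see why many inputs reach the same minimum energy. Circular boundary handling is an assumption of this sketch.

```python
import random


def circ_conv(x, w):
    """1-D circular convolution (no padding, no non-linearity)."""
    n = len(x)
    return [sum(w[k] * x[(t - k) % n] for k in range(len(w))) for t in range(n)]


def gram(x, filters):
    """Gram-matrix energy features: G_ij = sum_t (w_i * x)_t (w_j * x)_t."""
    maps = [circ_conv(x, w) for w in filters]
    return [[sum(a * b for a, b in zip(mi, mj)) for mj in maps] for mi in maps]


random.seed(0)
filters = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]   # random weights
x = [random.gauss(0, 1) for _ in range(16)]
shifted = x[5:] + x[:5]            # circular shift: same magnitude spectrum, new phases
G1, G2 = gram(x, filters), gram(shifted, filters)
G3 = gram([-v for v in x], filters)   # sign flip: also a pure phase change
```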

LG · Nov 15, 2016
PixelVAE: A Latent Variable Model for Natural Images

Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed et al.

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64x64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.

HC · Dec 23, 2015
Deep Value of Information Estimators for Collaborative Human-Machine Information Gathering

Kin Gwn Lore, Nicholas Sweet, Kundan Kumar et al.

Effective human-machine collaboration can significantly improve many learning and planning strategies for information gathering via fusion of 'hard' and 'soft' data originating from machine and human sensors, respectively. However, gathering the most informative data from human sensors without task overloading remains a critical technical challenge. In this context, Value of Information (VOI) is a crucial decision-theoretic metric for scheduling interaction with human sensors. We present a new Deep Learning based VOI estimation framework that can be used to schedule collaborative human-machine sensing with computationally efficient online inference and minimal policy hand-tuning. Supervised learning is used to train deep convolutional neural networks (CNNs) to extract hierarchical features from 'images' of belief spaces obtained via data fusion. These features can be associated with soft data query choices to reliably compute VOI for human interaction. The CNN framework is described in detail, and a performance comparison to a feature-based POMDP scheduling policy is provided. The practical feasibility of our method is also demonstrated on a mobile robotic search problem with language-based semantic human sensor inputs.
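For a small discrete problem, value of information can be computed exactly as the expected best payoff after observing a human's answer minus the best payoff without it. The CNN framework described above learns to estimate this kind of quantity from belief-space "images" instead of computing it explicitly; the toy prior, likelihoods and utilities below are hypothetical.

```python
def value_of_information(prior, likelihood, utility):
    """Expected value of (possibly imperfect) information, discrete case.
    prior[s]: P(state s); likelihood[o][s]: P(observation o | state s);
    utility[a][s]: payoff of action a in state s.
    VOI = E_o[max_a E[U(a) | o]] - max_a E[U(a)], which is always >= 0."""
    states = range(len(prior))
    best_now = max(sum(prior[s] * u[s] for s in states) for u in utility)
    best_after = 0.0
    for lik in likelihood:
        if sum(lik[s] * prior[s] for s in states) == 0:
            continue                                   # observation o never occurs
        # sum_s P(o, s) * U(a, s) equals P(o) * E[U(a) | o]
        best_after += max(sum(lik[s] * prior[s] * u[s] for s in states) for u in utility)
    return best_after - best_now


prior = [0.5, 0.5]
utility = [[1.0, 0.0], [0.0, 1.0]]      # search region A / search region B
perfect = [[1.0, 0.0], [0.0, 1.0]]      # the human always reports the true region
useless = [[0.5, 0.5], [0.5, 0.5]]      # the answer is independent of the state
voi_perfect = value_of_information(prior, perfect, utility)
voi_useless = value_of_information(prior, useless, utility)
```

Here a perfectly informative answer is worth 0.5 and an uninformative one is worth nothing, which is why a scheduler would only query the human when the estimated value exceeds the cost of asking.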