Anugunj Naman

CV
3 papers
15 citations
Novelty 57%
AI Score 39

3 Papers

CV · Mar 5
LAW & ORDER: Adaptive Spatial Weighting for Medical Diffusion and Segmentation

Anugunj Naman, Ayushman Singh, Gaibo Zhang et al.

Medical image analysis relies on accurate segmentation and benefits from controllable synthesis of new training images. Yet both tasks of the cyclical pipeline face spatial imbalance: lesions occupy small regions against vast backgrounds. In particular, diffusion models have been shown to drift from prescribed lesion layouts, while efficient segmenters struggle on spatially uncertain regions. Adaptive spatial weighting addresses this by learning where to allocate computational resources. This paper introduces a pair of network adapters: 1) Learnable Adaptive Weighter (LAW), which predicts per-pixel loss modulation from features and masks for diffusion training, stabilized via a mix of normalization, clamping, and regularization to prevent degenerate solutions; and 2) Optimal Region Detection with Efficient Resolution (ORDER), which applies selective bidirectional skip attention at late decoder stages for efficient segmentation. Experiments on polyp and kidney tumor datasets demonstrate that LAW achieves a 20% relative FID improvement over a uniform baseline (52.28 vs. 65.60), with synthetic data then improving downstream segmentation by 4.9 Dice points (83.2% vs. 78.3%). ORDER reaches a 6.0-point Dice improvement on MK-UNet (81.3% vs. 75.3%) with 0.56 GFLOPs and just 42K parameters, remaining 730x smaller than the standard nnUNet.

SD · Jan 5, 2021
Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition

Anugunj Naman, Chetan Sinha, Liliana Mancini

In this paper, we analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task. Current speech emotion recognition models work exceptionally well but fail when the input is multilingual. Moreover, such models perform well only when the training corpus is vast. The need for a large training corpus is a significant problem for languages that are obscure or not widely spoken. We attempt to solve this challenge of multilingualism and lack of available data by turning it into a few-shot learning problem. We suggest relaxing the assumption that all N classes in an N-way K-shot problem be new and define an N+F way problem, where N and F are the number of emotion classes and predefined fixed classes, respectively. We propose this modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve the problem and call the new model F-MAML. This modification outperforms the original MAML on the EmoFilm dataset.
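To make the N+F way setup concrete, here is a small sketch of how one training episode might be assembled: N classes are sampled per episode, while F fixed classes appear in every episode. The abstract does not say which classes are fixed or how episodes are built, so the function name, the toy class labels, and the sampling procedure are assumptions for illustration only.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_n_plus_f_episode(pool: Dict[str, List[str]],
                            fixed_classes: Sequence[str],
                            n_way: int,
                            k_shot: int,
                            rng: random.Random) -> List[Tuple[str, str]]:
    """Build a support set with n_way sampled classes plus all fixed classes.

    Returns (example, label) pairs, k_shot per class, so the support set has
    (n_way + len(fixed_classes)) * k_shot items.
    """
    # Classes eligible to be the "new" classes of this episode.
    candidates = sorted(c for c in pool if c not in fixed_classes)
    sampled = rng.sample(candidates, n_way)
    episode_classes = sampled + list(fixed_classes)

    support = []
    for label in episode_classes:
        for example in rng.sample(pool[label], k_shot):
            support.append((example, label))
    return support


# Hypothetical pool: five emotion labels with four placeholder clips each.
pool = {label: [f"{label}_clip{i}" for i in range(4)]
        for label in ["anger", "fear", "joy", "sadness", "neutral"]}

rng = random.Random(0)
episode = sample_n_plus_f_episode(pool, fixed_classes=["neutral"],
                                  n_way=2, k_shot=3, rng=rng)
labels = {label for _, label in episode}
print(len(episode), sorted(labels))
```

Here N = 2 and F = 1, so every episode has three classes and nine support examples, and the fixed class is always present regardless of which two classes are sampled.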

CV · Dec 9, 2020
Removing Class Imbalance using Polarity-GAN: An Uncertainty Sampling Approach

Kumari Deepshikha, Anugunj Naman

Class imbalance is a challenging issue in practical classification problems for deep learning models as well as for traditional models. Traditionally successful countermeasures such as synthetic over-sampling have had limited success with the complex, structured data handled by deep learning models. In this work, we propose to use a Generative Adversarial Network (GAN) equipped with a generator network G, a discriminator network D, and a classifier network C to remove class imbalance in visual data sets. The generator network is initialized with an auto-encoder to make it stable. The discriminator D ensures that G adheres to the distribution of the imbalanced class. Whereas in conventional methods the generator G competes only with the discriminator D in a min-max game, we add a classifier network to the original network, so the generator now competes in a min-max game with both the discriminator and the new classifier. An additional condition is enforced on G to produce points in the convex hull of the desired imbalanced class. Further, the adversarial game with the classifier C pushes the conditional distribution learned by G towards the periphery of the respective class, compensating for the class imbalance. Experimental evidence shows that this initialization results in stable training of the network. We achieve state-of-the-art performance on extreme visual classification tasks on the FashionMNIST, MNIST, SVHN, ExDark, MVTec Anomaly Detection, and Chest X-Ray datasets, among others.
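The abstract does not state the loss functions, so the sketch below shows just one plausible reading of a generator that plays against both a discriminator and a classifier. The function name, the log-loss form, and the weighting factor `lam` are assumptions, not the paper's definitions. The generator is rewarded when D accepts its sample as belonging to the minority class and when C is less confident about that sample, which would push samples toward the class boundary.

```python
import math


def generator_loss(d_real_prob: float, c_target_prob: float,
                   lam: float = 1.0, eps: float = 1e-12) -> float:
    """Assumed generator objective for a three-player G/D/C game.

    d_real_prob:   D's probability that the generated sample is a real
                   minority-class sample.
    c_target_prob: C's probability for the target class on that sample.
    """
    # Standard non-saturating GAN term: small when D is fooled.
    fool_discriminator = -math.log(d_real_prob + eps)
    # Adversarial term against C: small when C is less confident,
    # i.e. when the sample sits nearer the class periphery.
    oppose_classifier = math.log(c_target_prob + eps)
    return fool_discriminator + lam * oppose_classifier


# Same D score, different classifier confidence.
boundary_sample = generator_loss(d_real_prob=0.9, c_target_prob=0.5)
interior_sample = generator_loss(d_real_prob=0.9, c_target_prob=0.99)
# Same classifier confidence, but D rejects the sample.
rejected_sample = generator_loss(d_real_prob=0.1, c_target_prob=0.5)
print(boundary_sample, interior_sample, rejected_sample)
```

Under these assumptions the generator prefers samples that D still accepts but that C finds hard, while the D term keeps it from drifting outside the class altogether.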