Richa Upadhyay

CV
h-index6
9papers
82citations
Novelty46%
AI Score30

9 Papers

IVMar 15, 2022Code
Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images

Prakash Chandra Chhipa, Richa Upadhyay, Gustav Grund Pihlgren et al.

This work presents a novel self-supervised pre-training method to learn efficient representations without labels on histopathology medical images utilizing magnification factors. Other state-of-theart works mainly focus on fully supervised learning approaches that rely heavily on human annotations. However, the scarcity of labeled and unlabeled data is a long-standing challenge in histopathology. Currently, representation learning without labels remains unexplored for the histopathology domain. The proposed method, Magnification Prior Contrastive Similarity (MPCS), enables self-supervised learning of representations without labels on small-scale breast cancer dataset BreakHis by exploiting magnification factor, inductive transfer, and reducing human prior. The proposed method matches fully supervised learning state-of-the-art performance in malignancy classification when only 20% of labels are used in fine-tuning and outperform previous works in fully supervised learning settings. It formulates a hypothesis and provides empirical evidence to support that reducing human-prior leads to efficient representation learning in self-supervision. The implementation of this work is available online on GitHub - https://github.com/prakashchhipa/Magnification-Prior-Self-Supervised-Method

CVOct 13, 2022Code
Multi-Task Meta Learning: learn how to adapt to unseen tasks

Richa Upadhyay, Prakash Chandra Chhipa, Ronald Phlypo et al.

This work proposes Multi-task Meta Learning (MTML), integrating two learning paradigms Multi-Task Learning (MTL) and meta learning, to bring together the best of both worlds. In particular, it focuses simultaneous learning of multiple tasks, an element of MTL and promptly adapting to new tasks, a quality of meta learning. It is important to highlight that we focus on heterogeneous tasks, which are of distinct kind, in contrast to typically considered homogeneous tasks (e.g., if all tasks are classification or if all tasks are regression tasks). The fundamental idea is to train a multi-task model, such that when an unseen task is introduced, it can learn in fewer steps whilst offering a performance at least as good as conventional single task learning on the new task or inclusion within the MTL. By conducting various experiments, we demonstrate this paradigm on two datasets and four tasks: NYU-v2 and the taskonomy dataset for which we perform semantic segmentation, depth estimation, surface normal estimation, and edge detection. MTML achieves state-of-the-art results for three out of four tasks for the NYU-v2 dataset and two out of four for the taskonomy dataset. In the taskonomy dataset, it was discovered that many pseudo-labeled segmentation masks lacked classes that were expected to be present in the ground truth; however, our MTML approach was found to be effective in detecting these missing classes, delivering good qualitative results. While, quantitatively its performance was affected due to the presence of incorrect ground truth labels. The the source code for reproducibility can be found at https://github.com/ricupa/MTML-learn-how-to-adapt-to-unseen-tasks.

CVOct 18, 2022
Depth Contrast: Self-Supervised Pretraining on 3DPM Images for Mining Material Classification

Prakash Chandra Chhipa, Richa Upadhyay, Rajkumar Saini et al.

This work presents a novel self-supervised representation learning method to learn efficient representations without labels on images from a 3DPM sensor (3-Dimensional Particle Measurement; estimates the particle size distribution of material) utilizing RGB images and depth maps of mining material on the conveyor belt. Human annotations for material categories on sensor-generated data are scarce and cost-intensive. Currently, representation learning without human annotations remains unexplored for mining materials and does not leverage on utilization of sensor-generated data. The proposed method, Depth Contrast, enables self-supervised learning of representations without labels on the 3DPM dataset by exploiting depth maps and inductive transfer. The proposed method outperforms material classification over ImageNet transfer learning performance in fully supervised learning settings and achieves an F1 score of 0.73. Further, The proposed method yields an F1 score of 0.65 with an 11% improvement over ImageNet transfer learning performance in a semi-supervised setting when only 20% of labels are used in fine-tuning. Finally, the Proposed method showcases improved performance generalization on linear evaluation. The implementation of proposed method is available on GitHub.

CVAug 23, 2023
Less is More -- Towards parsimonious multi-task models using structured sparsity

Richa Upadhyay, Ronald Phlypo, Rajkumar Saini et al.

Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters. These parsimonious models also possess the potential to match or outperform dense models in terms of performance. In this work, we introduce channel-wise l1/l2 group sparsity in the shared convolutional layers parameters (or weights) of the multi-task learning model. This approach facilitates the removal of extraneous groups i.e., channels (due to l1 regularization) and also imposes a penalty on the weights, further enhancing the learning efficiency for all tasks (due to l2 regularization). We analyzed the results of group sparsity in both single-task and multi-task settings on two widely-used Multi-Task Learning (MTL) datasets: NYU-v2 and CelebAMask-HQ. On both datasets, which consist of three different computer vision tasks each, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how changing the degree of sparsification influences the model's performance, the overall sparsity percentage, the patterns of sparsity, and the inference time.

CVMar 12, 2023
Functional Knowledge Transfer with Self-supervised Representation Learning

Prakash Chandra Chhipa, Muskaan Chopra, Gopal Mengi et al.

This work investigates the unexplored usability of self-supervised representation learning in the direction of functional knowledge transfer. In this work, functional knowledge transfer is achieved by joint optimization of self-supervised learning pseudo task and supervised learning task, improving supervised learning task performance. Recent progress in self-supervised learning uses a large volume of data, which becomes a constraint for its applications on small-scale datasets. This work shares a simple yet effective joint training framework that reinforces human-supervised task learning by learning self-supervised representations just-in-time and vice versa. Experiments on three public datasets from different visual domains, Intel Image, CIFAR, and APTOS, reveal a consistent track of performance improvements on classification tasks during joint optimization. Qualitative analysis also supports the robustness of learnt representations. Source code and trained models are available on GitHub.

CVMay 5, 2022
Deep Neural Network approaches for Analysing Videos of Music Performances

Foteini Simistira Liwicki, Richa Upadhyay, Prakash Chandra Chhipa et al.

This paper presents a framework to automate the labelling process for gestures in musical performance videos with a 3D Convolutional Neural Network (CNN). While this idea was proposed in a previous study, this paper introduces several novelties: (i) Presents a novel method to overcome the class imbalance challenge and make learning possible for co-existent gestures by batch balancing approach and spatial-temporal representations of gestures. (ii) Performs a detailed study on 7 and 18 categories of gestures generated during the performance (guitar play) of musical pieces that have been video-recorded. (iii) Investigates the possibility to use audio features. (iv) Extends the analysis to multiple videos. The novel methods significantly improve the performance of gesture identification by 12 %, when compared to the previous work (51 % in this study over 39 % in previous work). We successfully validate the proposed methods on 7 super classes (72 %), an ensemble of the 18 gestures/classes, and additional videos (75 %).

LGJan 21, 2025
Meta-Sparsity: Learning Optimal Sparse Structures in Multi-task Networks through Meta-learning

Richa Upadhyay, Ronald Phlypo, Rajkumar Saini et al.

This paper presents meta-sparsity, a framework for learning model sparsity, basically learning the parameter that controls the degree of sparsity, that allows deep neural networks (DNNs) to inherently generate optimal sparse shared structures in multi-task learning (MTL) setting. This proposed approach enables the dynamic learning of sparsity patterns across a variety of tasks, unlike traditional sparsity methods that rely heavily on manual hyperparameter tuning. Inspired by Model Agnostic Meta-Learning (MAML), the emphasis is on learning shared and optimally sparse parameters in multi-task scenarios by implementing a penalty-based, channel-wise structured sparsity during the meta-training phase. This method improves the model's efficacy by removing unnecessary parameters and enhances its ability to handle both seen and previously unseen tasks. The effectiveness of meta-sparsity is rigorously evaluated by extensive experiments on two datasets, NYU-v2 and CelebAMask-HQ, covering a broad spectrum of tasks ranging from pixel-level to image-level predictions. The results show that the proposed approach performs well across many tasks, indicating its potential as a versatile tool for creating efficient and adaptable sparse neural networks. This work, therefore, presents an approach towards learning sparsity, contributing to the efforts in the field of sparse neural networks and suggesting new directions for research towards parsimonious models.

CVJun 5, 2024
Giving each task what it needs -- leveraging structured sparsity for tailored multi-task learning

Richa Upadhyay, Ronald Phlypo, Rajkumar Saini et al.

In the Multi-task Learning (MTL) framework, every task demands distinct feature representations, ranging from low-level to high-level attributes. It is vital to address the specific (feature/parameter) needs of each task, especially in computationally constrained environments. This work, therefore, introduces Layer-Optimized Multi-Task (LOMT) models that utilize structured sparsity to refine feature selection for individual tasks and enhance the performance of all tasks in a multi-task scenario. Structured or group sparsity systematically eliminates parameters from trivial channels and, sometimes, eventually, entire layers within a convolution neural network during training. Consequently, the remaining layers provide the most optimal features for a given task. In this two-step approach, we subsequently leverage this sparsity-induced optimal layer information to build the LOMT models by connecting task-specific decoders to these strategically identified layers, deviating from conventional approaches that uniformly connect decoders at the end of the network. This tailored architecture optimizes the network, focusing on essential features while reducing redundancy. We validate the efficacy of the proposed approach on two datasets, i.e., NYU-v2 and CelebAMask-HD datasets, for multiple heterogeneous tasks. A detailed performance analysis of the LOMT models, in contrast to the conventional MTL models, reveals that the LOMT models outperform for most task combinations. The excellent qualitative and quantitative outcomes highlight the effectiveness of employing structured sparsity for optimal layer (or feature) selection.

LGNov 23, 2021
Sharing to learn and learning to share; Fitting together Meta-Learning, Multi-Task Learning, and Transfer Learning: A meta review

Richa Upadhyay, Ronald Phlypo, Rajkumar Saini et al.

Integrating knowledge across different domains is an essential feature of human learning. Learning paradigms such as transfer learning, meta-learning, and multi-task learning reflect the human learning process by exploiting the prior knowledge for new tasks, encouraging faster learning and good generalization for new tasks. This article gives a detailed view of these learning paradigms and their comparative analysis. The weakness of one learning algorithm turns out to be a strength of another, and thus, merging them is a prevalent trait in the literature. Numerous research papers focus on each of these learning paradigms separately and provide a comprehensive overview of them. However, this article reviews research studies that combine (two of) these learning algorithms. This survey describes how these techniques are combined to solve problems in many different fields of research, including computer vision, natural language processing, hyper-spectral imaging, and many more, in a supervised setting only. Based on the knowledge accumulated from the literature, we hypothesize a generic task-agnostic and model-agnostic learning network - an ensemble of meta-learning, transfer learning, and multi-task learning, termed Multi-modal Multi-task Meta Transfer Learning. We also present some open research questions, limitations, and future research directions for this proposed network. The aim of this article is to spark interest among scholars in effectively merging existing learning algorithms with the intention of advancing research in this field. Instead of presenting experimental results, we invite readers to explore and contemplate techniques for merging algorithms while navigating through their limitations.