Ujjal Kr Dutta

CV
h-index: 6
Papers: 13
Citations: 58
Novelty: 45%
AI Score: 28

13 Papers

LG · Aug 30, 2023
Application of Zone Method based Physics-Informed Neural Networks in Reheating Furnaces

Ujjal Kr Dutta, Aldo Lipani, Chuan Wang et al.

Foundation Industries (FIs) comprise glass, metals, cement, ceramics, bulk chemicals, paper, steel, etc., and provide crucial, foundational materials for a diverse set of economically relevant industries: automobiles, machinery, construction, household appliances, chemicals, etc. Reheating furnaces within the manufacturing chain of FIs are energy-intensive. Accurate, real-time prediction of the underlying temperatures in reheating furnaces has the potential to reduce the overall heating time, thereby controlling energy consumption towards the Net-Zero goals of FIs. In this paper, we cast this prediction as a regression task and explore neural networks, which can be both effective and efficient given adequate data. However, because good-quality real data is infeasible to obtain in settings like reheating furnaces, a computational model based on the classical Hottel's zone method has been used to generate data for model training. To further enhance the Out-Of-Distribution generalization capability of the trained model, we propose a Physics-Informed Neural Network (PINN) that incorporates prior physical knowledge through a set of novel Energy-Balance regularizers.
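As a rough illustration of how a physics-based regularizer can be attached to a regression loss, the sketch below adds a penalty on an energy-balance residual to a mean squared error term. The abstract does not specify the form of the Energy-Balance regularizers, so the residual definition (heat in minus heat out minus heat stored), the function names, and the `weight` parameter are all assumptions made for illustration only.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def energy_balance_residual(heat_in, heat_out, heat_stored):
    """Hypothetical residual: zero when energy is conserved for a zone."""
    return heat_in - heat_out - heat_stored


def physics_informed_loss(pred_temps, true_temps, residuals, weight=0.1):
    """Data-fit term plus a weighted penalty on squared physics residuals.

    `residuals` holds one energy-balance residual per zone (assumed layout);
    `weight` trades off data fit against physical consistency.
    """
    data_term = mse(pred_temps, true_temps)
    physics_term = sum(r ** 2 for r in residuals) / len(residuals)
    return data_term + weight * physics_term
```

A perfect temperature fit can still incur a non-zero loss if the predicted quantities violate the assumed balance, which is the mechanism by which such a term may help outside the training distribution.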

CV · Mar 21, 2022
Multispectral Satellite Data Classification using Soft Computing Approach

Purbarag Pathak Choudhury, Ujjal Kr Dutta, Dhruba Kr Bhattacharyya

A satellite image is remotely sensed image data in which each pixel represents a specific location on earth. The recorded pixel value is the radiation reflected from the earth's surface at that location. Multispectral images capture image data at specific frequencies across the electromagnetic spectrum, as opposed to Panchromatic images, which are sensitive to all wavelengths of visible light. Because of the high resolution and high dimensionality of these images, clustering techniques struggle to efficiently detect clusters of different sizes, shapes and densities while keeping processing time low. In this paper, we propose a grid-density based clustering technique for identification of objects. We also introduce an approach to classify satellite image data using a rule induction based machine learning algorithm. The object identification and classification methods have been validated using several synthetic and benchmark datasets.
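To make the general idea of grid-density clustering concrete, here is a minimal sketch for 2-D points: points are binned into square cells, cells holding at least `min_points` points are treated as dense, and adjacent dense cells are merged into one cluster. This is a generic textbook-style formulation, not the paper's algorithm; `cell_size`, `min_points`, and the 8-neighbour adjacency rule are assumptions.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, min_points):
    """Label 2-D points by connected groups of dense grid cells.

    Returns one label per point; -1 marks points in sparse cells (noise).
    """
    # Bin every point into the grid cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # A cell is dense if it holds at least `min_points` points.
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over neighbouring dense cells.
        queue = deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels
```

Because the work is done per cell rather than per pair of points, the cost grows with the number of occupied cells, which is the usual motivation for grid-based methods on large images.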

CV · Aug 20, 2022
Fuse and Attend: Generalized Embedding Learning for Art and Sketches

Ujjal Kr Dutta

While deep Embedding Learning approaches have witnessed widespread success in multiple computer vision tasks, the state-of-the-art methods for representing natural images do not necessarily perform well on images from other domains, such as paintings, cartoons, and sketches. This is because of the large shift in data distribution across these domains relative to natural images. Domains like sketches often contain only sparse informative pixels. However, recognizing objects in such domains is crucial, given the many relevant applications that leverage such data, for instance, sketch-to-image retrieval. Thus, achieving an Embedding Learning model that performs well across multiple domains is not only challenging but also plays a pivotal role in computer vision. To this end, in this paper, we propose a novel Embedding Learning approach with the goal of generalizing across different domains. During training, given a query image from a domain, we employ gated fusion and attention to generate a positive example, which carries a broad notion of the semantics of the query object category (from across multiple domains). By virtue of Contrastive Learning, we pull together the embeddings of the query and the positive, in order to learn a representation that is robust across domains. At the same time, to teach the model to be discriminative against examples from different semantic categories (across domains), we also maintain a pool of negative embeddings (from different categories). We show the prowess of our method using the DomainBed framework, on the popular PACS (Photo, Art painting, Cartoon, and Sketch) dataset.
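The pull-the-positive, push-the-negatives objective described above can be sketched with an InfoNCE-style loss. The abstract does not name the exact contrastive loss, so InfoNCE, cosine similarity, and the `temperature` value are assumptions here; the gated fusion and attention step that produces the positive is not modelled.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query, one positive, and a pool of negatives.

    Embeddings are L2-normalized so the logits are scaled cosine similarities.
    """
    q = _normalize(query)
    pos_logit = _dot(q, _normalize(positive)) / temperature
    logits = [pos_logit] + [_dot(q, _normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit
```

The loss is small when the query is closer to the positive than to every negative in the pool, and grows as any negative becomes more similar to the query.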

CV · Mar 14, 2025
Mitigating Bad Ground Truth in Supervised Machine Learning based Crop Classification: A Multi-Level Framework with Sentinel-2 Images

Sanayya A, Amoolya Shetty, Abhijeet Sharma et al.

In agricultural management, precise Ground Truth (GT) data is crucial for accurate Machine Learning (ML) based crop classification. Yet, issues like crop mislabeling and incorrect land identification are common. We propose a multi-level GT cleaning framework that uses multi-temporal Sentinel-2 data to address these issues. Specifically, this framework generates embeddings for farmland, clusters similar crop profiles, and identifies outliers that indicate GT errors. We validated clusters with False Colour Composite (FCC) checks and used distance-based metrics to scale and automate this verification process. The importance of cleaning the GT data became apparent when the models were trained on the clean and unclean data. For instance, when we trained a Random Forest model with the clean GT data, we achieved an F1 score up to 70 absolute percentage points higher. This approach advances crop classification methodologies, with potential applications in improving loan underwriting and agricultural decision-making.
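One simple way to realize distance-based outlier flagging within a cluster is sketched below: each embedding's Euclidean distance to the cluster centroid is compared against the mean distance plus `k` standard deviations. The abstract does not state which distance metric or threshold the framework uses, so the centroid rule, the function name, and the default `k` are assumptions.

```python
import math
import statistics


def flag_cluster_outliers(embeddings, k=1.5):
    """Return indices of embeddings unusually far from the cluster centroid.

    An item is flagged when its distance exceeds mean + k * std of all
    distances in the cluster (population standard deviation).
    """
    dim = len(embeddings[0])
    centroid = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    distances = [math.dist(e, centroid) for e in embeddings]
    threshold = statistics.mean(distances) + k * statistics.pstdev(distances)
    return [i for i, d in enumerate(distances) if d > threshold]
```

Flagged indices would then be candidates for review (for example, a visual check), rather than being treated as confirmed labeling errors.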

CV · Dec 6, 2021
A Tale of Color Variants: Representation and Self-Supervised Learning in Fashion E-Commerce

Ujjal Kr Dutta, Sandeep Repakula, Maulik Parmar et al.

In this paper, we address a crucial problem in fashion e-commerce (with respect to customer experience as well as revenue): color variants identification, i.e., identifying fashion products that match exactly in their design (or style) but differ in their color. We propose a generic framework, with deep visual Representation Learning at its heart, to address this problem for our fashion e-commerce platform. Our framework can be trained with supervisory signals in the form of manually obtained triplets. However, it is infeasible to obtain manual annotations for the entire huge collection of data usually present in fashion e-commerce platforms such as ours while capturing all the difficult corner cases. Interestingly, we observed that this problem can also be solved with simple color jitter based image augmentation, which recently became widely popular in the contrastive Self-Supervised Learning (SSL) literature that seeks to learn visual representations without manual labels. This naturally raised a question: could we leverage SSL in our use case and still obtain performance comparable to our supervised framework? The answer is yes, because color variant fashion objects are nothing but manifestations of a style in different colors, and a model trained to be invariant to color (with or without supervision) should be able to recognize this. The paper demonstrates this both qualitatively and quantitatively, evaluating a couple of state-of-the-art SSL techniques and also proposing a novel method.

CV · Dec 6, 2021
Seeing Objects in dark with Continual Contrastive Learning

Ujjal Kr Dutta

Object Detection, a fundamental computer vision problem, has paramount importance in smart camera systems. However, a truly reliable camera system can be achieved if and only if the underlying object detection component is robust across varying imaging conditions (or domains), for instance, different times of the day, adverse weather conditions, etc. In an effort to achieve a reliable camera system, in this paper, we attempt to train such a robust detector. Unfortunately, building a detector that performs well across varying imaging conditions requires labeled training images (often in large numbers) from a plethora of corner cases. As manually obtaining such a large labeled dataset may be infeasible, we suggest using synthetic images to mimic different training image domains. We propose a novel contrastive learning method to align the latent representations of a pair of real and synthetic images, to make the detector robust to the different domains. However, we found that merely contrasting the embeddings may lead to catastrophic forgetting of the information essential for object detection. Hence, we employ a continual learning based penalty to alleviate the issue of forgetting while contrasting the representations. We show that our proposed method outperforms a wide range of alternatives in the extremely challenging, yet under-studied, scenario of object detection at night-time.

LG · May 10, 2021
Semi-Supervised Metric Learning: A Deep Resurrection

Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

Distance Metric Learning (DML) seeks to learn a discriminative embedding where similar examples are closer, and dissimilar examples are apart. In this paper, we address the problem of Semi-Supervised DML (SSDML), which tries to learn a metric using a few labeled examples and abundantly available unlabeled examples. SSDML is important because it is infeasible to manually annotate all the examples present in a large dataset. Surprisingly, with the exception of a few classical approaches that learn a linear Mahalanobis metric, SSDML has not been studied in recent years, and lacks approaches in the deep SSDML scenario. In this paper, we address this challenging problem, and revamp SSDML with respect to deep learning. In particular, we propose a stochastic, graph-based approach that first propagates the affinities between pairs of labeled examples to the unlabeled pairs. The propagated affinities are used to mine triplet based constraints for metric learning. We impose an orthogonality constraint on the metric parameters, as it leads to better performance by avoiding a model collapse.
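To illustrate how an affinity matrix can drive triplet mining, the sketch below picks, for each anchor, the most similar other example as the positive and the least similar as the negative. The abstract does not describe the paper's actual mining rule, so this max/min selection and the function name are purely illustrative assumptions; the affinity propagation step itself is not shown.

```python
def mine_triplets(affinity):
    """Mine (anchor, positive, negative) index triplets from an affinity matrix.

    `affinity[i][j]` is the (already propagated) similarity between examples
    i and j. Anchors whose best and worst match coincide are skipped.
    """
    triplets = []
    n = len(affinity)
    for anchor in range(n):
        others = [j for j in range(n) if j != anchor]
        if len(others) < 2:
            continue
        positive = max(others, key=lambda j: affinity[anchor][j])
        negative = min(others, key=lambda j: affinity[anchor][j])
        if positive != negative:
            triplets.append((anchor, positive, negative))
    return triplets
```

For example, with three items where items 0 and 1 are strongly related and item 2 is not, anchor 0 yields the triplet `(0, 1, 2)`.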

CV · Apr 17, 2021
Color Variants Identification in Fashion e-commerce via Contrastive Self-Supervised Representation Learning

Ujjal Kr Dutta, Sandeep Repakula, Maulik Parmar et al.

In this paper, we utilize deep visual Representation Learning to address an important problem in fashion e-commerce: color variants identification, i.e., identifying fashion products that match exactly in their design (or style) but differ in their color. At first, we attempt to tackle the problem by obtaining manual annotations (depicting whether two products are color variants) and training a supervised triplet loss based neural network model to learn representations of fashion products. However, for large scale real-world industrial datasets such as the one addressed in our paper, it is infeasible to obtain annotations for the entire dataset while capturing all the difficult corner cases. Interestingly, we observed that color variants are essentially manifestations of color jitter based augmentations. Thus, we instead explore Self-Supervised Learning (SSL) to solve this problem. We observed that existing state-of-the-art SSL methods perform poorly on our problem. To address this, we propose a novel SSL based color variants model that simultaneously focuses on different parts of an apparel item. Quantitative and qualitative evaluation shows that our method outperforms existing SSL methods and, at times, the supervised model.
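The observation that color variants resemble color jitter augmentations can be illustrated with a small hue-shift sketch: rotating the hue of every pixel by the same amount changes the color of an item while leaving its brightness structure, and hence its design, intact. The image representation (nested lists of RGB tuples), the function names, and the `max_shift` range are assumptions; real SSL pipelines typically also jitter brightness, contrast, and saturation.

```python
import colorsys
import random


def jitter_pixel(rgb, hue_shift):
    """Rotate the hue of one 8-bit RGB pixel by `hue_shift` (fraction of a turn)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def color_jitter(image, max_shift=0.5, rng=None):
    """Apply one random hue shift to a whole image (list of rows of RGB tuples).

    The same shift is used for every pixel, so the result looks like the same
    design rendered in a different color.
    """
    rng = rng or random.Random()
    shift = rng.uniform(-max_shift, max_shift)
    return [[jitter_pixel(px, shift) for px in row] for row in image]
```

An image and its jittered copy can then serve as a positive pair for a contrastive objective, standing in for a manually annotated color-variant pair.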

CV · Aug 26, 2020
Attr2Style: A Transfer Learning Approach for Inferring Fashion Styles via Apparel Attributes

Rajdeep Hazra Banerjee, Abhinav Ravi, Ujjal Kr Dutta

Popular fashion e-commerce platforms mostly provide details about low-level attributes of apparel (e.g., neck type, dress length, collar type) on their product detail pages. However, customers usually prefer to buy apparel based on style information or, simply put, occasion (e.g., party, sports, or casual wear). Applying a supervised image-captioning model to generate style-based image captions is limited because ground-truth annotations in the form of style-based captions are difficult to obtain: annotating them requires a certain amount of fashion domain expertise and adds to the costs and manual effort. In contrast, low-level attribute based annotations are much more easily available. To address this issue, we propose a transfer-learning based image captioning model that is trained on a source dataset with sufficient attribute-based ground-truth captions and used to predict style-based captions on a target dataset. The target dataset has only a limited number of images with style-based ground-truth captions. The main motivation of our approach is that the low-level attributes and the higher-level styles of apparel are often correlated. We leverage this fact and train our model in an encoder-decoder based framework using an attention mechanism. In particular, the encoder of the model is first trained on the source dataset to obtain latent representations capturing the low-level attributes. The trained model is then fine-tuned to generate style-based captions for the target dataset. To highlight the effectiveness of our method, we qualitatively and quantitatively demonstrate that the captions generated by our approach are close to the actual style information for the evaluated apparel. A Proof Of Concept for our model is under pilot at Myntra, where it is exposed to some internal users for feedback.

CV · Aug 26, 2020
Buy Me That Look: An Approach for Recommending Similar Fashion Products

Abhinav Ravi, Sandeep Repakula, Ujjal Kr Dutta et al.

Have you ever looked at an Instagram model, or a model on a fashion e-commerce web page, and thought, "Wish I could get a list of fashion items similar to the ones worn by the model!"? This is what we address in this paper, where we propose a novel computer vision based technique called ShopLook to address the challenging problem of recommending similar fashion products. The proposed method has been evaluated at Myntra (www.myntra.com), a leading online fashion e-commerce platform. In particular, given a user query and the corresponding Product Display Page (PDP) against the query, the goal of our method is to recommend similar fashion products corresponding to the entire set of fashion articles worn by a model in the PDP full-shot image (the one showing the entire model from head to toe). The novelty and strength of our method lie in its capability to recommend similar articles for all the fashion items worn by the model, in addition to the primary article corresponding to the query. This is important not only for promoting cross-sells to boost revenue, but also for improving customer experience and engagement. In addition, our approach is also capable of recommending similar products for User Generated Content (UGC), e.g., fashion article images uploaded by users. Formally, our proposed method consists of the following components (in the same order): i) Human keypoint detection, ii) Pose classification, iii) Article localisation and object detection, along with active learning feedback, and iv) Triplet network based image embedding model.

CV · Aug 22, 2020
Unsupervised Deep Metric Learning via Orthogonality based Probabilistic Loss

Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

Metric learning is an important problem in machine learning. It aims to group similar examples together. Existing state-of-the-art metric learning approaches require class labels to learn a metric. As obtaining class labels is not feasible in all applications, we propose an unsupervised approach that learns a metric without making use of class labels. The lack of class labels is compensated for by obtaining pseudo-labels of the data using a graph-based clustering approach. The pseudo-labels are used to form triplets of examples, which guide the metric learning. We propose a probabilistic loss that minimizes the chances of each triplet violating an angular constraint. A weight function and an orthogonality constraint in the objective speed up convergence and avoid a model collapse. We also provide a stochastic formulation of our method to scale up to large-scale datasets. Our studies demonstrate the competitiveness of our approach against state-of-the-art methods. We also thoroughly study the effect of the different components of our method.
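The step of turning pseudo-labels into triplets can be sketched as follows: any two examples sharing a pseudo-label form an anchor-positive pair, and any example with a different pseudo-label can serve as the negative. This exhaustive enumeration is an assumption made for clarity; the abstract does not say how triplets are sampled, and the clustering step that produces the pseudo-labels is not shown.

```python
from itertools import permutations


def triplets_from_pseudo_labels(labels):
    """Enumerate (anchor, positive, negative) index triplets from pseudo-labels.

    Anchor and positive share a pseudo-label; the negative has a different one.
    The number of triplets grows cubically, so real use would subsample.
    """
    triplets = []
    for anchor, positive in permutations(range(len(labels)), 2):
        if labels[anchor] != labels[positive]:
            continue
        for negative in range(len(labels)):
            if labels[negative] != labels[anchor]:
                triplets.append((anchor, positive, negative))
    return triplets
```

With pseudo-labels `[0, 0, 1]`, the only valid triplets are `(0, 1, 2)` and `(1, 0, 2)`.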

CV · Feb 27, 2020
Affinity guided Geometric Semi-Supervised Metric Learning

Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

In this paper, we revamp the forgotten classical Semi-Supervised Distance Metric Learning (SSDML) problem through a Riemannian geometric lens, to leverage stochastic optimization within an end-to-end deep framework. The motivation comes from the fact that, apart from a few classical SSDML approaches that learn a linear Mahalanobis metric, deep SSDML has not been studied. We first extend existing SSDML methods to their deep counterparts and then propose a new method to overcome their limitations. Due to the nature of the constraints on our metric parameters, we leverage Riemannian optimization. Our deep SSDML method, with a novel affinity propagation based triplet mining strategy, outperforms its competitors.

CV · Dec 17, 2019
A Probabilistic approach for Learning Embeddings without Supervision

Ujjal Kr Dutta, Mehrtash Harandi, Chandra Sekhar Chellu

For challenging machine learning problems such as zero-shot learning and fine-grained categorization, embedding learning is the machinery of choice because of its ability to learn generic notions of similarity, as opposed to class-specific concepts in standard classification models. Embedding learning aims at learning discriminative representations of data such that similar examples are pulled closer, while dissimilar ones are pushed away. Despite their exemplary performance, supervised embedding learning approaches require a huge number of annotations for training. This restricts their applicability to large datasets in new applications where obtaining labels requires extensive manual effort and domain knowledge. In this paper, we propose to learn an embedding in a completely unsupervised manner without using any class labels. Using a graph-based clustering approach to obtain pseudo-labels, we form triplet-based constraints following a metric learning paradigm. Our novel embedding learning approach uses a probabilistic notion that intuitively minimizes the chances of each triplet violating a geometric constraint. Due to the nature of the search space, we learn the parameters of our approach using Riemannian geometry. Our proposed approach performs competitively with state-of-the-art approaches.