Dhruv Makwana

5 papers

107 citations

Novelty: 50%

AI Score: 32

Ranked #138,686 of 201,326 authors (top 69%); #43,422 in CV (top 74%)

5 Papers

CV | Jul 3, 2022 | Code

WaferSegClassNet -- A Light-weight Network for Classification and Segmentation of Semiconductor Wafer Defects

Subhrajit Nag, Dhruv Makwana, Sai Chandra Teja R et al.

As the integration density and design intricacy of semiconductor wafers increase, the magnitude and complexity of defects in them are also on the rise. Since the manual inspection of wafer defects is costly, an automated artificial intelligence (AI) based computer-vision approach is highly desired. Previous works on defect analysis have several limitations, such as low accuracy and the need for separate models for classification and segmentation. For analyzing mixed-type defects, some previous works require separately training one model for each defect type, which is non-scalable. In this paper, we present WaferSegClassNet (WSCN), a novel network based on an encoder-decoder architecture. WSCN performs simultaneous classification and segmentation of both single and mixed-type wafer defects. WSCN uses a "shared encoder" for classification and segmentation, which allows training WSCN end-to-end. We use N-pair contrastive loss to first pretrain the encoder, and then use BCE-Dice loss for segmentation and categorical cross-entropy loss for classification. The use of N-pair contrastive loss helps in better embedding representation in the latent dimension of wafer maps. WSCN has a model size of only 0.51MB and performs only 0.2M FLOPS. Thus, it is much lighter than other state-of-the-art models. Also, it requires only 150 epochs for convergence, compared to 4,000 epochs needed by a previous work. We evaluate our model on the MixedWM38 dataset, which has 38,015 images. WSCN achieves an average classification accuracy of 98.2% and a dice coefficient of 0.9999. We are the first to show segmentation results on the MixedWM38 dataset. The source code can be obtained from https://github.com/ckmvigil/WaferSegClassNet.
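The abstract names a BCE-Dice loss for the segmentation branch but does not spell out how the two terms are combined. The sketch below is one common formulation, not the authors' implementation: it assumes an equally weighted sum of binary cross-entropy and (1 - Dice), and the function names and sample values are hypothetical.

```python
import math

def dice_coefficient(pred, target, eps=1e-7):
    """Soft Dice coefficient between predicted probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy; predictions are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)

def bce_dice_loss(pred, target):
    """Assumed combination: BCE plus (1 - Dice), equally weighted."""
    return binary_cross_entropy(pred, target) + (1.0 - dice_coefficient(pred, target))

# Flattened 2x2 mask and two candidate predictions (illustrative values only).
mask = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.1, 0.8, 0.2]
bad = [0.2, 0.8, 0.1, 0.9]

print(bce_dice_loss(good, mask) < bce_dice_loss(bad, mask))
```

A prediction that overlaps the mask well lowers both terms, so the combined loss rewards pixel-wise correctness (BCE) and region overlap (Dice) at the same time.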

CV | Jul 13, 2022 | Code

ACLNet: An Attention and Clustering-based Cloud Segmentation Network

Dhruv Makwana, Subhrajit Nag, Onkar Susladkar et al.

We propose a novel deep learning model named ACLNet for cloud segmentation from ground images. ACLNet uses both a deep neural network and a machine learning (ML) algorithm to extract complementary features. Specifically, it uses EfficientNet-B0 as the backbone, "à trous spatial pyramid pooling" (ASPP) to learn at multiple receptive fields, and a "global attention module" (GAM) to extract fine-grained details from the image. ACLNet also uses k-means clustering to extract cloud boundaries more precisely. ACLNet is effective for both daytime and nighttime images. It provides a lower error rate, higher recall, and higher F1-score than state-of-the-art cloud segmentation models. The source code of ACLNet is available here: https://github.com/ckmvigil/ACLNet.
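The abstract does not say which features ACLNet clusters or how the clusters feed into the network. As a rough illustration of k-means itself, the sketch below runs Lloyd's algorithm on scalar pixel brightness values; the choice of brightness as the feature, the function name, and the sample pixels are all assumptions.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Lloyd's algorithm on scalar values; requires k >= 2.

    Centroids start evenly spaced between the minimum and maximum value.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Hypothetical normalized brightness values: dark sky vs. bright cloud.
pixels = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
labels, centroids = kmeans_1d(pixels)
print(labels, centroids)
```

With two clusters, the label map splits pixels into a darker and a brighter group, which is the kind of coarse sky/cloud partition a segmentation network could then refine.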

CV | Oct 26, 2022 | Code

TPFNet: A Novel Text In-painting Transformer for Text Removal

Onkar Susladkar, Dhruv Makwana, Gayatri Deshmukh et al.

Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-to-end) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images. The output of part 1 is a low-resolution text-free image. Part 2 uses the features learned in part 1 to predict a high-resolution text-free image. In part 1, we use a "pyramidal vision transformer" (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps in learning the image structure. To precisely locate the text, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than the input image. On the Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all the metrics. For example, on the SCUT-EnsText dataset, TPFNet has a PSNR (higher is better) of 39.0 and text-detection precision (lower is better) of 21.1, compared to the best previous technique, which has a PSNR of 32.3 and precision of 53.2. The source code can be obtained from https://github.com/CandleLabAI/TPFNet
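For readers unfamiliar with the PSNR metric quoted above, the sketch below computes it from the standard definition, 10 * log10(MAX^2 / MSE). It assumes 8-bit pixel values (peak of 255) stored as flat lists; the function name and sample data are illustrative and say nothing about how the authors ran their evaluation.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical ground-truth pixels and a restoration that is off by 1 everywhere.
truth = [10, 50, 90, 130]
output = [11, 51, 91, 131]
print(psnr(truth, output))
```

Because PSNR is logarithmic in the mean squared error, a smaller reconstruction error yields a higher score, which is why the abstract marks it as "higher is better".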

RO | Jun 26, 2023

MOVES: Movable and Moving LiDAR Scene Segmentation in Label-Free settings using Static Reconstruction

Prashant Kumar, Dhruv Makwana, Onkar Susladkar et al.

Accurate static structure reconstruction and segmentation of non-stationary objects are of vital importance for autonomous navigation applications. These applications assume a LiDAR scan to consist of only static structures. In the real world, however, LiDAR scans also contain non-stationary dynamic structures: moving and movable objects. Current solutions use segmentation information to isolate and remove moving structures from the LiDAR scan. This strategy fails in several important use cases where segmentation information is not available. In such scenarios, moving objects and objects with high uncertainty in their motion, i.e., movable objects, may escape detection. This violates the above assumption. We present MOVES, a novel GAN-based adversarial model that segments out moving as well as movable objects in the absence of segmentation information. We achieve this by accurately transforming a dynamic LiDAR scan into its corresponding static scan. This is done by replacing dynamic objects and their corresponding occlusions with the static structures that were occluded by the dynamic objects. We leverage corresponding static-dynamic LiDAR pairs.

LG | Aug 17, 2021

RRLFSOR: An Efficient Self-Supervised Learning Strategy of Graph Convolutional Networks

Feng Sun, Ajith Kumar, Guanci Yang et al.

Graph Convolutional Networks (GCNs) are widely used in many applications, yet they still need large amounts of labelled data for training. Besides, the adjacency matrix of GCNs is stable, so the data processing strategy cannot efficiently adjust the quantity of training data from the built graph structures. To further improve the performance and the self-learning ability of GCNs, in this paper we propose an efficient self-supervised learning strategy for GCNs, named randomly removed links with a fixed step at one region (RRLFSOR). RRLFSOR can be regarded as a new data augmenter to mitigate over-smoothing. RRLFSOR is examined on two efficient and representative GCN models with three public citation network datasets: Cora, PubMed, and Citeseer. Experiments on transductive link prediction tasks show that our strategy outperforms the baseline models consistently by up to 21.34% in terms of accuracy on three benchmark datasets.