Yogesh Aggarwal

h-index3
2papers

2 Papers

CVMar 3, 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language

Pankaj Choudhury, Yogesh Aggarwal, Prabhanjan Jadhav et al.

Most existing works in image caption synthesis use computation heavy deep neural networks and generates image descriptions in English language. This often restricts this important assistive tool for widespread use across language and accessibility barriers. This work presents AC-Lite, a computationally efficient model for image captioning in low-resource Assamese language. AC-Lite reduces computational requirements by replacing computation-heavy deep network components with lightweight alternatives. The AC-Lite model is designed through extensive ablation experiments with different image feature extractor networks and language decoders. A combination of ShuffleNetv2x1.5 with GRU based language decoder along with bilinear attention is found to provide the best performance with minimum compute. AC-Lite was observed to achieve an 82.3 CIDEr score on the COCO-AC dataset with 2.45 GFLOPs and 22.87M parameters.

CVJun 27, 2024
FDLite: A Single Stage Lightweight Face Detector Network

Yogesh Aggarwal, Prithwijit Guha

Face detection is frequently attempted by using heavy pre-trained backbone networks like ResNet-50/101/152 and VGG16/19. Few recent works have also proposed lightweight detectors with customized backbones, novel loss functions and efficient training strategies. The novelty of this work lies in the design of a lightweight detector while training with only the commonly used loss functions and learning strategies. The proposed face detector grossly follows the established RetinaFace architecture. The first contribution of this work is the design of a customized lightweight backbone network (BLite) having 0.167M parameters with 0.52 GFLOPs. The second contribution is the use of two independent multi-task losses. The proposed lightweight face detector (FDLite) has 0.26M parameters with 0.94 GFLOPs. The network is trained on the WIDER FACE dataset. FDLite is observed to achieve 92.3\%, 89.8\%, and 82.2\% Average Precision (AP) on the easy, medium, and hard subsets of the WIDER FACE validation dataset, respectively.