CV | Aug 25, 2025 | Code
DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models
Abu Sufian, Anirudha Ghosh, Debaditya Barman et al.
Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities across various downstream tasks, including biometric face recognition (FR) with description. However, demographic bias remains a critical concern in FR, as these foundation models often fail to perform equitably across demographic groups defined by ethnicity/race, gender, and age. In this work, DemoBias, we conduct an empirical evaluation of the extent of demographic bias in LVLMs for biometric FR with textual token generation tasks. We fine-tuned and evaluated three widely used pre-trained LVLMs (LLaVA, BLIP-2, and PaliGemma) on a demographically balanced dataset that we generated ourselves. We use several evaluation metrics, such as group-specific BERTScores and the Fairness Discrepancy Rate, to quantify and trace performance disparities. The experimental results offer insight into the fairness and reliability of LVLMs across demographic groups. Our empirical study uncovered demographic biases in LVLMs: PaliGemma and LLaVA exhibited higher disparities for the Hispanic/Latino, Caucasian, and South Asian groups, whereas BLIP-2 was comparatively consistent across groups. Repository: https://github.com/Sufianlab/DemoBias.
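As a rough illustration of what "group-specific" scoring involves, the sketch below averages a per-sample score within each demographic group and reports the largest gap between groups. The abstract does not give the formula for the Fairness Discrepancy Rate, so `max_gap` is only a simplified stand-in and not the paper's metric; the function names, group labels, and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_means(scores, groups):
    """Average a per-sample score (e.g. a BERTScore F1) within each group."""
    buckets = defaultdict(list)
    for score, group in zip(scores, groups):
        buckets[group].append(score)
    return {group: mean(values) for group, values in buckets.items()}


def max_gap(per_group):
    """Largest difference between any two group means (0.0 means perfectly even).

    Assumption: a simple max-min gap, NOT the Fairness Discrepancy Rate itself.
    """
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical scores and group labels, made up for illustration only.
scores = [0.90, 0.80, 0.70, 0.60]
groups = ["A", "A", "B", "B"]

per_group = group_means(scores, groups)
gap = max_gap(per_group)
print(per_group, gap)
```

A model that treats groups evenly would show a gap close to zero; a larger gap points to the kind of disparity the study traces.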
CV | Jun 3, 2025 | Code
Can Vision Transformers with ResNet's Global Features Fairly Authenticate Demographic Faces?
Abu Sufian, Marco Leo, Cosimo Distante et al.
Biometric face authentication is crucial in computer vision, but ensuring fairness and generalization across demographic groups remains a major challenge. We therefore investigated whether Vision Transformer (ViT) and ResNet models, leveraging pre-trained global features, can fairly authenticate faces from different demographic groups while relying minimally on local features. In this investigation, we used three pre-trained state-of-the-art (SOTA) ViT foundation models from Facebook, Google, and Microsoft for global features, alongside ResNet-18. We concatenated the features from the ViT and ResNet, passed them through two fully connected layers, and trained on customized face image datasets to capture the local features. We then designed a novel few-shot prototype network built on the backbone feature embeddings. We also developed new demographic face image support and query datasets for this empirical study. The network was tested on these datasets in one-shot, three-shot, and five-shot scenarios to assess how performance improves as the size of the support set increases. We observed results across datasets with varying races/ethnicities, genders, and age groups. The Microsoft Swin Transformer backbone performed best among the three SOTA ViTs for this task. The code and data are available at: https://github.com/Sufianlab/FairVitBio.
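The few-shot prototype idea described above can be sketched in a few lines. This is a minimal illustration of generic prototype-based classification, not the authors' network: it skips the two fully connected layers, uses plain Euclidean distance as an assumed metric, and every name and feature value in it is hypothetical.

```python
import math


def concat_features(vit_feat, resnet_feat):
    """Join two backbone feature vectors into one embedding."""
    return list(vit_feat) + list(resnet_feat)


def build_prototypes(support):
    """Map each identity label to the mean of its support embeddings."""
    protos = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        protos[label] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return protos


def classify(query, protos):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical toy features: a two-shot support set for id_1, one-shot for id_2.
support = {
    "id_1": [
        concat_features([0.0, 0.0], [1.0]),
        concat_features([0.2, 0.0], [1.0]),
    ],
    "id_2": [concat_features([1.0, 1.0], [0.0])],
}
protos = build_prototypes(support)
query = concat_features([0.1, 0.1], [0.9])
predicted = classify(query, protos)
print(predicted)
```

Moving from one-shot to five-shot only changes how many embeddings are averaged into each prototype, which is why a larger support set tends to give more stable prototypes.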
CV | Jan 1
Context-Aware Pesticide Recommendation via Few-Shot Pest Recognition for Precision Agriculture
Anirudha Ghosh, Ritam Sarkar, Debaditya Barman
Effective pest management is crucial for enhancing agricultural productivity, especially for crops such as sugarcane and wheat that are highly vulnerable to pest infestations. Traditional pest management methods depend heavily on manual field inspections and chemical pesticides. These approaches are often costly, time-consuming, and labor-intensive, and they can harm the environment. To overcome these challenges, this study presents a lightweight framework for pest detection and pesticide recommendation, designed for low-resource devices such as smartphones and drones, making it suitable for use by small and marginal farmers. The proposed framework has two main components. The first is a Pest Detection Module that uses a compact, lightweight convolutional neural network (CNN) combined with prototypical meta-learning to accurately identify pests even when only a few training samples are available. The second is a Pesticide Recommendation Module that incorporates environmental factors such as crop type and growth stage to recommend safe and eco-friendly pesticides. To train and evaluate our framework, a comprehensive pest image dataset was developed by combining multiple publicly available datasets. The final dataset contains samples with different viewing angles, pest sizes, and background conditions to ensure strong generalization. Experimental results show that the proposed lightweight CNN achieves high accuracy, comparable to state-of-the-art models, while significantly reducing computational complexity. The Decision Support System additionally improves pest management by reducing dependence on traditional chemical pesticides and encouraging sustainable practices, demonstrating its potential for real-time applications in precision agriculture.
CV | Jun 10, 2019 | Code
BDNet: Bengali Handwritten Numeral Digit Recognition based on Densely connected Convolutional Neural Networks
A. Sufian, Anirudha Ghosh, Avijit Naskar et al.
Images of handwritten digits differ from natural images: the orientation of a digit, as well as the similarity of features between different digits, causes confusion. Meanwhile, deep convolutional neural networks are achieving huge success in computer vision problems, especially in image classification. BDNet is a densely connected deep convolutional neural network model used to classify (recognize) Bengali handwritten numeral digits. It is trained end-to-end on the ISI Bengali handwritten numeral dataset. During training, unconventional data preprocessing and augmentation techniques are used so that the trained model also works on a different dataset. The model achieved a test accuracy of 99.775% (the baseline was 99.40%) on the test set of ISI Bengali handwritten numerals, so BDNet gives a 62.5% error reduction compared to previous state-of-the-art models. We have also created a dataset of 1000 images of Bengali handwritten numerals to test the trained model, and it gives promising results. The code, trained model, and our own dataset are available at: https://github.com/Sufianlab/BDNet.
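The 62.5% figure follows from the two accuracies quoted above: the baseline error is 100 - 99.40 = 0.60 percentage points, BDNet's error is 100 - 99.775 = 0.225, and (0.60 - 0.225) / 0.60 = 0.625. The short check below reproduces that arithmetic; the function name is hypothetical and it assumes "error reduction" means relative reduction in test error.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline's error removed; accuracies are percentages."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Accuracies taken from the abstract: baseline 99.40%, BDNet 99.775%.
reduction = relative_error_reduction(99.40, 99.775)
print(f"{reduction:.1%}")
```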