LGNov 13, 2025Code
T2IBias: Uncovering Societal Bias Encoded in the Latent Space of Text-to-Image Generative ModelsAbu Sufian, Cosimo Distante, Marco Leo et al.
Text-to-image (T2I) generative models are largely used in AI-powered real-world applications and value creation. However, their strategic deployment raises critical concerns for responsible AI management, particularly regarding the reproduction and amplification of race- and gender-related stereotypes that can undermine organizational ethics. In this work, we investigate whether such societal biases are systematically encoded within the pretrained latent spaces of state-of-the-art T2I models. We conduct an empirical study across the five most popular open-source models, using ten neutral, profession-related prompts to generate 100 images per profession, resulting in a dataset of 5,000 images evaluated by diverse human assessors representing different races and genders. We demonstrate that all five models encode and amplify pronounced societal skew: caregiving and nursing roles are consistently feminized, while high-status professions such as corporate CEO, politician, doctor, and lawyer are overwhelmingly represented by males and mostly White individuals. We further identify model-specific patterns, such as QWEN-Image's near-exclusive focus on East Asian outputs, Kandinsky's dominance of White individuals, and SDXL's comparatively broader but still biased distributions. These results provide critical insights for AI project managers and practitioners, enabling them to select equitable AI models and customized prompts that generate images in alignment with the principles of responsible AI. We conclude by discussing the risks of these biases and proposing actionable strategies for bias mitigation in building responsible GenAI systems. The code and Data Repository: https://github.com/Sufianlab/T2IBias
CVJul 22, 2022
Vision-based Human Fall Detection Systems using Deep Learning: A ReviewEkram Alam, Abu Sufian, Paramartha Dutta et al.
Human fall is one of the very critical health issues, especially for elders and disabled people living alone. The number of elder populations is increasing steadily worldwide. Therefore, human fall detection is becoming an effective technique for assistive living for those people. For assistive living, deep learning and computer vision have been used largely. In this review article, we discuss deep learning (DL)-based state-of-the-art non-intrusive (vision-based) fall detection techniques. We also present a survey on fall detection benchmark datasets. For a clear understanding, we briefly discuss different metrics which are used to evaluate the performance of the fall detection systems. This article also gives a future direction on vision-based human fall detection techniques.
CVMay 31, 2025Code
Human Fall Detection using Transfer Learning-based 3D CNNEkram Alam, Abu Sufian, Paramartha Dutta et al.
Unintentional or accidental falls are one of the significant health issues in senior persons. The population of senior persons is increasing steadily. So, there is a need for an automated fall detection monitoring system. This paper introduces a vision-based fall detection system using a pre-trained 3D CNN. Unlike 2D CNN, 3D CNN extracts not only spatial but also temporal features. The proposed model leverages the original learned weights of a 3D CNN model pre-trained on the Sports1M dataset to extract the spatio-temporal features. Only the SVM classifier was trained, which saves the time required to train the 3D CNN. Stratified shuffle five split cross-validation has been used to split the dataset into training and testing data. Extracted features from the proposed 3D CNN model were fed to an SVM classifier to classify the activity as fall or ADL. Two datasets, GMDCSA and CAUCAFall, were utilized to conduct the experiment. The source code for this work can be accessed via the following link: https://github.com/ekramalam/HFD_3DCNN.
CVAug 25, 2025Code
DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation ModelsAbu Sufian, Anirudha Ghosh, Debaditya Barman et al.
Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities across various downstream tasks, including biometric face recognition (FR) with description. However, demographic biases remain a critical concern in FR, as these foundation models often fail to perform equitably across diverse demographic groups, considering ethnicity/race, gender, and age. Therefore, through our work DemoBias, we conduct an empirical evaluation to investigate the extent of demographic biases in LVLMs for biometric FR with textual token generation tasks. We fine-tuned and evaluated three widely used pre-trained LVLMs: LLaVA, BLIP-2, and PaliGemma on our own generated demographic-balanced dataset. We utilize several evaluation metrics, like group-specific BERTScores and the Fairness Discrepancy Rate, to quantify and trace the performance disparities. The experimental results deliver compelling insights into the fairness and reliability of LVLMs across diverse demographic groups. Our empirical study uncovered demographic biases in LVLMs, with PaliGemma and LLaVA exhibiting higher disparities for Hispanic/Latino, Caucasian, and South Asian groups, whereas BLIP-2 demonstrated comparably consistent. Repository: https://github.com/Sufianlab/DemoBias.
CVJun 3, 2025Code
Can Vision Transformers with ResNet's Global Features Fairly Authenticate Demographic Faces?Abu Sufian, Marco Leo, Cosimo Distante et al.
Biometric face authentication is crucial in computer vision, but ensuring fairness and generalization across demographic groups remains a big challenge. Therefore, we investigated whether Vision Transformer (ViT) and ResNet, leveraging pre-trained global features, can fairly authenticate different demographic faces while relying minimally on local features. In this investigation, we used three pre-trained state-of-the-art (SOTA) ViT foundation models from Facebook, Google, and Microsoft for global features as well as ResNet-18. We concatenated the features from ViT and ResNet, passed them through two fully connected layers, and trained on customized face image datasets to capture the local features. Then, we designed a novel few-shot prototype network with backbone features embedding. We also developed new demographic face image support and query datasets for this empirical study. The network's testing was conducted on this dataset in one-shot, three-shot, and five-shot scenarios to assess how performance improves as the size of the support set increases. We observed results across datasets with varying races/ethnicities, genders, and age groups. The Microsoft Swin Transformer backbone performed better among the three SOTA ViT for this task. The code and data are available at: https://github.com/Sufianlab/FairVitBio.
CVJan 3, 2024
Real-Time Human Fall Detection using a Lightweight Pose Estimation TechniqueEkram Alam, Abu Sufian, Paramartha Dutta et al.
The elderly population is increasing rapidly around the world. There are no enough caretakers for them. Use of AI-based in-home medical care systems is gaining momentum due to this. Human fall detection is one of the most important tasks of medical care system for the aged people. Human fall is a common problem among elderly people. Detection of a fall and providing medical help as early as possible is very important to reduce any further complexity. The chances of death and other medical complications can be reduced by detecting and providing medical help as early as possible after the fall. There are many state-of-the-art fall detection techniques available these days, but the majority of them need very high computing power. In this paper, we proposed a lightweight and fast human fall detection system using pose estimation. We used `Movenet' for human joins key-points extraction. Our proposed method can work in real-time on any low-computing device with any basic camera. All computation can be processed locally, so there is no problem of privacy of the subject. We used two datasets `GMDCSA' and `URFD' for the experiment. We got the sensitivity value of 0.9375 and 0.9167 for the dataset `GMDCSA' and `URFD' respectively. The source code and the dataset GMDCSA of our work are available online to access.
CVApr 28, 2021
A Deep Transfer Learning-based Edge Computing Method for Home Health MonitoringAbu Sufian, Changsheng You, Mianxiong Dong
The health-care gets huge stress in a pandemic or epidemic situation. Some diseases such as COVID-19 that causes a pandemic is highly spreadable from an infected person to others. Therefore, providing health services at home for non-critical infected patients with isolation shall assist to mitigate this kind of stress. In addition, this practice is also very useful for monitoring the health-related activities of elders who live at home. The home health monitoring, a continuous monitoring of a patient or elder at home using visual sensors is one such non-intrusive sub-area of health services at home. In this article, we propose a transfer learning-based edge computing method for home health monitoring. Specifically, a pre-trained convolutional neural network-based model can leverage edge devices with a small amount of ground-labeled data and fine-tuning method to train the model. Therefore, on-site computing of visual data captured by RGB, depth, or thermal sensor could be possible in an affordable way. As a result, raw data captured by these types of sensors is not required to be sent outside from home. Therefore, privacy, security, and bandwidth scarcity shall not be issues. Moreover, real-time computing for the above-mentioned purposes shall be possible in an economical way.
NIMay 17, 2020
Reinforcement Learning based Transmission Range Control in Software-Defined Wireless Sensor Networks with Moving SensorAnuradha Banerjee, Abu Sufian, Ali Safaa Sadiq et al.
Routing in Software-Defined Wireless sensor networks (SD-WSNs) can be either single or multi-hop, whereas the network is either static or dynamic. In static SD-WSN, the selection of the optimum route from source to destination is accomplished by the SDN controller(s). On the other hand, if moving sensors are there, then SDN controllers of zones cannot handle route discovery sessions by themselves; they can only store information about the most recent zone state. Moving sensors find lots of robotics applications where robots continue to move from one room to another to sensing the environment. A huge amount of energy can be saved in these networks if transmission range control is applied. Multiple power levels exist in each node, and each of these levels takes possible actions after a potential sender node decides to transmit/forward a message. Based on each such activity, the next states of the concerned sender node and the communication session are re-determined while the router receives a reward. The Epsilon-greedy algorithm is applied in this study to decide the optimum power level in the next iteration. It is determined anew depending upon the present network scenario. Simulation results show that our proposed work leads the network to equilibrium by reducing energy consumption and maintaining network throughput.
CVJan 13, 2020
Evolution of Image Segmentation using Deep Convolutional Neural Network: A SurveyFarhana Sultana, Abu Sufian, Paramartha Dutta
From the autonomous car driving to medical diagnosis, the requirement of the task of image segmentation is everywhere. Segmentation of an image is one of the indispensable tasks in computer vision. This task is comparatively complicated than other vision tasks as it needs low-level spatial information. Basically, image segmentation can be of two types: semantic segmentation and instance segmentation. The combined version of these two basic tasks is known as panoptic segmentation. In the recent era, the success of deep convolutional neural networks (CNN) has influenced the field of segmentation greatly and gave us various successful models to date. In this survey, we are going to take a glance at the evolution of both semantic and instance segmentation work based on CNN. We have also specified comparative architectural details of some state-of-the-art models and discuss their training details to present a lucid understanding of hyper-parameter tuning of those models. We have also drawn a comparison among the performance of those models on different datasets. Lastly, we have given a glimpse of some state-of-the-art panoptic segmentation models.