Shubhabrata Mukherjee

CV
h-index22
5papers
9citations
Novelty51%
AI Score30

5 Papers

LGOct 11, 2023
Transformers for Green Semantic Communication: Less Energy, More Semantics

Shubhabrata Mukherjee, Cory Beard, Sejun Song

Semantic communication aims to transmit meaningful and effective information, rather than focusing on individual symbols or bits. This results in benefits like reduced latency, bandwidth usage, and higher throughput compared with traditional communication. However, semantic communication poses significant challenges due to the need for universal metrics to benchmark the joint effects of semantic information loss and practical energy consumption. This research presents a novel multi-objective loss function named "Energy-Optimized Semantic Loss" (EOSL), addressing the challenge of balancing semantic information loss and energy consumption. Through comprehensive experiments on transformer models, including CPU and GPU energy usage, it is demonstrated that EOSL-based encoder model selection can save up to 90% of energy while achieving a 44% improvement in semantic similarity performance during inference in this experiment. This work paves the way for energy-efficient neural network selection and the development of greener semantic communication architectures.

CVFeb 12, 2024
MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO

Shubhabrata Mukherjee, Cory Beard, Zhu Li

Low-light conditions and occluded scenarios impede object detection in real-world Internet of Things (IoT) applications like autonomous vehicles and security systems. While advanced machine learning models strive for accuracy, their computational demands clash with the limitations of resource-constrained devices, hampering real-time performance. In our current research, we tackle this challenge, by introducing ``YOLO Phantom", one of the smallest YOLO models ever conceived. YOLO Phantom utilizes the novel Phantom Convolution block, achieving comparable accuracy to the latest YOLOv8n model while simultaneously reducing both parameters and model size by 43\%, resulting in a significant 19\% reduction in Giga Floating-Point Operations (GFLOPs). YOLO Phantom leverages transfer learning on our multimodal RGB-infrared dataset to address low-light and occlusion issues, equipping it with robust vision under adverse conditions. Its real-world efficacy is demonstrated on an IoT platform with advanced low-light and RGB cameras, seamlessly connecting to an AWS-based notification endpoint for efficient real-time object detection. Benchmarks reveal a substantial boost of 17\% and 14\% in frames per second (FPS) for thermal and RGB detection, respectively, compared to the baseline YOLOv8n model. For community contribution, both the code and the multimodal dataset are available on GitHub.

CVJun 30, 2025
Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data

Shubhabrata Mukherjee, Jack Lang, Obeen Kwon et al.

Zero-shot and prompt-based models have excelled at visual reasoning tasks by leveraging large-scale natural image corpora, but they often fail on sparse and domain-specific scientific image data. We introduce Zenesis, a no-code interactive computer vision platform designed to reduce data readiness bottlenecks in scientific imaging workflows. Zenesis integrates lightweight multimodal adaptation for zero-shot inference on raw scientific data, human-in-the-loop refinement, and heuristic-based temporal enhancement. We validate our approach on Focused Ion Beam Scanning Electron Microscopy (FIB-SEM) datasets of catalyst-loaded membranes. Zenesis outperforms baselines, achieving an average accuracy of 0.947, Intersection over Union (IoU) of 0.858, and Dice score of 0.923 on amorphous catalyst samples; and 0.987 accuracy, 0.857 IoU, and 0.923 Dice on crystalline samples. These results represent a significant performance gain over conventional methods such as Otsu thresholding and standalone models like the Segment Anything Model (SAM). Zenesis enables effective image segmentation in domains where annotated datasets are limited, offering a scalable solution for scientific discovery.

LGJun 22, 2024
MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication

Shubhabrata Mukherjee, Cory Beard, Sejun Song

Semantic Communication can transform the way we transmit information, prioritizing meaningful and effective content over individual symbols or bits. This evolution promises significant benefits, including reduced latency, lower bandwidth usage, and higher throughput compared to traditional communication. However, the development of Semantic Communication faces a crucial challenge: the need for universal metrics to benchmark the joint effects of semantic information loss and energy consumption. This research introduces an innovative solution: the ``Energy-Optimized Semantic Loss'' (EOSL) function, a novel multi-objective loss function that effectively balances semantic information loss and energy consumption. Through comprehensive experiments on transformer models, including energy benchmarking, we demonstrate the remarkable effectiveness of EOSL-based model selection. We have established that EOSL-based transformer model selection achieves up to 83\% better similarity-to-power ratio (SPR) compared to BLEU score-based selection and 67\% better SPR compared to solely lowest power usage-based selection. Furthermore, we extend the applicability of EOSL to diverse and varying contexts, inspired by the principles of Meta-Learning. By cumulatively applying EOSL, we enable the model selection system to adapt to this change, leveraging historical EOSL values to guide the learning process. This work lays the foundation for energy-efficient model selection and the development of green semantic communication.

CVApr 8, 2024
A secure and private ensemble matcher using multi-vault obfuscated templates

Babak Poorebrahim Gilkalaye, Shubhabrata Mukherjee, Reza Derakhshani

Generative AI has revolutionized modern machine learning by providing unprecedented realism, diversity, and efficiency in data generation. This technology holds immense potential for biometrics, including for securing sensitive and personally identifiable information. Given the irrevocability of biometric samples and mounting privacy concerns, biometric template security and secure matching are among the most sought-after features of modern biometric systems. This paper proposes a novel obfuscation method using Generative AI to enhance biometric template security. Our approach utilizes synthetic facial images generated by a Generative Adversarial Network (GAN) as "random chaff points" within a secure vault system. Our method creates n sub-templates from the original template, each obfuscated with m GAN chaff points. During verification, s closest vectors to the biometric query are retrieved from each vault and combined to generate hash values, which are then compared with the stored hash value. Thus, our method safeguards user identities during the training and deployment phases by employing the GAN-generated synthetic images. Our protocol was tested using the AT&T, GT, and LFW face datasets, achieving ROC areas under the curve of 0.99, 0.99, and 0.90, respectively. Our results demonstrate that the proposed method can maintain high accuracy and reasonable computational complexity comparable to those unprotected template methods while significantly enhancing security and privacy, underscoring the potential of Generative AI in developing proactive defensive strategies for biometric systems.