LGNov 27, 2023Code
Can Out-of-Domain data help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection?Amartya Bhattacharya, Debarshi Brahma, Suraj Nagaje Mahadev et al.
Spread of fake news using out-of-context images and captions has become widespread in this era of information overload. Since fake news can belong to different domains like politics, sports, etc. with their unique characteristics, inference on a test image-caption pair is contingent on how well the model has been trained on similar data. Since training individual models for each domain is not practical, we propose a novel framework termed DPOD (Domain-specific Prompt tuning using Out-of-domain data), which can exploit out-of-domain data during training to improve fake news detection of all desired domains simultaneously. First, to compute generalizable features, we modify the Vision-Language Model, CLIP to extract features that helps to align the representations of the images and corresponding captions of both the in-domain and out-of-domain data in a label-aware manner. Further, we propose a domain-specific prompt learning technique which leverages training samples of all the available domains based on the extent they can be useful to the desired domain. Extensive experiments on the large-scale NewsCLIPpings and VERITE benchmarks demonstrate that DPOD achieves state of-the-art performance for this challenging task. Code: https://github.com/scviab/DPOD.
19.8CVMar 28Code
Inference-Time Structural Reasoning for Compositional Vision-Language UnderstandingAmartya Bhattacharya
Vision-language models (VLMs) excel at image-text retrieval yet persistently fail at compositional reasoning, distinguishing captions that share the same words but differ in relational structure. We present, a unified evaluation and augmentation framework benchmarking four architecturally diverse VLMs,CLIP, BLIP, LLaVA, and Qwen3-VL-8B-Thinking,on the Winoground benchmark under plain and scene-graph-augmented regimes. We introduce a dependency-based TextSceneGraphParser (spaCy) extracting subject-relation-object triples, and a Graph Asymmetry Scorer using optimal bipartite matching to inject structural relational priors. Caption ablation experiments (subject-object masking and swapping) reveal that Qwen3-VL-8B-Thinking achieves a group score of 62.75, far above all encoder-based models, while a proposed multi-turn SG filtering strategy further lifts it to 66.0, surpassing prior open-source state-of-the-art. We analyze the capability augmentation tradeoff and find that SG augmentation benefits already capable models while providing negligible or negative gains for weaker baselines. Code: https://github.com/amartyacodes/Inference-Time-Structural-Reasoning-for-Compositional-Vision-Language-Understanding
LGApr 22, 2022
Application of Federated Learning in Building a Robust COVID-19 Chest X-ray Classification ModelAmartya Bhattacharya, Manish Gawali, Jitesh Seth et al.
While developing artificial intelligence (AI)-based algorithms to solve problems, the amount of data plays a pivotal role - large amount of data helps the researchers and engineers to develop robust AI algorithms. In the case of building AI-based models for problems related to medical imaging, these data need to be transferred from the medical institutions where they were acquired to the organizations developing the algorithms. This movement of data involves time-consuming formalities like complying with HIPAA, GDPR, etc.There is also a risk of patients' private data getting leaked, compromising their confidentiality. One solution to these problems is using the Federated Learning framework. Federated Learning (FL) helps AI models to generalize better and create a robust AI model by using data from different sources having different distributions and data characteristics without moving all the data to a central server. In our paper, we apply the FL framework for training a deep learning model to solve a binary classification problem of predicting the presence or absence of COVID-19. We took three different sources of data and trained individual models on each source. Then we trained an FL model on the complete data and compared all the model performances. We demonstrated that the FL model performs better than the individual models. Moreover, the FL model performed at par with the model trained on all the data combined at a central server. Thus Federated Learning leads to generalized AI models without the cost of data transfer and regulatory overhead.
IVJun 26, 2023
A Fully Unsupervised Instance Segmentation Technique for White Blood Cell ImagesShrijeet Biswas, Amartya Bhattacharya
White blood cells, also known as leukocytes are group of heterogeneously nucleated cells which act as salient immune system cells. These are originated in the bone marrow and are found in blood, plasma, and lymph tissues. Leukocytes kill the bacteria, virus and other kind of pathogens which invade human body through phagocytosis that in turn results immunity. Detection of a white blood cell count can reveal camouflaged infections and warn doctors about chronic medical conditions such as autoimmune diseases, immune deficiencies, and blood disorders. Segmentation plays an important role in identification of white blood cells (WBC) from microscopic image analysis. The goal of segmentation in a microscopic image is to divide the image into different distinct regions. In our paper, we tried to propose a novel instance segmentation method for segmenting the WBCs containing both the nucleus and the cytoplasm, from bone marrow images.