Md Jahangir Alam

LG
h-index12
19papers
48citations
Novelty40%
AI Score50

19 Papers

IVJun 10, 2023
Online learning for X-ray, CT or MRI

Mosabbir Bhuiyan, MD Abdullah Al Nasim, Sarwar Saif et al.

Medical imaging plays an important role in the medical sector in identifying diseases. X-ray, computed tomography (CT) scans, and magnetic resonance imaging (MRI) are a few examples of medical imaging. Most of the time, these imaging techniques are utilized to examine and diagnose diseases. Medical professionals identify the problem after analyzing the images. However, manual identification can be challenging because the human eye is not always able to recognize complex patterns in an image. Because of this, it is difficult for any professional to recognize a disease with rapidity and accuracy. In recent years, medical professionals have started adopting Computer-Aided Diagnosis (CAD) systems to evaluate medical images. This system can analyze the image and detect the disease very precisely and quickly. However, this system has certain drawbacks in that it needs to be processed before analysis. Medical research is already entered a new era of research which is called Artificial Intelligence (AI). AI can automatically find complex patterns from an image and identify diseases. Methods for medical imaging that uses AI techniques will be covered in this chapter.

AIJun 7, 2023
AutoML Systems For Medical Imaging

Tasmia Tahmida Jidney, Angona Biswas, MD Abdullah Al Nasim et al.

The integration of machine learning in medical image analysis can greatly enhance the quality of healthcare provided by physicians. The combination of human expertise and computerized systems can result in improved diagnostic accuracy. An automated machine learning approach simplifies the creation of custom image recognition models by utilizing neural architecture search and transfer learning techniques. Medical imaging techniques are used to non-invasively create images of internal organs and body parts for diagnostic and procedural purposes. This article aims to highlight the potential applications, strategies, and techniques of AutoML in medical imaging through theoretical and empirical evidence.

IVJun 1, 2023
Introduction to Medical Imaging Informatics

Md. Zihad Bin Jahangir, Ruksat Hossain, Riadul Islam et al.

Medical imaging informatics is a rapidly growing field that combines the principles of medical imaging and informatics to improve the acquisition, management, and interpretation of medical images. This chapter introduces the basic concepts of medical imaging informatics, including image processing, feature engineering, and machine learning. It also discusses the recent advancements in computer vision and deep learning technologies and how they are used to develop new quantitative image markers and prediction models for disease detection, diagnosis, and prognosis prediction. By covering the basic knowledge of medical imaging informatics, this chapter provides a foundation for understanding the role of informatics in medicine and its potential impact on patient care.

DCJul 25, 2024
SCALE: Self-regulated Clustered federAted LEarning in a Homogeneous Environment

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

Federated Learning (FL) has emerged as a transformative approach for enabling distributed machine learning while preserving user privacy, yet it faces challenges like communication inefficiencies and reliance on centralized infrastructures, leading to increased latency and costs. This paper presents a novel FL methodology that overcomes these limitations by eliminating the dependency on edge servers, employing a server-assisted Proximity Evaluation for dynamic cluster formation based on data similarity, performance indices, and geographical proximity. Our integrated approach enhances operational efficiency and scalability through a Hybrid Decentralized Aggregation Protocol, which merges local model training with peer-to-peer weight exchange and a centralized final aggregation managed by a dynamically elected driver node, significantly curtailing global communication overhead. Additionally, the methodology includes Decentralized Driver Selection, Check-pointing to reduce network traffic, and a Health Status Verification Mechanism for system robustness. Validated using the breast cancer dataset, our architecture not only demonstrates a nearly tenfold reduction in communication overhead but also shows remarkable improvements in reducing training latency and energy consumption while maintaining high learning performance, offering a scalable, efficient, and privacy-preserving solution for the future of federated learning ecosystems.

LGJul 25, 2024
Generative AI like ChatGPT in Blockchain Federated Learning: use cases, opportunities and future

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

Federated learning has become a significant approach for training machine learning models using decentralized data without necessitating the sharing of this data. Recently, the incorporation of generative artificial intelligence (AI) methods has provided new possibilities for improving privacy, augmenting data, and customizing models. This research explores potential integrations of generative AI in federated learning, revealing various opportunities to enhance privacy, data efficiency, and model performance. It particularly emphasizes the importance of generative models like generative adversarial networks (GANs) and variational autoencoders (VAEs) in creating synthetic data that replicates the distribution of real data. Generating synthetic data helps federated learning address challenges related to limited data availability and supports robust model development. Additionally, we examine various applications of generative AI in federated learning that enable more personalized solutions.

LGAug 6, 2024
FLASH: Federated Learning-Based LLMs for Advanced Query Processing in Social Networks through RAG

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

Our paper introduces a novel approach to social network information retrieval and user engagement through a personalized chatbot system empowered by Federated Learning GPT. The system is designed to seamlessly aggregate and curate diverse social media data sources, including user posts, multimedia content, and trending news. Leveraging Federated Learning techniques, the GPT model is trained on decentralized data sources to ensure privacy and security while providing personalized insights and recommendations. Users interact with the chatbot through an intuitive interface, accessing tailored information and real-time updates on social media trends and user-generated content. The system's innovative architecture enables efficient processing of input files, parsing and enriching text data with metadata, and generating relevant questions and answers using advanced language models. By facilitating interactive access to a wealth of social network information, this personalized chatbot system represents a significant advancement in social media communication and knowledge dissemination.

69.4CRApr 8
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines

Tanzim Ahad, Ismail Hossain, Md Jahangir Alam et al.

We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems where a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation only emerges at the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing the compositional safety gap is closable.

IRJul 13, 2024
SocialRec: User Activity Based Post Weighted Dynamic Personalized Post Recommendation System in Social Media

Ismail Hossain, Sai Puppala, Md Jahangir Alam et al.

User activities can influence their subsequent interactions with a post, generating interest in the user. Typically, users interact with posts from friends by commenting and using reaction emojis, reflecting their level of interest on social media such as Facebook, Twitter, and Reddit. Our objective is to analyze user history over time, including their posts and engagement on various topics. Additionally, we take into account the user's profile, seeking connections between their activities and social media platforms. By integrating user history, engagement, and persona, we aim to assess recommendation scores based on relevant item sharing by Hit Rate (HR) and the quality of the ranking system by Normalized Discounted Cumulative Gain (NDCG), where we achieve the highest for NeuMF 0.80 and 0.6 respectively. Our hybrid approach solves the cold-start problem when there is a new user, for new items cold-start problem will never occur, as we consider the post category values. To improve the performance of the model during cold-start we introduce collaborative filtering by looking for similar users and ranking the users based on the highest similarity scores.

LGAug 7, 2024
SocFedGPT: Federated GPT-based Adaptive Content Filtering System Leveraging User Interactions in Social Networks

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

Our study presents a multifaceted approach to enhancing user interaction and content relevance in social media platforms through a federated learning framework. We introduce personalized GPT and Context-based Social Media LLM models, utilizing federated learning for privacy and security. Four client entities receive a base GPT-2 model and locally collected social media data, with federated aggregation ensuring up-to-date model maintenance. Subsequent modules focus on categorizing user posts, computing user persona scores, and identifying relevant posts from friends' lists. A quantifying social engagement approach, coupled with matrix factorization techniques, facilitates personalized content suggestions in real-time. An adaptive feedback loop and readability score algorithm also enhance the quality and relevance of content presented to users. Our system offers a comprehensive solution to content filtering and recommendation, fostering a tailored and engaging social media experience while safeguarding user privacy.

SIJul 12, 2024
EVOLVE: Predicting User Evolution and Network Dynamics in Social Media Using Fine-Tuned GPT-like Model

Ismail Hossain, Md Jahangir Alam, Sai Puppala et al.

Social media platforms are extensively used for sharing personal emotions, daily activities, and various life events, keeping people updated with the latest happenings. From the moment a user creates an account, they continually expand their network of friends or followers, freely interacting with others by posting, commenting, and sharing content. Over time, user behavior evolves based on demographic attributes and the networks they establish. In this research, we propose a predictive method to understand how a user evolves on social media throughout their life and to forecast the next stage of their evolution. We fine-tune a GPT-like decoder-only model (we named it E-GPT: Evolution-GPT) to predict the future stages of a user's evolution in online social media. We evaluate the performance of these models and demonstrate how user attributes influence changes within their network by predicting future connections and shifts in user activities on social media, which also addresses other social media challenges such as recommendation systems.

LGNov 12, 2025
LLM-Guided Dynamic-UMAP for Personalized Federated Graph Learning

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

We propose a method that uses large language models to assist graph machine learning under personalization and privacy constraints. The approach combines data augmentation for sparse graphs, prompt and instruction tuning to adapt foundation models to graph tasks, and in-context learning to supply few-shot graph reasoning signals. These signals parameterize a Dynamic UMAP manifold of client-specific graph embeddings inside a Bayesian variational objective for personalized federated learning. The method supports node classification and link prediction in low-resource settings and aligns language model latent representations with graph structure via a cross-modal regularizer. We outline a convergence argument for the variational aggregation procedure, describe a differential privacy threat model based on a moments accountant, and present applications to knowledge graph completion, recommendation-style link prediction, and citation and product graphs. We also discuss evaluation considerations for benchmarking LLM-assisted graph machine learning.

70.5CRMay 12
The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

Tanzim Ahad, Ismail Hossain, Md Jahangir Alam et al.

Multi-agent AI pipelines typically assume that agent misconduct originates from model misalignment. We identify a structural failure in this assumption, the \emph{Misattribution Gap}, where memory-layer attacks produce behaviors indistinguishable from model failure, causing defenders to apply the wrong remediation. We formalize \emph{Semantic Norm Drift} (SND) as a third path to agent misconduct, distinct from emergent misalignment and collusion. In SND, a policy-formatted document enters a shared vector store through normal uploads and later reappears as trusted system context after provenance is lost through a Trust Laundering Chain. Across 64 documented failures, attribution systems consistently blamed the model. Four safety classifiers, including one trained on memory poisoning, produced zero detections across 510 checkpoints. In 59 of 65 valid cases, agents explicitly cited the injected document as normative authority before complying. The attack requires no trigger, model access, or repeated interaction, achieves full effect within five sessions, and persists indefinitely. We introduce Counterfactual Composition Testing, which identifies the causal entry with 87.5% accuracy and zero false positives, while a forensics baseline fails across all 25 scenarios. We further prove the Retrieval-Coverage Dilemma, showing that stronger evasion inherently weakens the attack, limiting adaptive bypass strategies. Finally, we propose Memory-Persistent Information-Flow Control, which blocks 97% of attacks at the cross-session boundary where prior defenses fail. We release the SND Corpus, the first adversarial memory benchmark with temporal persistence and multi-agent composition across financial and Health Care domains.

SIJul 21, 2025Code
EVOLVE-X: Embedding Fusion and Language Prompting for User Evolution Forecasting on Social Media

Ismail Hossain, Sai Puppala, Md Jahangir Alam et al.

Social media platforms serve as a significant medium for sharing personal emotions, daily activities, and various life events, ensuring individuals stay informed about the latest developments. From the initiation of an account, users progressively expand their circle of friends or followers, engaging actively by posting, commenting, and sharing content. Over time, user behavior on these platforms evolves, influenced by demographic attributes and the networks they form. In this study, we present a novel approach that leverages open-source models Llama-3-Instruct, Mistral-7B-Instruct, Gemma-7B-IT through prompt engineering, combined with GPT-2, BERT, and RoBERTa using a joint embedding technique, to analyze and predict the evolution of user behavior on social media over their lifetime. Our experiments demonstrate the potential of these models to forecast future stages of a user's social evolution, including network changes, future connections, and shifts in user activities. Experimental results highlight the effectiveness of our approach, with GPT-2 achieving the lowest perplexity (8.21) in a Cross-modal configuration, outperforming RoBERTa (9.11) and BERT, and underscoring the importance of leveraging Cross-modal configurations for superior performance. This approach addresses critical challenges in social media, such as friend recommendations and activity predictions, offering insights into the trajectory of user behavior. By anticipating future interactions and activities, this research aims to provide early warnings about potential negative outcomes, enabling users to make informed decisions and mitigate risks in the long term.

64.9CRMay 9
The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring

Ismail Hossain, Tanzim Ahad, Md Jahangir Alam et al.

Jailbreak attacks -- adversarial prompts that bypass LLM alignment through purely linguistic manipulation -- pose a growing operational security threat, yet the field lacks large-scale, reproducible infrastructure for generating, categorizing, and evaluating them systematically. This paper addresses that gap with three contributions. (1) Large-scale compositional jailbreak dataset. We construct 114,000 adversarial prompts by applying 912 composing strategies to 125 harmful seed prompts from JailBreakV-28K. Every prompt is assigned to one of 14 cybersecurity attack categories (e.g., malware, phishing, privilege escalation) via a six-model majority-vote pipeline, and each strategy is ranked by effectiveness per category, enabling principled strategy selection grounded in concrete adversarial objectives. (2) Automated jailbreak generation. We instruction-fine-tune category-aware LLMs on Moderate and Optimal subsets, producing models that synthesize fluent jailbreak prompts from a harmful seed at inference time -- no templates, no gradient search. Our generators achieve perplexity 24-39 versus 40-140 for AutoDAN and AmpleGCG, with safety-filter evasion rates of 0.29-0.51 Mal (LlamaPromptGuard-2-86M), enabling controllable, scalable red-teaming under realistic adversarial conditions. (3) OPTIMUS: a training-free jailbreak evaluator. OPTIMUS is a continuous metric J(S,H) that jointly captures semantic similarity between the harmful seed and the jailbreak (S) and harmfulness probability (H) via calibrated penalty functions. Unlike binary attack success rate (ASR), OPTIMUS requires no task-specific training, generalizes across evolving strategies, and exposes a stealth-optimal regime (S*=0.57, H*=0.43) that ASR misses. Experiments across 114,000 prompts confirm that OPTIMUS separates Weak, Moderate, and Optimal jailbreaks with category-level evidence binary evaluation cannot supply.

66.5LGApr 8
When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

Ismail Hossain, Sai Puppala, Jannatul Ferdaus et al.

A guard model fine-tuned on entirely benign data can lose all safety alignment -- not through adversarial manipulation, but through standard domain specialization. We demonstrate this failure across three purpose-built safety classifiers -- LlamaGuard, WildGuard, and Granite Guardian -- deployed as protection layers in agentic AI pipelines, and show that it originates in the destruction of latent safety geometry: the structured harmful -- benign representational boundary that guides classification. We extract per-layer safety subspaces via SVD on class-conditional activation differences and track how this boundary evolves under benign fine-tuning. Granite Guardian undergoes complete collapse -- refusal rate drops from 85\% to 0\%, CKA falls to zero, and 100\% of outputs become ambiguous -- a severity exceeding prior findings on general-purpose LLMs, explained by the specialization hypothesis: concentrated safety representations are efficient but catastrophically brittle. To mitigate this, we propose Fisher-Weighted Safety Subspace Regularization (FW-SSR), a training-time penalty combining (i) curvature-aware direction weights derived from diagonal Fisher information and (ii) an adaptive $λ_t$ that scales with task-safety gradient conflict. FW-SSR recovers 75\% refusal on Granite Guardian (CKA = 0.983) and reduces WildGuard's Attack Success Rate to 3.6\% -- below the unmodified baseline -- by actively sharpening the safety subspace rather than merely anchoring it. Across all three models, structural representational geometry (CKA, Fisher score) predicts safety behavior more reliably than absolute displacement metrics, establishing geometry-based monitoring as a necessary component of guard model evaluation in agentic deployments.

CRMay 24, 2025
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis

Md Ahsanul Haque, Ismail Hossain, Md Mahmuduzzaman Kamol et al.

Machine learning (ML)-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the continuous emergence of both benign and malicious applications. Prior studies have shown that such concept drift -- distributional shifts in benign and malicious samples, leads to significant degradation in detection performance over time. Despite the practical importance of this issue, existing datasets are often outdated and limited in temporal scope, diversity of malware families, and sample scale, making them insufficient for the systematic evaluation of concept drift in malware detection. To address this gap, we present LAMDA, the largest and most temporally diverse Android malware benchmark to date, designed specifically for concept drift analysis. LAMDA spans 12 years (2013-2025, excluding 2015), includes over 1 million samples (approximately 37% labeled as malware), and covers 1,380 malware families and 150,000 singleton samples, reflecting the natural distribution and evolution of real-world Android applications. We empirically demonstrate LAMDA's utility by quantifying the performance degradation of standard ML models over time and analyzing feature stability across years. As the most comprehensive Android malware dataset to date, LAMDA enables in-depth research into temporal drift, generalization, explainability, and evolving detection challenges. The dataset and code are available at: https://iqsec-lab.github.io/LAMDA/.

LGNov 23, 2025
Real-Time Personalized Content Adaptation through Matrix Factorization and Context-Aware Federated Learning

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

Our study presents a multifaceted approach to enhancing user interaction and content relevance in social media platforms through a federated learning framework. We introduce personalized LLM Federated Learning and Context-based Social Media models. In our framework, multiple client entities receive a foundational GPT model, which is fine-tuned using locally collected social media data while ensuring data privacy through federated aggregation. Key modules focus on categorizing user-generated content, computing user persona scores, and identifying relevant posts from friends networks. By integrating a sophisticated social engagement quantification method with matrix factorization techniques, our system delivers real-time personalized content suggestions tailored to individual preferences. Furthermore, an adaptive feedback loop, alongside a robust readability scoring algorithm, significantly enhances the quality and relevance of the content presented to users. This comprehensive solution not only addresses the challenges of content filtering and recommendation but also fosters a more engaging social media experience while safeguarding user privacy, setting a new standard for personalized interactions in digital platforms.

LGSep 4, 2025
Variational Gaussian Mixture Manifold Models for Client-Specific Federated Personalization

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al.

Personalized federated learning (PFL) often fails under label skew and non-stationarity because a single global parameterization ignores client-specific geometry. We introduce VGM$^2$ (Variational Gaussian Mixture Manifold), a geometry-centric PFL framework that (i) learns client-specific parametric UMAP embeddings, (ii) models latent pairwise distances with mixture relation markers for same and different class pairs, and (iii) exchanges only variational, uncertainty-aware marker statistics. Each client maintains a Dirichlet-Normal-Inverse-Gamma (Dir-NIG) posterior over marker weights, means, and variances; the server aggregates via conjugate moment matching to form global priors that guide subsequent rounds. We prove that this aggregation minimizes the summed reverse Kullback-Leibler divergence from client posteriors within the conjugate family, yielding stability under heterogeneity. We further incorporate a calibration term for distance-to-similarity mapping and report communication and compute budgets. Across eight vision datasets with non-IID label shards, VGM$^2$ achieves competitive or superior test F1 scores compared to strong baselines while communicating only small geometry summaries. Privacy is strengthened through secure aggregation and optional differential privacy noise, and we provide a membership-inference stress test. Code and configurations will be released to ensure full reproducibility.

CRSep 4, 2025
AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning

Ismail Hossain, Sai Puppala, Sajedul Talukder et al.

Scams exploiting real-time social engineering -- such as phishing, impersonation, and phone fraud -- remain a persistent and evolving threat across digital platforms. Existing defenses are largely reactive, offering limited protection during active interactions. We propose a privacy-preserving, AI-in-the-loop framework that proactively detects and disrupts scam conversations in real time. The system combines instruction-tuned artificial intelligence with a safety-aware utility function that balances engagement with harm minimization, and employs federated learning to enable continual model updates without raw data sharing. Experimental evaluations show that the system produces fluent and engaging responses (perplexity as low as 22.3, engagement $\approx$0.80), while human studies confirm significant gains in realism, safety, and effectiveness over strong baselines. In federated settings, models trained with FedAvg sustain up to 30 rounds while preserving high engagement ($\approx$0.80), strong relevance ($\approx$0.74), and low PII leakage ($\leq$0.0085). Even with differential privacy, novelty and safety remain stable, indicating that robust privacy can be achieved without sacrificing performance. The evaluation of guard models (LlamaGuard, LlamaGuard2/3, MD-Judge) shows a straightforward pattern: stricter moderation settings reduce the chance of exposing personal information, but they also limit how much the model engages in conversation. In contrast, more relaxed settings allow longer and richer interactions, which improve scam detection, but at the cost of higher privacy risk. To our knowledge, this is the first framework to unify real-time scam-baiting, federated privacy preservation, and calibrated safety moderation into a proactive defense paradigm.