Md Nazmus Sakib

AI
h-index: 18
7 papers
19 citations
Novelty: 31%
AI Score: 45

7 Papers

CL Aug 1, 2024
Risks, Causes, and Mitigations of Widespread Deployments of Large Language Models (LLMs): A Survey

Md Nazmus Sakib, Md Athikul Islam, Royal Pathak et al.

Recent advancements in Large Language Models (LLMs), such as ChatGPT and LLaMA, have significantly transformed Natural Language Processing (NLP) with their outstanding abilities in text generation, summarization, and classification. Nevertheless, their widespread adoption introduces numerous challenges, including issues related to academic integrity, copyright, environmental impacts, and ethical considerations such as data bias, fairness, and privacy. The rapid evolution of LLMs also raises concerns regarding the reliability and generalizability of their evaluations. This paper offers a comprehensive survey of the literature on these subjects, systematically gathered and synthesized from Google Scholar. Our study provides an in-depth analysis of the risks associated with specific LLMs, identifying sub-risks, their causes, and potential solutions. Furthermore, we explore the broader challenges related to LLMs, detailing their causes and proposing mitigation strategies. Through this literature analysis, our survey aims to deepen the understanding of the implications and complexities surrounding these powerful models.

LG Aug 1, 2024
Automatic Pull Request Description Generation Using LLMs: A T5 Model Approach

Md Nazmus Sakib, Md Athikul Islam, Md Mashrur Arifin

Developers create pull request (PR) descriptions to provide an overview of their changes and explain the motivations behind them. These descriptions help reviewers and fellow developers quickly understand the updates. Despite their importance, some developers omit these descriptions. To tackle this problem, we propose an automated method for generating PR descriptions based on commit messages and source code comments. This method frames the task as a text summarization problem, for which we utilized the T5 text-to-text transfer model. We fine-tuned a pre-trained T5 model using a dataset containing 33,466 PRs. The model's effectiveness was assessed using ROUGE metrics, which are recognized for their strong alignment with human evaluations. Our findings reveal that the T5 model significantly outperforms LexRank, which served as our baseline for comparison.
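The abstract above evaluates generated descriptions with ROUGE. As a rough illustration of what such a score measures, here is a minimal sketch of ROUGE-1 (unigram overlap) in plain Python. It is not the authors' evaluation code: the function name `rouge1`, the lowercase whitespace tokenization, and the example strings are assumptions made for illustration, and published ROUGE implementations typically add stemming and other normalization.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram overlap between a candidate and a reference.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical generated vs. human-written PR description.
    generated = "fix login bug"
    human = "fix login bug in auth"
    print(rouge1(generated, human))
```

In this sketch, a generated description that reuses three of the five reference words and adds nothing else has a precision of 1.0 and a recall of 0.6.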

HC Mar 24
ReflectEd: Evaluating Reflection-Driven Learning in an AI-Assisted System

Md Nazmus Sakib, Ishika Tarin, Naga Manogna Rayasam et al.

In collaborative settings, sustaining momentum and engagement between checkpoints (e.g., meetings) can be challenging, often leading to task drift and reduced preparedness. To address this gap, we developed ReflectEd, an AI-assisted system that supports between-checkpoint reflection through theory-driven prompts with progressively structured levels and mechanism-based scaffolding. We evaluated ReflectEd in a mixed-method study comparing two reflection configurations: a regular reflection workflow and a deeper reflection workflow that included an additional transformative reflection activity. Across conditions, participants reported steady engagement early in the week. In the deeper configuration, later reflections tended to exhibit higher actionability and richer forward-looking planning, while also being harder to sustain and more effortful during periods of active work. Partner-visible reflections were frequently described as supporting coordination by surfacing differences in focus and facilitating accountability. Overall, the findings characterize trade-offs between reflection depth, feasibility, and perceived preparedness for subsequent checkpoints. We discuss implications for the design of AI-assisted systems that support collaboration readiness and reflection-oriented regulation in time-constrained collaborative workflows.

HC Mar 22
Expecting Too Much, Getting Too Little: Exploring the Challenges and Design Opportunities of Asynchronous AI Interviewers

Md Nazmus Sakib, Naga Manogna Rayasam, Sanorita Dey

Organizations use asynchronous AI interview systems to efficiently manage large applicant pools, enabling quick and uniform evaluations. However, concerns remain about their impact on user agency and the lack of personalization applicants experience with these systems. Although efforts have been made to humanize the interview process, users' expectations are often unmet, especially when compared to the promises made by these systems. To examine how applicants perceive and experience these tools, particularly in the context of their growing familiarity with large language models (LLMs), we conducted a two-phase study. The first phase involved an analysis of 11 subreddit discussions on interview experiences with asynchronous AI interviewers, followed by a semi-structured interview study with 17 participants. Qualitative analysis revealed key issues such as mismatched expectations, amplified by organizational rhetoric and applicant expectations shaped by experiences with LLMs. These factors shaped participants' sense of agency and trust, often leading to workarounds and deceptive practices. In the follow-up study, we designed an interface with two features, response variants and feedback variants, and evaluated it across six groups (N = 180, 30 participants each) to assess whether these features support users' sense of agency, competence, and relatedness. Our analysis suggests that even subtle design changes can enhance user autonomy and that carefully designed feedback can provide meaningful support in high-stakes interview contexts.

CL Jun 16, 2025
An Interdisciplinary Review of Commonsense Reasoning and Intent Detection

Md Nazmus Sakib

This review explores recent advances in commonsense reasoning and intent detection, two key challenges in natural language understanding. We analyze 28 papers from ACL, EMNLP, and CHI (2020-2025), organizing them by methodology and application. Commonsense reasoning is reviewed across zero-shot learning, cultural adaptation, structured evaluation, and interactive contexts. Intent detection is examined through open-set models, generative formulations, clustering, and human-centered systems. By bridging insights from NLP and HCI, we highlight emerging trends toward more adaptive, multilingual, and context-aware models, and identify key gaps in grounding, generalization, and benchmark design.

AI Nov 17, 2025
Scene Graph-Guided Generative AI Framework for Synthesizing and Evaluating Industrial Hazard Scenarios

Sanjay Acharjee, Abir Khan Ratul, Diego Patino et al.

Training vision models to detect workplace hazards accurately requires realistic images of unsafe conditions that could lead to accidents. However, acquiring such datasets is difficult because capturing accident-triggering scenarios as they occur is nearly impossible. To overcome this limitation, this study presents a novel scene graph-guided generative AI framework that synthesizes photorealistic images of hazardous scenarios grounded in historical Occupational Safety and Health Administration (OSHA) accident reports. OSHA narratives are analyzed using GPT-4o to extract structured hazard reasoning, which is converted into object-level scene graphs capturing spatial and contextual relationships essential for understanding risk. These graphs guide a text-to-image diffusion model to generate compositionally accurate hazard scenes. To evaluate the realism and semantic fidelity of the generated data, a visual question answering (VQA) framework is introduced. Across four state-of-the-art generative models, the proposed VQA Graph Score outperforms CLIP and BLIP metrics under entropy-based validation, confirming its higher discriminative sensitivity.
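To make the idea of an object-level scene graph guiding image generation more concrete, the following is a minimal sketch of one way such a graph could be represented and flattened into a text prompt. Everything here is hypothetical: the `Relation` class, the `graph_to_prompt` helper, the prompt wording, and the example hazard are illustrative assumptions, not the representation or prompting scheme used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    """One scene-graph edge: subject --predicate--> object (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


def graph_to_prompt(relations: List[Relation]) -> str:
    """Flatten scene-graph edges into a single text-to-image prompt string."""
    clauses = [f"{r.subject} {r.predicate} {r.obj}" for r in relations]
    return "A photorealistic worksite scene: " + "; ".join(clauses) + "."


if __name__ == "__main__":
    # Illustrative hazard, not taken from any OSHA report.
    graph = [
        Relation("worker", "standing under", "suspended load"),
        Relation("suspended load", "attached to", "crane hook"),
    ]
    print(graph_to_prompt(graph))
```

Keeping relations as explicit triples means the same graph could also be reused to derive evaluation questions (for example, "Is the worker standing under the suspended load?"), which is the kind of check a VQA-based score relies on.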

AI Oct 16, 2025
Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM

Abir Khan Ratul, Sanjay Acharjee, Somin Park et al.

This study introduces a human-in-the-loop pipeline that converts unscaled, hand-drawn floor plan sketches into semantically consistent 3D BIM models. The workflow leverages multimodal large language models (MLLMs) within a multi-agent framework, combining perceptual extraction, human feedback, schema validation, and automated BIM scripting. Initially, sketches are iteratively refined into a structured JSON layout of walls, doors, and windows. Later, these layouts are transformed into executable scripts that generate 3D BIM models. Experiments on ten diverse floor plans demonstrate strong convergence: openings (doors, windows) are captured with high reliability in the initial pass, while wall detection begins around 83% and achieves near-perfect alignment after a few feedback iterations. Across all categories, precision, recall, and F1 scores remain above 0.83, and geometric errors (RMSE, MAE) progressively decrease to zero through feedback corrections. This study demonstrates how MLLM-driven multi-agent reasoning can make BIM creation accessible to both experts and non-experts using only freehand sketches.
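The metrics named in the abstract above (precision, recall, F1, RMSE, MAE) can be illustrated with a short sketch. This is not the paper's evaluation code: it assumes detected elements have already been matched to ground truth and are identified by hypothetical string IDs, and that geometric error is measured over paired coordinate values in consistent units.

```python
import math
from typing import Sequence, Set, Tuple


def precision_recall_f1(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Detection quality over sets of element IDs (assumed already matched)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rmse_mae(predicted: Sequence[float], truth: Sequence[float]) -> Tuple[float, float]:
    """Root-mean-square and mean-absolute error over paired coordinates."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predicted, truth)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


if __name__ == "__main__":
    # Hypothetical first pass: five of six walls detected, none spurious.
    detected = {"w1", "w2", "w3", "w4", "w5"}
    actual = {"w1", "w2", "w3", "w4", "w5", "w6"}
    print(precision_recall_f1(detected, actual))
    print(rmse_mae([3.0, 4.0], [0.0, 0.0]))
```

When the predicted layout matches the ground truth exactly, both error measures are zero, which is the end state the abstract describes after feedback corrections.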