Mohd Ruhul Ameen

CV
h-index24
17papers
7citations
Novelty48%
AI Score49

17 Papers

CLMay 6
Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

Mohd Ruhul Ameen, Akif Islam, Nadim Mahmud et al.

Statistical watermarking is a common approach for verifying whether text was written by a language model. Most existing schemes assume autoregressive generation, where tokens are produced left to right and contextual hashing is well defined. Diffusion language models generate text by denoising tokens in arbitrary order, so these schemes cannot be applied directly. A recent watermark by Gloaguen et al. addresses this gap for LLaDA 8B Instruct and reports true positive detection above 99%. This paper studies what happens when watermarked text is rewritten not once but several times. Using the same watermark configuration, 1,605 watermarked completions of about 300 tokens each are produced across five WaterBench domains. Each completion is rewritten by four open weight language models, from 1.5B to 8B parameters, none of which know the watermark key. Five rewrite styles are tested: paraphrase, humanize, simplify, academic, and summarize expand. Each style is chained for up to five hops, producing 160,500 rewritten texts in total. The watermark is detected on 87.9% of the original outputs at the standard significance threshold. After a single rewrite, detection falls to between 14% and 41% depending on the rewriter and style. After five chained rewrites, detection falls to 4.86%, meaning 94.76% of the originally detected texts are no longer flagged. After three rewrites, the detector score has dropped 86% of the way from its watermarked baseline toward the null distribution. Repeated rewriting is therefore a much stronger attack than a single rewrite, and the result holds across all four rewriters tested.

CRMar 19
Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen et al.

Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. PCFI applies a three-stage middleware pipeline, lexical heuristics, role-switch detection, and hierarchical policy enforcement, before forwarding requests to the backend LLM. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% False Positive Rate, and introduces a median processing overhead of only 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.

HCMar 18
Critical Thinking in the Age of Artificial Intelligence: A Survey-Based Study with Machine Learning Insights

M Murshidul Bari, Akif Islam, Mohd Ruhul Ameen et al.

The growing use of artificial intelligence (AI) in education, professional work, and everyday problem-solving has raised important questions about its effect on human reasoning. While AI can improve efficiency, save time, and support learning, repeated dependence on it may also encourage cognitive offloading, reduce productive struggle, and weaken independent critical thinking. This paper investigates the relationship between AI-use behavior and critical-thinking performance through an interview-based survey combined with short logic and reasoning tasks. The findings reveal a mixed pattern: participants largely viewed AI as a tool for speed, convenience, and learning support, yet many also reported reduced patience for sustained effort. Objective reasoning performance varied considerably across individuals, and the analyses suggest that reduced patience and stronger dependence-related tendencies are more closely associated with lower reasoning performance than background characteristics alone. Exploratory clustering further indicates that AI users do not form a single homogeneous group, but instead reflect tentative behavioral profiles, including over-reliant users, mixed-strategy users, and balanced support-seekers. Although the findings are exploratory, they indicate that AI does not affect critical thinking in a uniformly negative or positive way. Instead, its influence appears to depend on the manner in which it is used. The paper therefore argues that effective human-AI collaboration should support reflection, verification, and sustained cognitive effort rather than substitute for them.

CVNov 1, 2025
Oitijjo-3D: Generative AI Framework for Rapid 3D Heritage Reconstruction from Street View Imagery

Momen Khandoker Ope, Akif Islam, Mohd Ruhul Ameen et al.

Cultural heritage restoration in Bangladesh faces a dual challenge of limited resources and scarce technical expertise. Traditional 3D digitization methods, such as photogrammetry or LiDAR scanning, require expensive hardware, expert operators, and extensive on-site access, which are often infeasible in developing contexts. As a result, many of Bangladesh's architectural treasures, from the Paharpur Buddhist Monastery to Ahsan Manzil, remain vulnerable to decay and inaccessible in digital form. This paper introduces Oitijjo-3D, a cost-free generative AI framework that democratizes 3D cultural preservation. By using publicly available Google Street View imagery, Oitijjo-3D reconstructs faithful 3D models of heritage structures through a two-stage pipeline - multimodal visual reasoning with Gemini 2.5 Flash Image for structure-texture synthesis, and neural image-to-3D generation through Hexagen for geometry recovery. The system produces photorealistic, metrically coherent reconstructions in seconds, achieving significant speedups compared to conventional Structure-from-Motion pipelines, without requiring any specialized hardware or expert supervision. Experiments on landmarks such as Ahsan Manzil, Choto Sona Mosque, and Paharpur demonstrate that Oitijjo-3D preserves both visual and structural fidelity while drastically lowering economic and technical barriers. By turning open imagery into digital heritage, this work reframes preservation as a community-driven, AI-assisted act of cultural continuity for resource-limited nations.

HCMar 19
Beyond Ray-Casting: Evaluating Controller, Free-Hand, and Virtual-Touch Modalities for Immersive Text Entry

Md. Tanvir Hossain, Mohd Ruhul Ameen, Akif Islam et al.

Efficient text entry remains a primary bottleneck preventing Virtual Reality (VR) from evolving into a viable productivity platform. To address this, we conducted an empirical comparison of six physical input systems across three interaction styles Controller Driven, Free Hand, and Virtual Touch evaluating both discrete tap typing and continuous gesture typing (swiping), alongside a speech to text (Voice) condition as a non physical reference modality. Results from 21 participants show that the Controller Driven Tap Gesture Combo (CD TGC) delivers the best productivity performance, achieving speeds 2.25 times higher than the slowest system and 30% faster than the current industry standard, while reducing error rates by up to 68%. A clear trade off emerged between performance and perceived usability: although controller based gesture input led on speed and accuracy, participants rated Virtual Touch Tap Typing highest in subjective experience, scoring 80% higher on the System Usability Scale (SUS) than the lowest rated alternative. We further observe that Free Hand interaction remains limited by tracking stability and physical fatigue, whereas Voice input introduces practical constraints related to privacy, editing control, and immersive engagement. Together, these findings characterize the tension between throughput and natural interaction in immersive text entry and provide data driven guidance for future VR interface design.

CRMay 3
QASecClaw: A Multi-Agent LLM Approach for False Positive Reduction in Static Application Security Testing

Mohd Ruhul Ameen, Md Takrim Ul Alam, Akif Islam

Static Application Security Testing tools help developers find security vulnerabilities before release, but they often produce many false positives. This increases manual review effort, reduces developer trust, and may cause real vulnerabilities to be ignored among noisy reports. We present QASecClaw, a multi agent approach that combines conventional Static Application Security Testing with coding specialized Large Language Model based contextual code review. A SAST engine first reports candidate vulnerabilities, and a Large Language Model based SAST Filter Agent then reviews each finding with source code context to decide whether it is likely to be a true positive or a false positive. QASecClaw is coordinated by a Mission Orchestrator and includes specialized agents for test planning, security validation, evidence correlation, filtering, and reporting. We evaluate QASecClaw on OWASP Benchmark v1.2, which contains 2,740 Java test cases across 11 Common Weakness Enumeration categories with ground truth labels. QASecClaw achieves an F1 score of 90.93 percent, compared with 78.39 percent for standalone Semgrep. The improvement is mainly driven by an 88.6 percent reduction in false positives, from 560 to 64, with only a 3.1 percent reduction in recall. These results show that Large Language Model augmented multi agent verification can make Static Application Security Testing output more accurate, useful, and trustworthy.

LGNov 1, 2025
Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet

Farjana Aktar, Mohd Ruhul Ameen, Akif Islam et al.

Achieving both accurate and interpretable classification of motor imagery EEG remains a key challenge in brain computer interface (BCI) research. This paper compares a transparent fuzzy reasoning approach (ANFIS-FBCSP-PSO) with a deep learning benchmark (EEGNet) using the BCI Competition IV-2a dataset. The ANFIS pipeline combines filter bank common spatial pattern feature extraction with fuzzy IF-THEN rules optimized via particle swarm optimization, while EEGNet learns hierarchical spatial temporal representations directly from raw EEG data. In within-subject experiments, the fuzzy neural model performed better (68.58 percent +/- 13.76 percent accuracy, kappa = 58.04 percent +/- 18.43), while in cross-subject (LOSO) tests, the deep model exhibited stronger generalization (68.20 percent +/- 12.13 percent accuracy, kappa = 57.33 percent +/- 16.22). The study provides practical guidance for selecting MI-BCI systems according to design goals: interpretability or robustness across users. Future investigations into transformer based and hybrid neuro symbolic frameworks are expected to advance transparent EEG decoding.

HCMar 9
Khelte Khelte Shikhi: A Proposed HCI Framework for Gamified Interactive Learning with Minecraft in Bangladeshi Education Systems

Mohd Ruhul Ameen, Akif Islam, Momen Khandokar Ope

Game-based learning shows real promise for engaging students in well-resourced classrooms, but what about the millions who study in schools with far fewer opportunities? We propose a practical framework for bringing Minecraft Education Edition into Bangladesh's 130,000 schools, where 55 percent lack reliable internet, rural areas receive only 12 to 16 hours of electricity per day, computer access in rural schools is just 8 percent, and student-teacher ratios can reach 52:1. Our approach addresses these constraints directly through three deployment tiers: cloud-based multiplayer for urban schools with stable infrastructure (15 percent), local-area network (LAN) solutions with optional solar power for semi-urban schools (30 percent), and fully offline, turn-based modes using refurbished hardware for rural contexts (55 percent). We provide eight curriculum-aligned Minecraft worlds with complete Bangla localization, covering topics from Lalbagh Fort reconstruction to monsoon flood simulation. The interface supports first-time users through progressive complexity, culturally familiar metaphors rooted in local farming and architecture, and accessibility features such as keyboard-only controls and 200 percent text scaling. We outline evaluation benchmarks including 15 percent learning gains, 70 percent transfer-task mastery, System Usability Scale scores above 70, and a per-student cost below two dollars per hour. Although not yet empirically validated, this work synthesizes game-based learning theory, HCI principles, and contextual analysis to offer implementable specifications for pilot testing in resource-constrained settings. It is presented as a design-oriented conceptual framework rather than a field deployment, providing an implementation-ready blueprint for future empirical validation.

HCNov 8, 2025
Towards a Humanized Social-Media Ecosystem: AI-Augmented HCI Design Patterns for Safety, Agency & Well-Being

Mohd Ruhul Ameen, Akif Islam

Social platforms connect billions of people, yet their engagement-first algorithms often work on users rather than with them, amplifying stress, misinformation, and a loss of control. We propose Human-Layer AI (HL-AI)--user-owned, explainable intermediaries that sit in the browser between platform logic and the interface. HL-AI gives people practical, moment-to-moment control without requiring platform cooperation. We contribute a working Chrome/Edge prototype implementing five representative pattern frameworks--Context-Aware Post Rewriter, Post Integrity Meter, Granular Feed Curator, Micro-Withdrawal Agent, and Recovery Mode--alongside a unifying mathematical formulation balancing user utility, autonomy costs, and risk thresholds. Evaluation spans technical accuracy, usability, and behavioral outcomes. The result is a suite of humane controls that help users rewrite before harm, read with integrity cues, tune feeds with intention, pause compulsive loops, and seek shelter during harassment, all while preserving agency through explanations and override options. This prototype offers a practical path to retrofit today's feeds with safety, agency, and well-being, inviting rigorous cross-cultural user evaluation.

CVNov 1, 2025
Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach

Mohd Ruhul Ameen, Akif Islam

The rapid rise of generative diffusion models has made distinguishing authentic visual content from synthetic imagery increasingly challenging. Traditional deepfake detection methods, which rely on frequency or pixel-level artifacts, fail against modern text-to-image systems such as Stable Diffusion and DALL-E that produce photorealistic and artifact-free results. This paper introduces a diffusion-based forensic framework that leverages multi-strength image reconstruction dynamics, termed diffusion snap-back, to identify AI-generated images. By analysing how reconstruction metrics (LPIPS, SSIM, and PSNR) evolve across varying noise strengths, we extract interpretable manifold-based features that differentiate real and synthetic images. Evaluated on a balanced dataset of 4,000 images, our approach achieves 0.993 AUROC under cross-validation and remains robust to common distortions such as compression and noise. Despite using limited data and a single diffusion backbone (Stable Diffusion v1.5), the proposed method demonstrates strong generalization and interpretability, offering a foundation for scalable, model-agnostic synthetic media forensics.

CVOct 27, 2025
CountFormer: A Transformer Framework for Learning Visual Repetition and Structure in Class-Agnostic Object Counting

Md Tanvir Hossain, Akif Islam, Mohd Ruhul Ameen

Humans can effortlessly count diverse objects by perceiving visual repetition and structural relationships rather than relying on class identity. However, most existing counting models fail to replicate this ability; they often miscount when objects exhibit complex shapes, internal symmetry, or overlapping components. In this work, we introduce CountFormer, a transformer-based framework that learns to recognize repetition and structural coherence for class-agnostic object counting. Built upon the CounTR architecture, our model replaces its visual encoder with the self-supervised foundation model DINOv2, which produces richer and spatially consistent feature representations. We further incorporate positional embedding fusion to preserve geometric relationships before decoding these features into density maps through a lightweight convolutional decoder. Evaluated on the FSC-147 dataset, our model achieves performance comparable to current state-of-the-art methods while demonstrating superior accuracy on structurally intricate or densely packed scenes. Our findings indicate that integrating foundation models such as DINOv2 enables counting systems to approach human-like structural perception, advancing toward a truly general and exemplar-free counting paradigm.

CVOct 21, 2025
Automated Wicket-Taking Delivery Segmentation and Weakness Detection in Cricket Videos Using OCR-Guided YOLOv8 and Trajectory Modeling

Mst Jannatun Ferdous, Masum Billah, Joy Karmoker et al.

This paper presents an automated system for cricket video analysis that leverages deep learning techniques to extract wicket-taking deliveries, detect cricket balls, and model ball trajectories. The system employs the YOLOv8 architecture for pitch and ball detection, combined with optical character recognition (OCR) for scorecard extraction to identify wicket-taking moments. Through comprehensive image preprocessing, including grayscale transformation, power transformation, and morphological operations, the system achieves robust text extraction from video frames. The pitch detection model achieved 99.5% mean Average Precision at 50% IoU (mAP50) with a precision of 0.999, while the ball detection model using transfer learning attained 99.18% mAP50 with 0.968 precision and 0.978 recall. The system enables trajectory modeling on detected pitches, providing data-driven insights for identifying batting weaknesses. Experimental results on multiple cricket match videos demonstrate the effectiveness of this approach for automated cricket analytics, offering significant potential for coaching and strategic decision-making.

CLOct 21, 2025
KrishokBondhu: A Retrieval-Augmented Voice-Based Agricultural Advisory Call Center for Bengali Farmers

Mohd Ruhul Ameen, Akif Islam, Farjana Aktar et al.

In Bangladesh, many farmers continue to face challenges in accessing timely, expert-level agricultural guidance. This paper presents KrishokBondhu, a voice-enabled, call-centre-integrated advisory platform built on a Retrieval-Augmented Generation (RAG) framework, designed specifically for Bengali-speaking farmers. The system aggregates authoritative agricultural handbooks, extension manuals, and NGO publications; applies Optical Character Recognition (OCR) and document-parsing pipelines to digitize and structure the content; and indexes this corpus in a vector database for efficient semantic retrieval. Through a simple phone-based interface, farmers can call the system to receive real-time, context-aware advice: speech-to-text converts the Bengali query, the RAG module retrieves relevant content, a large language model (Gemma 3-4B) generates a context-grounded response, and text-to-speech delivers the answer in natural spoken Bengali. In a pilot evaluation, KrishokBondhu produced high-quality responses for 72.7% of diverse agricultural queries covering crop management, disease control, and cultivation practices. Compared to the KisanQRS benchmark, the system achieved a composite score of 4.53 (vs. 3.13) on a 5-point scale, a 44.7% improvement, with especially large gains in contextual richness (+367%) and completeness (+100.4%), while maintaining comparable relevance and technical specificity. Semantic similarity analysis further revealed a strong correlation between retrieved context and answer quality, emphasizing the importance of grounding generative responses in curated documentation. KrishokBondhu demonstrates the feasibility of integrating call-centre accessibility, multilingual voice interaction, and modern RAG techniques to deliver expert-level agricultural guidance to remote Bangladeshi farmers, paving the way toward a fully AI-driven agricultural advisory ecosystem.

CLOct 20, 2025
How News Feels: Understanding Affective Bias in Multilingual Headlines for Human-Centered Media Design

Mohd Ruhul Ameen, Akif Islam, Abu Saleh Musa Miah et al.

News media often shape the public mood not only by what they report but by how they frame it. The same event can appear calm in one outlet and alarming in another, reflecting subtle emotional bias in reporting. Negative or emotionally charged headlines tend to attract more attention and spread faster, which in turn encourages outlets to frame stories in ways that provoke stronger reactions. This research explores that tendency through large-scale emotion analysis of Bengali news. Using zero-shot inference with Gemma-3 4B, we analyzed 300000 Bengali news headlines and their content to identify the dominant emotion and overall tone of each. The findings reveal a clear dominance of negative emotions, particularly anger, fear, and disappointment, and significant variation in how similar stories are emotionally portrayed across outlets. Based on these insights, we propose design ideas for a human-centered news aggregator that visualizes emotional cues and helps readers recognize hidden affective framing in daily news.

CVOct 20, 2025
From Pixels to People: Satellite-Based Mapping and Quantification of Riverbank Erosion and Lost Villages in Bangladesh

M Saifuzzaman Rafat, Mohd Ruhul Ameen, Akif Islam et al.

The great rivers of Bangladesh, arteries of commerce and sustenance, are also agents of relentless destruction. Each year, they swallow whole villages and vast tracts of farmland, erasing communities from the map and displacing thousands of families. To track this slow-motion catastrophe has, until now, been a Herculean task for human analysts. Here we show how a powerful general-purpose vision model, the Segment Anything Model (SAM), can be adapted to this task with remarkable precision. To do this, we assembled a new dataset - a digital chronicle of loss compiled from historical Google Earth imagery of Bangladesh's most vulnerable regions, including Mokterer Char Union, Kedarpur Union, Balchipara village, and Chowhali Upazila, from 2003 to 2025. Crucially, this dataset is the first to include manually annotated data on the settlements that have vanished beneath the water. Our method first uses a simple color-channel analysis to provide a rough segmentation of land and water, and then fine-tunes SAM's mask decoder to recognize the subtle signatures of riverbank erosion. The resulting model demonstrates a keen eye for this destructive process, achieving a mean Intersection over Union of 86.30% and a Dice score of 92.60% - a performance that significantly surpasses traditional methods and off-the-shelf deep learning models. This work delivers three key contributions: the first annotated dataset of disappeared settlements in Bangladesh due to river erosion; a specialized AI model fine-tuned for this critical task; and a method for quantifying land loss with compelling visual evidence. Together, these tools provide a powerful new lens through which policymakers and disaster management agencies can monitor erosion, anticipate its trajectory, and ultimately protect the vulnerable communities in its path.

CLOct 19, 2025
Parameter-Efficient Fine-Tuning for Low-Resource Languages: A Comparative Study of LLMs for Bengali Hate Speech Detection

Akif Islam, Mohd Ruhul Ameen

Bengali social media platforms have witnessed a sharp increase in hate speech, disproportionately affecting women and adolescents. While datasets such as BD-SHS provide a basis for structured evaluation, most prior approaches rely on either computationally costly full-model fine-tuning or proprietary APIs. This paper presents the first application of Parameter-Efficient Fine-Tuning (PEFT) for Bengali hate speech detection using LoRA and QLoRA. Three instruction-tuned large language models - Gemma-3-4B, Llama-3.2-3B, and Mistral-7B - were fine-tuned on the BD-SHS dataset of 50,281 annotated comments. Each model was adapted by training fewer than 1% of its parameters, enabling experiments on a single consumer-grade GPU. The results show that Llama-3.2-3B achieved the highest F1-score of 92.23%, followed by Mistral-7B at 88.94% and Gemma-3-4B at 80.25%. These findings establish PEFT as a practical and replicable strategy for Bengali and related low-resource languages.

CRSep 16, 2025
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen et al.

Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.