Amir AL-Maamari

CR
h-index1
3papers
12citations
Novelty43%
AI Score38

3 Papers

30.3CRMar 10
Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation

Amir Al-Maamari

Large Language Models (LLMs) show promise for Automated Program Repair (APR), yet their effectiveness on security vulnerabilities remains poorly characterized. This study analyzes 319 LLM-generated security patchesacross 64 Java vulnerabilities from the Vul4J benchmark. Using tri-axis evaluation (compilation, security via PoV tests, functionality via test suites), the analysis reveals that only 24.8% of patches achieve full correctness, while 51.4% fail both security and functionality. The dominant failure mode is semantic misunderstanding: LLMs produce syntactically valid code but apply incorrect repair strategies. The proposed Security Repair Score (SRS) quantifies this gap, showing LLMs preserve functionality (mean 0.832) but struggle with security (mean 0.251). Vulnerability type strongly predicts difficulty, with fix rates ranging from 0% (input validation) to 45% (infinite loop). These findings demonstrate that LLM security patches require rigorous validation before deployment.

CYFeb 25, 2025
Between Innovation and Oversight: A Cross-Regional Study of AI Risk Management Frameworks in the EU, U.S., UK, and China

Amir Al-Maamari

As artificial intelligence (AI) technologies increasingly enter important sectors like healthcare, transportation, and finance, the development of effective governance frameworks is crucial for dealing with ethical, security, and societal risks. This paper conducts a comparative analysis of AI risk management strategies across the European Union (EU), United States (U.S.), United Kingdom (UK), and China. A multi-method qualitative approach, including comparative policy analysis, thematic analysis, and case studies, investigates how these regions classify AI risks, implement compliance measures, structure oversight, prioritize transparency, and respond to emerging innovations. Examples from high-risk contexts like healthcare diagnostics, autonomous vehicles, fintech, and facial recognition demonstrate the advantages and limitations of different regulatory models. The findings show that the EU implements a structured, risk-based framework that prioritizes transparency and conformity assessments, while the U.S. uses decentralized, sector-specific regulations that promote innovation but may lead to fragmented enforcement. The flexible, sector-specific strategy of the UK facilitates agile responses but may lead to inconsistent coverage across domains. China's centralized directives allow rapid large-scale implementation while constraining public transparency and external oversight. These insights show the necessity for AI regulation that is globally informed yet context-sensitive, aiming to balance effective risk management with technological progress. The paper concludes with policy recommendations and suggestions for future research aimed at enhancing effective, adaptive, and inclusive AI governance globally.

CRSep 22, 2025
Can You Trust Your Copilot? A Privacy Scorecard for AI Coding Assistants

Amir AL-Maamari

The rapid integration of AI-powered coding assistants into developer workflows has raised significant privacy and trust concerns. As developers entrust proprietary code to services like OpenAI's GPT, Google's Gemini, and GitHub Copilot, the unclear data handling practices of these tools create security and compliance risks. This paper addresses this challenge by introducing and applying a novel, expert-validated privacy scorecard. The methodology involves a detailed analysis of four document types; from legal policies to external audits; to score five leading assistants against 14 weighted criteria. A legal expert and a data protection officer refined these criteria and their weighting. The results reveal a distinct hierarchy of privacy protections, with a 20-point gap between the highest- and lowest-ranked tools. The analysis uncovers common industry weaknesses, including the pervasive use of opt-out consent for model training and a near-universal failure to filter secrets from user prompts proactively. The resulting scorecard provides actionable guidance for developers and organizations, enabling evidence-based tool selection. This work establishes a new benchmark for transparency and advocates for a shift towards more user-centric privacy standards in the AI industry.