SEApr 12
The Code Whisperer: LLM and Graph-Based AI for Smell and Vulnerability ResolutionMohammad Baqar, Raji Rustamov, Alexander Hughes
Code smells and software vulnerabilities both increase maintenance cost, yet they are often handled by separate tools that miss structural context and produce noisy warnings. This paper presents The Code Whisperer, a hybrid framework that combines graph-based program analysis with large language models to detect, explain, and repair maintainability and security issues within a unified workflow. The method aligns Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs), Program Dependency Graphs (PDGs), and token-level code embeddings so that structural and semantic signals can be learned jointly. We evaluate the framework on multi-language datasets and compare it with rule-based analyzers and single-model baselines. The results indicate that the hybrid design improves detection performance and produces more useful repair suggestions than either graph-only or language-model-only approaches. We also examine explainability and CI/CD integration as practical requirements for adopting AI-assisted code review in everyday software engineering workflows.
SESep 9, 2024
The Future of Software Testing: AI-Powered Test Case Generation and ValidationMohammad Baqar, Rajat Khanda
Software testing is a crucial phase in the software development lifecycle (SDLC), ensuring that products meet necessary functional, performance, and quality benchmarks before release. Despite advancements in automation, traditional methods of generating and validating test cases still face significant challenges, including prolonged timelines, human error, incomplete test coverage, and high costs of manual intervention. These limitations often lead to delayed product launches and undetected defects that compromise software quality and user satisfaction. The integration of artificial intelligence (AI) into software testing presents a promising solution to these persistent challenges. AI-driven testing methods automate the creation of comprehensive test cases, dynamically adapt to changes, and leverage machine learning to identify high-risk areas in the codebase. This approach enhances regression testing efficiency while expanding overall test coverage. Furthermore, AI-powered tools enable continuous testing and self-healing test cases, significantly reducing manual oversight and accelerating feedback loops, ultimately leading to faster and more reliable software releases. This paper explores the transformative potential of AI in improving test case generation and validation, focusing on its ability to enhance efficiency, accuracy, and scalability in testing processes. It also addresses key challenges associated with adapting AI for testing, including the need for high quality training data, ensuring model transparency, and maintaining a balance between automation and human oversight. Through case studies and examples of real-world applications, this paper illustrates how AI can significantly enhance testing efficiency across both legacy and modern software systems.
SEAug 10, 2024
Balancing Innovation and Ethics in AI-Driven Software DevelopmentMohammad Baqar
This paper critically examines the ethical implications of integrating AI tools like GitHub Copilot and ChatGPT into the software development process. It explores issues such as code ownership, bias, accountability, privacy, and the potential impact on the job market. While these AI tools offer significant benefits in terms of productivity and efficiency, they also introduce complex ethical challenges. The paper argues that addressing these challenges is essential to ensuring that AI's integration into software development is both responsible and beneficial to society
ROJul 25, 2025
Extending Group Relative Policy Optimization to Continuous Control: A Theoretical Framework for Robotic Reinforcement LearningRajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti et al.
Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored, limiting its utility in robotics where continuous actions are essential. This paper presents a theoretical framework extending GRPO to continuous control environments, addressing challenges in high-dimensional action spaces, sparse rewards, and temporal dynamics. Our approach introduces trajectory-based policy clustering, state-aware advantage estimation, and regularized policy updates designed for robotic applications. We provide theoretical analysis of convergence properties and computational complexity, establishing a foundation for future empirical validation in robotic systems including locomotion and manipulation tasks.
SEApr 25, 2025
Self-Healing Software Systems: Lessons from Nature, Powered by AIMohammad Baqar, Rajat Khanda, Saba Naqvi
As modern software systems grow in complexity and scale, their ability to autonomously detect, diagnose, and recover from failures becomes increasingly vital. Drawing inspiration from biological healing - where the human body detects damage, signals the brain, and activates targeted recovery - this paper explores the concept of self-healing software driven by artificial intelligence. We propose a novel framework that mimics this biological model system observability tools serve as sensory inputs, AI models function as the cognitive core for diagnosis and repair, and healing agents apply targeted code and test modifications. By combining log analysis, static code inspection, and AI-driven generation of patches or test updates, our approach aims to reduce downtime, accelerate debugging, and enhance software resilience. We evaluate the effectiveness of this model through case studies and simulations, comparing it against traditional manual debugging and recovery workflows. This work paves the way toward intelligent, adaptive and self-reliant software systems capable of continuous healing, akin to living organisms.
CLFeb 14, 2025
Hallucinations and Truth: A Comprehensive Accuracy Evaluation of RAG, LoRA and DoRAMohammad Baqar, Rajat Khanda
Recent advancements in Generative AI have significantly improved the efficiency and adaptability of natural language processing (NLP) systems, particularly through Retrieval-Augmented Generation (RAG), Low-Rank Adaptation (LoRA), and Weight-Decomposed Low-Rank Adaptation (DoRA). RAG integrates external knowledge to enhance factual consistency in generative outputs, while LoRA enables parameter-efficient fine-tuning of large language models (LLMs). DoRA further refines this process by optimizing fine-tuning through adaptive parameter ranking and domain-aware weight adjustments, improving learning efficiency while maintaining inference performance. This paper presents a large-scale empirical evaluation of RAG, LoRA, and DoRA, with model fine-tuning and generation performance assessed on 20,000 FAQ-based queries, while the knowledge base spans 400,000 entries. The study analyzes key performance metrics such as accuracy, relevance, and inference latency. Experimental results demonstrate that DoRA achieves the highest accuracy (90.1%), relevance score (0.88), and lowest latency (110 ms per query), outperforming both LoRA and RAG in real-world, domain-specific generative AI applications. Furthermore, this study examines the trade-offs between fine-tuning efficiency, computational cost, and real-time adaptability across different models. Findings highlight RAG's effectiveness in knowledge grounding, LoRA's cost-efficient domain adaptation, and DoRA's ability to balance fine-tuning efficiency with model precision. These insights provide practical guidance for deploying AI-driven generative systems in accuracy-critical domains such as healthcare, finance, and legal services, ensuring scalability, reliability, and optimal performance in dynamic environments.
SEJan 5
The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality AssuranceSaba Naqvi, Mohammad Baqar, Nawaz Ali Mohammad
Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of execution aware feedback. This paper introduces an agentic multi-model testing framework a closed-loop, self-correcting system in which a Test Generation Agent, an Execution and Analysis Agent, and a Review and Optimization Agent collaboratively generate, execute, analyze, and refine tests until convergence. By using sandboxed execution, detailed failure reporting, and iterative regeneration or patching of failing tests, the framework autonomously improves test quality and expands coverage. Integrated into a CI/CD-compatible pipeline, it leverages reinforcement signals from coverage metrics and execution outcomes to guide refinement. Empirical evaluations on microservice based applications show up to a 60% reduction in invalid tests, 30% coverage improvement, and significantly reduced human effort compared to single-model baselines demonstrating that multi-agent, feedback-driven loops can evolve software testing into an autonomous, continuously learning quality assurance ecosystem for self-healing, high-reliability codebases.
SEOct 9, 2025
RAG4Tickets: AI-Powered Ticket Resolution via Retrieval-Augmented Generation on JIRA and GitHub DataMohammad Baqar
Modern software teams frequently encounter delays in resolving recurring or related issues due to fragmented knowledge scattered across JIRA tickets, developer discussions, and GitHub pull requests (PRs). To address this challenge, we propose a Retrieval-Augmented Generation (RAG) framework that integrates Sentence-Transformers for semantic embeddings with FAISS-based vector search to deliver context-aware ticket resolution recommendations. The approach embeds historical JIRA tickets, user comments, and linked PR metadata to retrieve semantically similar past cases, which are then synthesized by a Large Language Model (LLM) into grounded and explainable resolution suggestions. The framework contributes a unified pipeline linking JIRA and GitHub data, an embedding and FAISS indexing strategy for heterogeneous software artifacts, and a resolution generation module guided by retrieved evidence. Experimental evaluation using precision, recall, resolution time reduction, and developer acceptance metrics shows that the proposed system significantly improves resolution accuracy, fix quality, and knowledge reuse in modern DevOps environments.
SEAug 22, 2025
Breaking Barriers in Software Testing: The Power of AI-Driven AutomationSaba Naqvi, Mohammad Baqar
Software testing remains critical for ensuring reliability, yet traditional approaches are slow, costly, and prone to gaps in coverage. This paper presents an AI-driven framework that automates test case generation and validation using natural language processing (NLP), reinforcement learning (RL), and predictive models, embedded within a policy-driven trust and fairness model. The approach translates natural language requirements into executable tests, continuously optimizes them through learning, and validates outcomes with real-time analysis while mitigating bias. Case studies demonstrate measurable gains in defect detection, reduced testing effort, and faster release cycles, showing that AI-enhanced testing improves both efficiency and reliability. By addressing integration and scalability challenges, the framework illustrates how AI can shift testing from a reactive, manual process to a proactive, adaptive system that strengthens software quality in increasingly complex environments.
SEAug 16, 2025
AI-Augmented CI/CD Pipelines: From Code Commit to Production with Autonomous DecisionsMohammad Baqar, Saba Naqvi, Rajat Khanda
Modern software delivery has accelerated from quarterly releases to multiple deployments per day. While CI/CD tooling has matured, human decision points interpreting flaky tests, choosing rollback strategies, tuning feature flags, and deciding when to promote a canary remain major sources of latency and operational toil. We propose AI-Augmented CI/CD Pipelines, where large language models (LLMs) and autonomous agents act as policy-bounded co-pilots and progressively as decision makers. We contribute: (1) a reference architecture for embedding agentic decision points into CI/CD, (2) a decision taxonomy and policy-as-code guardrail pattern, (3) a trust-tier framework for staged autonomy, (4) an evaluation methodology using DevOps Research and Assessment ( DORA) metrics and AI-specific indicators, and (5) a detailed industrial-style case study migrating a React 19 microservice to an AI-augmented pipeline. We discuss ethics, verification, auditability, and threats to validity, and chart a roadmap for verifiable autonomy in production delivery systems.