SE | May 19
CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology
Zeeshan Rasheed, Muhammad Waseem, Kai-Kristian Kemell et al.
Context: LLM-based multi-agent systems enable automation and decision support in software development, yet existing studies rely on benchmark datasets offering only binary pass-or-fail results, limiting insight into real-world applicability. Objective: This study empirically investigates the potential and limitations of LLM-based agents in autonomous software development tasks. Method: A two-phase approach was employed: developing a multi-agent system, CodePori, for automated code generation, and conducting participant-based evaluation to assess practical performance. Results: Participant feedback reveals key strengths, challenges, and areas for improvement in LLM-based multi-agent systems, highlighting aspects missed by standard code-generation benchmarks. Conclusions: While LLM-based multi-agent systems show potential for large-scale software development, successful integration requires addressing challenges such as memory limitations, hallucinations, and code smells, alongside a practitioner-centric perspective.
SE | Oct 21, 2024
Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report
Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell et al.
This paper presents an experience report on the development of Retrieval Augmented Generation (RAG) systems using PDF documents as the primary data source. The RAG architecture combines the generative capabilities of Large Language Models (LLMs) with the precision of information retrieval. This approach has the potential to redefine how we interact with and augment both structured and unstructured knowledge in generative models to enhance the transparency, accuracy, and contextuality of responses. The paper details the end-to-end pipeline, from data collection and preprocessing to retrieval indexing and response generation, highlighting technical challenges and practical solutions. We aim to offer insights to researchers and practitioners developing similar systems using two distinct approaches: OpenAI's Assistant API with GPT Series and Llama's open-source models. The practical implications of this research lie in enhancing the reliability of generative AI systems in various sectors where domain-specific knowledge and real-time information retrieval are important. The Python code used in this work is also available at: https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs.
SE | Dec 11, 2025
Vibe Coding in Practice: Flow, Technical Debt, and Guidelines for Sustainable Use
Muhammad Waseem, Aakash Ahmad, Kai-Kristian Kemell et al.
Vibe Coding (VC) is a form of software development assisted by generative AI, in which developers describe the intended functionality or logic via natural language prompts, and the AI system generates the corresponding source code. VC can be leveraged for rapid prototyping or developing Minimum Viable Products (MVPs); however, it may introduce several risks throughout the software development life cycle. Based on our experience from several internally developed MVPs and a review of recent industry reports, this article analyzes the flow-debt trade-offs associated with VC. The flow-debt trade-off arises when seamless code generation leads to the accumulation of technical debt through architectural inconsistencies, security vulnerabilities, and increased maintenance overhead. These issues originate from process-level weaknesses, biases in model training data, a lack of explicit design rationale, and a tendency to prioritize quick code generation over human-driven iterative development. Based on our experiences, we identify and explain how current model, platform, and hardware limitations contribute to these issues, and propose countermeasures to address them, informing research and practice towards more sustainable VC approaches.