Karthik Shivashankar

h-index6

9papers

29citations

Novelty31%

AI Score26

Ranked #159,808 of 194,257 authors (top 82%)#1,883 in SE (top 62%)

9 Papers

8.5AIAug 17, 2024

Maintainability Challenges in ML: A Systematic Literature Review

Karthik Shivashankar, Antonio Martini

Background: As Machine Learning (ML) advances rapidly in many fields, it is being adopted by academics and businesses alike. However, ML has a number of different challenges in terms of maintenance not found in traditional software projects. Identifying what causes these maintainability challenges can help mitigate them early and continue delivering value in the long run without degrading ML performance. Aim: This study aims to identify and synthesise the maintainability challenges in different stages of the ML workflow and understand how these stages are interdependent and impact each other's maintainability. Method: Using a systematic literature review, we screened more than 13000 papers, then selected and qualitatively analysed 56 of them. Results: (i) a catalogue of maintainability challenges in different stages of Data Engineering, Model Engineering workflows and the current challenges when building ML systems are discussed; (ii) a map of 13 maintainability challenges to different interdependent stages of ML that impact the overall workflow; (iii) Provided insights to developers of ML tools and researchers. Conclusions: In this study, practitioners and organisations will learn about maintainability challenges and their impact at different stages of ML workflow. This will enable them to avoid pitfalls and help to build a maintainable ML system. The implications and challenges will also serve as a basis for future research to strengthen our understanding of the ML system's maintainability.

1.8SEAug 17, 2024

Better Python Programming for all: With the focus on Maintainability

Karthik Shivashankar, Antonio Martini

This study aims to enhance the maintainability of code generated by Large Language Models (LLMs), with a focus on the Python programming language. As the use of LLMs for coding assistance grows, so do concerns about the maintainability of the code they produce. Previous research has mainly concentrated on the functional accuracy and testing success of generated code, overlooking aspects of maintainability. Our approach involves the use of a specifically designed dataset for training and evaluating the model, ensuring a thorough assessment of code maintainability. At the heart of our work is the fine-tuning of an LLM for code refactoring, aimed at enhancing code readability, reducing complexity, and improving overall maintainability. After fine-tuning an LLM to prioritize code maintainability, our evaluations indicate that this model significantly improves code maintainability standards, suggesting a promising direction for the future of AI-assisted software development.

1.8SEAug 17, 2024

Identifying Technical Debt and Its Types Across Diverse Software Projects Issues

Karthik Shivashankar, Mili Orucevic, Maren Maritsdatter Kruke et al.

Technical Debt (TD) identification in software projects issues is crucial for maintaining code quality, reducing long-term maintenance costs, and improving overall project health. This study advances TD classification using transformer-based models, addressing the critical need for accurate and efficient TD identification in large-scale software development. Our methodology employs multiple binary classifiers for TD and its type, combined through ensemble learning, to enhance accuracy and robustness in detecting various forms of TD. We train and evaluate these models on a comprehensive dataset from GitHub Archive Issues (2015-2024), supplemented with industrial data validation. We demonstrate that in-project fine-tuned transformer models significantly outperform task-specific fine-tuned models in TD classification, highlighting the importance of project-specific context in accurate TD identification. Our research also reveals the superiority of specialized binary classifiers over multi-class models for TD and its type identification, enabling more targeted debt resolution strategies. A comparative analysis shows that the smaller DistilRoBERTa model is more effective than larger language models like GPTs for TD classification tasks, especially after fine-tuning, offering insights into efficient model selection for specific TD detection tasks. The study also assesses generalization capabilities using metrics such as MCC, AUC ROC, Recall, and F1 score, focusing on model effectiveness, fine-tuning impact, and relative performance. By validating our approach on out-of-distribution and real-world industrial datasets, we ensure practical applicability, addressing the diverse nature of software projects.

3.4SEApr 15, 2025Code

TD-Suite: All Batteries Included Framework for Technical Debt Classification

Karthik Shivashankar, Antonio Martini

Recognizing that technical debt is a persistent and significant challenge requiring sophisticated management tools, TD-Suite offers a comprehensive software framework specifically engineered to automate the complex task of its classification within software projects. It leverages the advanced natural language understanding of state-of-the-art transformer models to analyze textual artifacts, such as developer discussions in issue reports, where subtle indicators of debt often lie hidden. TD-Suite provides a seamless end-to-end pipeline, managing everything from initial data ingestion and rigorous preprocessing to model training, thorough evaluation, and final inference. This allows it to support both straightforward binary classification (debt or no debt) and more valuable, identifying specific categories like code, design, or documentation debt, thus enabling more targeted management strategies. To ensure the generated models are robust and perform reliably on real-world, often imbalanced, datasets, TD-Suite incorporates critical training methodologies: k-fold cross-validation assesses generalization capability, early stopping mechanisms prevent overfitting to the training data, and class weighting strategies effectively address skewed data distributions. Beyond core functionality, and acknowledging the growing importance of sustainability, the framework integrates tracking and reporting of carbon emissions associated with the computationally intensive model training process. It also features a user-friendly Gradio web interface in a Docker container setup, simplifying model interaction, evaluation, and inference.

3.4SEApr 15, 2025

Scalability and Maintainability Challenges and Solutions in Machine Learning: Systematic Literature Review

Karthik Shivashankar, Ghadi S. Al Hajj, Antonio Martini

This systematic literature review examines the critical challenges and solutions related to scalability and maintainability in Machine Learning (ML) systems. As ML applications become increasingly complex and widespread across industries, the need to balance system scalability with long-term maintainability has emerged as a significant concern. This review synthesizes current research and practices addressing these dual challenges across the entire ML life-cycle, from data engineering to model deployment in production. We analyzed 124 papers to identify and categorize 41 maintainability challenges and 13 scalability challenges, along with their corresponding solutions. Our findings reveal intricate inter dependencies between scalability and maintainability, where improvements in one often impact the other. The review is structured around six primary research questions, examining maintainability and scalability challenges in data engineering, model engineering, and ML system development. We explore how these challenges manifest differently across various stages of the ML life-cycle. This comprehensive overview offers valuable insights for both researchers and practitioners in the field of ML systems. It aims to guide future research directions, inform best practices, and contribute to the development of more robust, efficient, and sustainable ML applications across various domains.

8.0SEJan 30, 2025Code

MLScent A tool for Anti-pattern detection in ML projects

Karthik Shivashankar, Antonio Martini

Machine learning (ML) codebases face unprecedented challenges in maintaining code quality and sustainability as their complexity grows exponentially. While traditional code smell detection tools exist, they fail to address ML-specific issues that can significantly impact model performance, reproducibility, and maintainability. This paper introduces MLScent, a novel static analysis tool that leverages sophisticated Abstract Syntax Tree (AST) analysis to detect anti-patterns and code smells specific to ML projects. MLScent implements 76 distinct detectors across major ML frameworks including TensorFlow (13 detectors), PyTorch (12 detectors), Scikit-learn (9 detectors), and Hugging Face (10 detectors), along with data science libraries like Pandas and NumPy (8 detectors each). The tool's architecture also integrates general ML smell detection (16 detectors), and specialized analysis for data preprocessing and model training workflows. Our evaluation demonstrates MLScent's effectiveness through both quantitative classification metrics and qualitative assessment via user studies feedback with ML practitioners. Results show high accuracy in identifying framework-specific anti-patterns, data handling issues, and general ML code smells across real-world projects.

3.4SEApr 16, 2025

Unravelling Technical debt topics through Time, Programming Languages and Repository

Karthik Shivashankar, Antonio Martini

This study explores the dynamic landscape of Technical Debt (TD) topics in software engineering by examining its evolution across time, programming languages, and repositories. Despite the extensive research on identifying and quantifying TD, there remains a significant gap in understanding the diversity of TD topics and their temporal development. To address this, we have conducted an explorative analysis of TD data extracted from GitHub issues spanning from 2015 to September 2023. We employed BERTopic for sophisticated topic modelling. This study categorises the TD topics and tracks their progression over time. Furthermore, we have incorporated sentiment analysis for each identified topic, providing a deeper insight into the perceptions and attitudes associated with these topics. This offers a more nuanced understanding of the trends and shifts in TD topics through time, programming language, and repository.

3.4SEApr 15, 2025

QualiTagger: Automating software quality detection in issue trackers

Karthik Shivashankar, Rafael Capilla, Maren Maritsdatter Kruke et al.

A systems quality is a major concern for development teams when it evolve. Understanding the effects of a loss of quality in the codebase is crucial to avoid side effects like the appearance of technical debt. Although the identification of these qualities in software requirements described in natural language has been investigated, most of the results are often not applicable in practice, and rely on having been validated on small datasets and limited amount of projects. For many years, machine learning (ML) techniques have been proved as a valid technique to identify and tag terms described in natural language. In order to advance previous works, in this research we use cutting edge models like Transformers, together with a vast dataset mined and curated from GitHub, to identify what text is usually associated with different quality properties. We also study the distribution of such qualities in issue trackers from openly accessible software repositories, and we evaluate our approach both with students from a software engineering course and with its application to recognize security labels in industry.

3.6CVMar 31, 2025

Computer Vision and Deep Learning for 4D Augmented Reality

Karthik Shivashankar

The prospect of 4D video in Extended Reality (XR) platform is huge and exciting, it opens a whole new way of human computer interaction and the way we perceive the reality and consume multimedia. In this thesis, we have shown that feasibility of rendering 4D video in Microsoft mixed reality platform. This enables us to port any 3D performance capture from CVSSP into XR product like the HoloLens device with relative ease. However, if the 3D model is too complex and is made up of millions of vertices, the data bandwidth required to port the model is a severe limitation with the current hardware and communication system. Therefore, in this project we have also developed a compact representation of both shape and appearance of the 4d video sequence using deep learning models to effectively learn the compact representation of 4D video sequence and reconstruct it without affecting the shape and appearance of the video sequence.