Nasir U. Eisty

SE
7papers
37citations
Novelty21%
AI Score52

7 Papers

27.9SEMar 16Code
Self-Admitted Technical Debt in Scientific Software: Prioritization, Sentiment, and Propagation Across Artifacts

Eric L. Melin, Nasir U. Eisty, Gregory R. Watson et al.

Self-admitted technical debt (SATD) impairs scientific software (SSW), yet its prioritization, sentiment, persistence, and propagation remains underexplored. Understanding how SSW developers express, and address SATD is crucial for improving SSW maintenance, and tooling. This study investigates how SATD types and artifacts in SSW are prioritized, how sentiment relates to urgency, SATD removal and resolution rates, and the extent to which SATD propagates across artifacts. We analyzed nine SSW repositories using a SATD classification model and a semantic embedding-based prioritization heuristic. SATD was examined across multiple artifacts, with sentiment assessed via a fine-tuned transformer. Propagation was traced, priority scores compared to static analysis, and removal and resolution rates quantified. SATD in comments, commits, and pull requests receive higher priority than SATD in issues, with negative sentiment amplifying urgency. Resolution and removal rates lag behind open-source software (OSS) averages. Most SATD remains confined to the originating artifact, but longer propagation chains are rare and correlate with higher priority, highlighting persistent and high impact debt. Prioritization is influenced by artifact type and sentiment, while low removal and resolution rates signal persistent debt. Cross-artifact propagation marks high priority, unresolved SATD, providing empirical guidance for targeted monitoring, review prioritization, and tool supported maintenance in SSW.

69.3SEApr 3Code
Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models

Eric L. Melin, Adam J. Torek, Nasir U. Eisty et al.

Context: Large Language Models (LLMs) like GPT-5 and LLaMA-405b exhibit advanced code generation abilities, but their deployment demands substantial computation resources and energy. Quantization can reduce memory footprint and hardware requirements, yet may degrade code quality. Objective: This study investigates code generation performance of smaller LLMs, examines the effect of quantization, and identifies common code quality issues as a proof of concepts (PoC). Method: Four open-source LLMs are evaluated on Python benchmarks using code similarity metrics, with an analysis on 8-bit and 4-bit quantization, alongside static code quality assessment. Results: While smaller LLMs can generate functional code, benchmark performance is limited. Quantization impacts are variable, and generated code exhibits quality and maintainability concerns. Conclusions: LLM-generated code should be carefully validated before integration into software projects.

10.5SEMar 10Code
MALTA: Maintenance-Aware Technical Lag, Estimation to Address Software Abandonment

Shane K. Panter, Nasir U. Eisty

Context: Open-source ecosystems rely on sustained package maintenance. When maintenance slows or stops, Technical Lag (TL), the gap between installed and latest dependency versions accumulates, creating security and sustainability risks. However, some existing TL metrics, such as Version Lag, struggle to distinguish between actively maintained and abandoned packages, leading to a systematic underestimation of risk. Objective: We investigate the relationship between Version Lag and software abandonment by (i) identifying which repository-level signals reliably distinguish sustained maintenance from long-term decline, (ii) quantifying how Version Lag magnitude and persistence differ across maintenance states, and (iii) evaluating how maintenance-aware metrics change the identification of high-risk dependencies. Method: We introduce Maintenance-Aware Lag and Technical Abandonment (MALTA), a scoring framework comprising three metrics: Development Activity Score (DAS), Maintainer Responsiveness Score (MRS), and Repository Metadata Viability Score (RMVS). We evaluate MALTA on a dataset of 11,047 Debian packages linked to upstream GitHub repositories, encompassing 1.7 million commits and 4.2 million pull requests. Results: MALTA achieves AUC = 0.783 for classifying active versus declining maintenance. Most significantly, 62.2% of packages classified as "Low Risk" by Version Lag alone are reclassified as "High Risk" when MALTA signals are incorporated. These discordant packages average 2019 days since their last commit, with 9.8% having archived repositories. Conclusions: Version Lag metrics systematically miss abandoned packages, a blind spot affecting the majority of dependencies in distribution ecosystems. MALTA identifies a substantial discordant population invisible to Version Lag by distinguishing resolvable lag from terminal lag caused by upstream abandonment.

5.7SEMay 5Code
Exploring Sustainability in Scientific Software through Code Quality & Test Coverage Metrics

Sheikh Md. Mushfiqur Rahman, Gregory R. Watson, Nasir U. Eisty

Context: Scientific open-source software (SciOSS) plays a foundational role in research and engineering, yet its long-term sustainability has often been overlooked and remains a significant concern. Objective: This study investigates the long-term sustainability of SciOSS through code and test quality metrics. Method: We analyze CASS Software Portfolio projects, classifying them by sustainability and comparing their code structure, test coverage, and links between code quality and testing across the dataset. Results: Sustainable projects show higher, more consistent test coverage and clearer code-test correlations, while unsustainable ones show weaker patterns. Overall, test coverage is low in scientific software, and high complexity and coupling reduce testability. Conclusion: In this study, we present a practical, data-driven approach for assessing sustainability in scientific software, offering a foundation for evaluating long-term software health and supporting future efforts in quality assurance and sustainability monitoring.

3.0SEApr 26Code
Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor

Sharif Ahmed, Nasir U. Eisty

Context: Innovation thrives on scientific software, with useful code review feedback enhancing its correctness and impact. However, unlike general-purpose commercial and open-source software, the usefulness of code review feedback (CR comment) in scientific software remains largely unstudied. Objective: This paper aims to characterize the usefulness of CR comment in scientific opens ource software (Sci-OSS), leveraging existing research on useful CR comment. Method: To achieve this objective, we mine successful Sci-OSS from GitHub, analyze their CR comments with usefulness related features, and compare the findings from prior research on general-purpose commercial and open-source CR comments. Results: The investigation on the usefulness of CR comments in SciOSS confirms many characteristics that prior research identified in general-purpose software. For example, subjective or negative CR comments remain not useful for the Sci-OSS. We also find CR comments which receive negative emoji reactions have a very small correlation with not useful comments, whereas the positive emojis show mixed correlations. Importantly, 6-33% CR comments in Sci-OSS are not useful in our mined repositories. Conclusions: Our investigation into Sci-OSS extends findings from CR comments' usefulness research on general-purpose software, benefiting developers, scientists, and researchers in the Sci-OSS community.

SEApr 19, 2022
Software Engineering Approaches for TinyML based IoT Embedded Vision: A Systematic Literature Review

Shashank Bangalore Lakshman, Nasir U. Eisty

Internet of Things (IoT) has catapulted human ability to control our environments through ubiquitous sensing, communication, computation, and actuation. Over the past few years, IoT has joined forces with Machine Learning (ML) to embed deep intelligence at the far edge. TinyML (Tiny Machine Learning) has enabled the deployment of ML models for embedded vision on extremely lean edge hardware, bringing the power of IoT and ML together. However, TinyML powered embedded vision applications are still in a nascent stage, and they are just starting to scale to widespread real-world IoT deployment. To harness the true potential of IoT and ML, it is necessary to provide product developers with robust, easy-to-use software engineering (SE) frameworks and best practices that are customized for the unique challenges faced in TinyML engineering. Through this systematic literature review, we aggregated the key challenges reported by TinyML developers and identified state-of-art SE approaches in large-scale Computer Vision, Machine Learning, and Embedded Systems that can help address key challenges in TinyML based IoT embedded vision. In summary, our study draws synergies between SE expertise that embedded systems developers and ML developers have independently developed to help address the unique challenges in the engineering of TinyML based IoT embedded vision.

SESep 22, 2021
Developers Perception of Peer Code Review in Research Software Development

Nasir U. Eisty, Jeffrey C. Carver

Background: Research software is software developed by and/or used by researchers, across a wide variety of domains, to perform their research. Because of the complexity of research software, developers cannot conduct exhaustive testing. As a result, researchers have lower confidence in the correctness of the output of the software. Peer code review, a standard software engineering practice, has helped address this problem in other types of software. Aims: Peer code review is less prevalent in research software than it is in other types of software. In addition, the literature does not contain any studies about the use of peer code review in research software. Therefore, through analyzing developers perceptions, the goal of this work is to understand the current practice of peer code review in the development of research software, identify challenges and barriers associated with peer code review in research software, and present approaches to improve the peer code review in research software. Method: We conducted interviews and a community survey of research software developers to collect information about their current peer code review practices, difficulties they face, and how they address those difficulties. Results: We received 84 unique responses from the interviews and surveys. The results show that while research software teams review a large amount of their code, they lack formal process, proper organization, and adequate people to perform the reviews. Conclusions: Use of peer code review is promising for improving the quality of research software and thereby improving the trustworthiness of the underlying research results. In addition, by using peer code review, research software developers produce more readable and understandable code, which will be easier to maintain.