Eric L. Melin

37.9 | SE | Mar 16 | Code

Self-Admitted Technical Debt in Scientific Software: Prioritization, Sentiment, and Propagation Across Artifacts

Eric L. Melin, Nasir U. Eisty, Gregory R. Watson et al.

Self-admitted technical debt (SATD) impairs scientific software (SSW), yet its prioritization, sentiment, persistence, and propagation remain underexplored. Understanding how SSW developers express and address SATD is crucial for improving SSW maintenance and tooling. This study investigates how SATD types and artifacts in SSW are prioritized, how sentiment relates to urgency, how often SATD is removed and resolved, and the extent to which SATD propagates across artifacts. We analyzed nine SSW repositories using a SATD classification model and a semantic embedding-based prioritization heuristic. SATD was examined across multiple artifacts, with sentiment assessed via a fine-tuned transformer. Propagation was traced, priority scores were compared to static analysis, and removal and resolution rates were quantified. SATD in comments, commits, and pull requests receives higher priority than SATD in issues, with negative sentiment amplifying urgency. Resolution and removal rates lag behind open-source software (OSS) averages. Most SATD remains confined to the originating artifact; longer propagation chains are rare but correlate with higher priority, highlighting persistent, high-impact debt. Prioritization is influenced by artifact type and sentiment, while low removal and resolution rates signal persistent debt. Cross-artifact propagation marks high-priority, unresolved SATD, providing empirical guidance for targeted monitoring, review prioritization, and tool-supported maintenance in SSW.

73.1 | SE | Apr 3 | Code

Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models

Eric L. Melin, Adam J. Torek, Nasir U. Eisty et al.

Context: Large Language Models (LLMs) like GPT-5 and LLaMA-405b exhibit advanced code generation abilities, but their deployment demands substantial computational resources and energy. Quantization can reduce memory footprint and hardware requirements, yet may degrade code quality. Objective: This study investigates the code generation performance of smaller LLMs, examines the effect of quantization, and identifies common code quality issues as a proof of concept (PoC). Method: Four open-source LLMs are evaluated on Python benchmarks using code similarity metrics, with an analysis of 8-bit and 4-bit quantization, alongside static code quality assessment. Results: While smaller LLMs can generate functional code, their benchmark performance is limited. The impact of quantization is variable, and generated code exhibits quality and maintainability concerns. Conclusions: LLM-generated code should be carefully validated before integration into software projects.

2 Papers