66.2 · SE · Mar 17 · Code
A Longitudinal Study of Usability in Identity-Based Software Signing
Kelechi G. Kalu, Hieu Tran, Santiago Torres-Arias et al.
Identity-based software signing tools aim to make software artifact provenance verifiable while reducing the operational burden of long-lived key management. However, there is limited cross-tool longitudinal evidence about which usability problems arise in practice and how those problems evolve as tools mature. This gap matters because unusable signing and verification workflows can lead to incomplete adoption, misconfiguration, or skipped verification, undermining intended integrity guarantees. We conducted the first mining-software-repositories study of five open-source identity-based signing ecosystems: Sigstore, OpenPubKey, HashiCorp Vault, Keyfactor, and Notary v2. We analyzed approximately 3,900 GitHub issues from Nov. 2021 to Nov. 2025. We coded each issue for the reported usability concern and the implicated architectural component, and compared patterns across tools and over time. Across ecosystems, reported concerns concentrate in verification workflows, policy and configuration surfaces, and integration boundaries. Longitudinal Poisson trend analysis shows substantial declines in reported issues for most ecosystems. However, across usability themes, workflow- and documentation-related concerns decline unevenly across tools and concern types, and verification workflows and configuration surfaces remain persistent friction points. These results indicate that identity-based signing reduces some usability burdens while relocating complexity to verification semantics, policy configuration, and deployment integration. Designing future signing ecosystems therefore requires treating verification semantics and release workflows as first-class usability targets rather than peripheral integration concerns.
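The abstract above does not specify how the Poisson trend analysis was set up. As a minimal sketch, assuming issue counts are binned into equal time periods and modeled as log(mean_t) = a + b*t, a trend can be fitted as follows. The function name, the binning, and the sample counts are hypothetical and are not taken from the study.

```python
import math

def fit_poisson_trend(counts, iterations=50):
    """Fit log(mean_t) = a + b*t to per-period counts by Newton's method.

    Returns (a, b). exp(b) is the multiplicative change in the expected
    count from one period to the next (below 1 means a decline).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    times = list(range(n))
    # Start from the log of the overall mean with a flat slope.
    a = math.log(max(sum(counts) / n, 1e-9))
    b = 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * t) for t in times]
        # Gradient of the Poisson log-likelihood with respect to a and b.
        g_a = sum(y - m for y, m in zip(counts, mu))
        g_b = sum(t * (y - m) for t, y, m in zip(times, counts, mu))
        # Fisher information matrix entries (negative Hessian).
        h_aa = sum(mu)
        h_ab = sum(t * m for t, m in zip(times, mu))
        h_bb = sum(t * t * m for t, m in zip(times, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = g.
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Hypothetical counts that halve every period, so exp(b) should be near 0.5.
example_counts = [80, 40, 20, 10, 5]
intercept, slope = fit_poisson_trend(example_counts)
rate_ratio = math.exp(slope)
print(f"per-period rate ratio: {rate_ratio:.3f}")
```

A rate ratio below 1 corresponds to the kind of decline in reported issues the abstract describes. A real analysis would also report uncertainty (for example, a confidence interval on the slope), which this sketch omits.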
CR · Aug 9, 2023
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures
Tanmay Singla, Dharun Anandayuvaraj, Kelechi G. Kalu et al.
As we increasingly depend on software systems, the consequences of breaches in the software supply chain become more severe. High-profile cyber attacks like those on SolarWinds and ShadowHammer have resulted in significant financial and data losses, underlining the need for stronger cybersecurity. One way to prevent future breaches is by studying past failures. However, traditional methods of analyzing these failures require manually reading and summarizing reports about them. Automated support could reduce costs and allow analysis of more failures. Natural Language Processing (NLP) techniques such as Large Language Models (LLMs) could be leveraged to assist the analysis of failures. In this study, we assessed the ability of LLMs to analyze historical software supply chain breaches. We used LLMs to replicate the manual analysis of 69 software supply chain security failures performed by members of the Cloud Native Computing Foundation (CNCF). We developed prompts for LLMs to categorize these failures along four dimensions: type of compromise, intent, nature, and impact. GPT 3.5's categorizations had an average accuracy of 68% and Bard's had an average accuracy of 58% over these dimensions. We report that LLMs effectively characterize software supply chain failures when the source articles are detailed enough for consensus among manual analysts, but cannot yet replace human analysts. Future work can improve LLM performance in this context, and study a broader range of articles and failures.
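The abstract reports accuracy averaged over four dimensions. One plausible way to compute such a figure, assuming accuracy means exact agreement with the manual label for each dimension, is sketched below. The dimension keys, label values, and function names are hypothetical; the study's actual scoring procedure may differ.

```python
# Hypothetical keys for the four dimensions named in the abstract.
DIMENSIONS = ("type_of_compromise", "intent", "nature", "impact")

def per_dimension_accuracy(manual, predicted, dimensions=DIMENSIONS):
    """Return {dimension: fraction of articles where the LLM label matches}."""
    if not manual or len(manual) != len(predicted):
        raise ValueError("manual and predicted must be non-empty and aligned")
    scores = {}
    for dim in dimensions:
        matches = sum(1 for m, p in zip(manual, predicted) if m[dim] == p[dim])
        scores[dim] = matches / len(manual)
    return scores

def average_accuracy(scores):
    """Unweighted mean of the per-dimension accuracies."""
    return sum(scores.values()) / len(scores)

# Two made-up articles; the label values are illustrative only.
manual_labels = [
    {"type_of_compromise": "A", "intent": "X", "nature": "N1", "impact": "I1"},
    {"type_of_compromise": "B", "intent": "Y", "nature": "N2", "impact": "I2"},
]
llm_labels = [
    {"type_of_compromise": "A", "intent": "X", "nature": "N1", "impact": "I1"},
    {"type_of_compromise": "C", "intent": "Y", "nature": "N1", "impact": "I2"},
]

scores = per_dimension_accuracy(manual_labels, llm_labels)
overall = average_accuracy(scores)
```

Reporting the per-dimension scores alongside the mean shows which categories drive disagreement, which a single averaged figure hides.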
SE · Jan 28
Operationalizing Research Software for Supply Chain Security
Kelechi G. Kalu, Soham Rattan, Taylor R. Schorlemmer et al.
Empirical studies of research software are hard to compare because the literature operationalizes "research software" inconsistently. Motivated by the research software supply chain (RSSC) and its security risks, we introduce an RSSC-oriented taxonomy that makes scope and operational boundaries explicit for empirical research software security studies. We conduct a targeted scoping review of recent repository mining and dataset construction studies, extracting each work's definition, inclusion criteria, unit of analysis, and identification heuristics. We synthesize these into a harmonized taxonomy and a mapping that translates prior approaches into shared taxonomy dimensions. We operationalize the taxonomy on a large community-curated corpus from the Research Software Encyclopedia (RSE), producing an annotated dataset, a labeling codebook, and a reproducible labeling pipeline. Finally, we apply OpenSSF Scorecard as a preliminary security analysis to show how repository-centric security signals differ across taxonomy-defined clusters and why taxonomy-aware stratification is necessary for interpreting RSSC security measurements.
14.2 · SE · Apr 14
Why Johnny Adopts Identity-Based Software Signing: A Usability Case Study of Sigstore
Kelechi G. Kalu, Sofia Okorafor, Tanmay Singla et al.
Software signing is the most robust method for ensuring the integrity and authenticity of components in a software supply chain. Legacy key-managed signing tools (e.g., OpenPGP) burdened practitioners with key management and signer identification, creating both usability challenges and security risks. A new class of identity-based signing tools automates many of these concerns, but little is known about their usability and its effect on their adoption and effectiveness in practice. A usability evaluation can clarify the extent to which identity-based designs succeed and highlight priorities for improvement. To fill this gap, we conducted the first usability study of Sigstore, a pioneering and widely adopted exemplar of identity-based signing. Through interviews with 17 industry experts, we examined (1) the problems and advantages associated with practitioners' tooling choices, (2) how and why their signing-tool usage has evolved over time, and (3) the contexts that cause usability concerns. Our findings illuminate the usability factors of identity-based signing tools and yield recommendations for toolmakers, adopting organizations, and the research community. Notably, components of identity-based tooling exhibit different levels of maturity and readiness for adoption, and integration flexibility is a common pain point that may be mitigated through plugins and APIs. Our results will help identity-based signing toolmakers further strengthen software supply chain security.
SE · Dec 25, 2025
How Do Agents Perform Code Optimization? An Empirical Study
Huiyun Peng, Antonio Zhong, Ricardo Andrés Calvo Méndez et al.
Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifications. Although recent advances in AI coding agents have accelerated code generation and bug fixing, little is known about how these agents perform on real-world performance optimization tasks. We present the first empirical study comparing agent- and human-authored performance optimization commits, analyzing 324 agent-generated and 83 human-authored PRs from the AIDev dataset across adoption, maintainability, optimization patterns, and validation practices. We find that AI-authored performance PRs are less likely to include explicit performance validation than human-authored PRs (45.7% vs. 63.6%, p = 0.007). In addition, AI-authored PRs largely use the same optimization patterns as humans. We further discuss limitations and opportunities for advancing agentic code optimization.
LG · Dec 25, 2024
Recommending Pre-Trained Models for IoT Devices
Parth V. Patil, Wenxin Jiang, Huiyun Peng et al.
The availability of pre-trained models (PTMs) has enabled faster deployment of machine learning across applications by reducing the need for extensive training. Techniques like quantization and distillation have further expanded PTM applicability to resource-constrained IoT hardware. Given the many PTM options for any given task, engineers often find it too costly to evaluate each model's suitability. Approaches such as LogME, LEEP, and ModelSpider help streamline model selection by estimating task relevance without exhaustive tuning. However, these methods largely leave hardware constraints as future work, a significant limitation in IoT settings. In this paper, we identify the limitations of current model recommendation approaches regarding hardware constraints and introduce a novel, hardware-aware method for PTM selection. We also propose a research agenda to guide the development of effective, hardware-conscious model recommendation systems for IoT applications.