37.9SEMay 21Code
Why Are Agentic Pull Requests Merged or Rejected? An Empirical StudySien Reeve O. Peralta, Fumika Hoshi, Hironori Washizaki et al.
AI coding agents increasingly submit pull requests (Agentic-PRs) to open-source repositories, yet their performance is commonly assessed using merge and rejection outcomes alone. We hypothesized that these outcome labels do not reliably reflect agent capability without considering review interactions. To test this, we conducted a decision-oriented analysis of 11,048 closed Agentic Pull Requests, refined to 9,799 human-reviewed PRs, and manually inspected 717 representative cases to recover decision rationale from interaction artifacts. We found that rejection outcomes substantially overstate agent error: only 35.7% of rejected PRs reflected clear agentic failures, while 31.2% were driven by workflow constraints and 33.1% lacked observable decision rationale. Among merged PRs, 15.4% required explicit reviewer involvement through feedback or direct commits, and 5.5% showed no visible interaction trace. We further observed systematic differences across agents, with Copilot and Devin more often embedded in reviewer-mediated workflows, while Codex and Cursor PRs were typically merged with minimal interaction. These results reject the assumption that PR outcomes alone capture agent performance and demonstrate the need for interaction-aware evaluation grounded in review behavior.
SEMar 24, 2023
An investigation of licensing of datasets for machine learning based on the GQM modelJunyu Chen, Norihiro Yoshida, Hiroaki Takada
Dataset licensing is currently an issue in the development of machine learning systems. And in the development of machine learning systems, the most widely used are publicly available datasets. However, since the images in the publicly available dataset are mainly obtained from the Internet, some images are not commercially available. Furthermore, developers of machine learning systems do not often care about the license of the dataset when training machine learning models with it. In summary, the licensing of datasets for machine learning systems is in a state of incompleteness in all aspects at this stage. Our investigation of two collection datasets revealed that most of the current datasets lacked licenses, and the lack of licenses made it impossible to determine the commercial availability of the datasets. Therefore, we decided to take a more scientific and systematic approach to investigate the licensing of datasets and the licensing of machine learning systems that use the dataset to make it easier and more compliant for future developers of machine learning systems.
8.1SEApr 5
Software Testing Beyond Closed Worlds: Open-World Games as an Extreme CaseYusaku Kato, Norihiro Yoshida, Erina Makihara et al.
Software testing research has traditionally relied on closed-world assumptions, such as finite state spaces, reproducible executions, and stable test oracles. However, many modern software systems operate under uncertainty, non-determinism, and evolving conditions, challenging these assumptions. This paper uses open-world games as an extreme case to examine the limitations of closed-world testing. Through a set of observations grounded in prior work, we identify recurring characteristics that complicate testing in such systems, including inexhaustible behavior spaces, non-deterministic execution outcomes, elusive behavioral boundaries, and unstable test oracles. Based on these observations, we articulate a vision of software testing beyond closed-world assumptions, in which testing supports the characterization and interpretation of system behavior under uncertainty. We further discuss research directions for automated test generation, evaluation metrics, and empirical study design. Although open-world games serve as the motivating domain, the challenges and directions discussed in this paper extend to a broader class of software systems operating in dynamic and uncertain environments.
CROct 20, 2021
On the Effectiveness of Clone Detection for Detecting IoT-related Vulnerable ClonesKentaro Ohno, Norihiro Yoshida, Wenqing Zhu et al.
Since IoT systems provide services over the Internet, they must continue to operate safely even if malicious users attack them. Since the computational resources of edge devices connected to the IoT are limited, lightweight platforms and network protocols are often used. Lightweight platforms and network protocols are less resistant to attacks, increasing the risk that developers will embed vulnerabilities. The code clone research community has been developing approaches to fix buggy (e.g., vulnerable) clones simultaneously. However, there has been little research on IoT-related vulnerable clones. It is unclear whether existing code clone detection techniques can perform simultaneous fixes of the vulnerable clones. In this study, we first created two datasets of IoT-related vulnerable code. We then conducted a preliminary investigation to show whether existing code clone detection tools (e.g., NiCaD, CCFinderSW) are capable of detecting IoT-related vulnerable clones by applying them to the created datasets. The preliminary result shows that the existing tools can detect them partially.
SEAug 7, 2018
A Survey of Refactoring Detection Techniques Based on Change History AnalysisEunjong Choi, Kenji Fujiwara, Norihiro Yoshida et al.
Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. Not only researchers, but also practitioners, need to know about past refactoring instances performed in a software development project. So far, a number of techniques have been proposed for automatic detection of refactoring instances. Those techniques have been presented in various international conferences and journals, however, it is difficult for researchers and practitioners to grasp the current status of studies on refactoring detection techniques. In this survey paper, we review various refactoring detection techniques, especially techniques based on change history analysis. First, we give the definition and categorization of refactoring detection methods in this paper, and then introduce refactoring detection techniques based on change history analysis. Finally, we discuss possible future research directions for refactoring detection.