Katsuro Inoue

SE
8papers
474citations
Novelty32%
AI Score41

8 Papers

SESep 14, 2017Code
On the Impact of Micro-Packages: An Empirical Study of the npm JavaScript Ecosystem

Raula Gaikovina Kula, Ali Ouni, Daniel M. German et al.

The rise of user-contributed Open Source Software (OSS) ecosystems demonstrate their prevalence in the software engineering discipline. Libraries work together by depending on each other across the ecosystem. From these ecosystems emerges a minimized library called a micro-package. Micro- packages become problematic when breaks in a critical ecosystem dependency ripples its effects to unsuspecting users. In this paper, we investigate the impact of micro-packages in the npm JavaScript ecosystem. Specifically, we conducted an empirical in- vestigation with 169,964 JavaScript npm packages to understand (i) the widespread phenomena of micro-packages, (ii) the size dependencies inherited by a micro-package and (iii) the developer usage cost (ie., fetch, install, load times) of using a micro-package. Results of the study find that micro-packages form a significant portion of the npm ecosystem. Apart from the ease of readability and comprehension, we show that some micro-packages have long dependency chains and incur just as much usage costs as other npm packages. We envision that this work motivates the need for developers to be aware of how sensitive their third-party dependencies are to critical changes in the software ecosystem.

SESep 14, 2017Code
Modeling Library Dependencies and Updates in Large Software Repository Universes

Raula Gaikovina Kula, Coen De Roover, Daniel M. German et al.

Popular (re)use of third-party open-source software (OSS) is evidence of the impact of hosting repositories like maven on software development today. Updating libraries is crucial, with recent studies highlighting the associated vulnerabilities with aging OSS libraries. The decision to migrate to a newer library can range from trivial (security threat) to complex (assessment of work required to accommodate the changes). By leveraging the `wisdom of the software repository crowd' we propose a simple and efficient approach to recommending `consented' library updates. Our Software Universe Graph (SUG) models library dependency and update information mined from super repositories to provide different metrics and visualizations that aid in the update decision. To evaluate, we first constructed a SUG from 188,951 nodes of 6,374 maven unique artifacts. Then, we demonstrate how our metrics and visualizations are applied through real-world examples. As an extension, we show how the SUG can compare dependencies between different super repositories. From a sample of 100 GitHub applications, our method found that on average 79% similar overlapping dependencies combinations exist between the maven and github super repository universes.

SEDec 6, 2016Code
Automated Inference of Software Library Usage Patterns

Mohamed Aymen Saied, Ali Ouni, Houari Sahraoui et al.

Modern software systems are increasingly dependent on third-party libraries. It is widely recognized that using mature and well-tested third-party libraries can improve developers' productivity, reduce time-to-market, and produce more reliable software. Today's open-source repositories provide a wide range of libraries that can be freely downloaded and used. However, as software libraries are documented separately but intended to be used together, developers are unlikely to fully take advantage of these reuse opportunities. In this paper, we present a novel approach to automatically identify third-party library usage patterns, i.e., collections of libraries that are commonly used together by developers. Our approach employs hierarchical clustering technique to group together software libraries based on external client usage. To evaluate our approach, we mined a large set of over 6,000 popular libraries from Maven Central Repository and investigated their usage by over 38,000 client systems from the Github repository. Our experiments show that our technique is able to detect the majority (77%) of highly consistent and cohesive library usage patterns across a considerable number of client systems.

8.1SEApr 5
Software Testing Beyond Closed Worlds: Open-World Games as an Extreme Case

Yusaku Kato, Norihiro Yoshida, Erina Makihara et al.

Software testing research has traditionally relied on closed-world assumptions, such as finite state spaces, reproducible executions, and stable test oracles. However, many modern software systems operate under uncertainty, non-determinism, and evolving conditions, challenging these assumptions. This paper uses open-world games as an extreme case to examine the limitations of closed-world testing. Through a set of observations grounded in prior work, we identify recurring characteristics that complicate testing in such systems, including inexhaustible behavior spaces, non-deterministic execution outcomes, elusive behavioral boundaries, and unstable test oracles. Based on these observations, we articulate a vision of software testing beyond closed-world assumptions, in which testing supports the characterization and interpretation of system behavior under uncertainty. We further discuss research directions for automated test generation, evaluation metrics, and empirical study design. Although open-world games serve as the motivating domain, the challenges and directions discussed in this paper extend to a broader class of software systems operating in dynamic and uncertain environments.

SEMar 12, 2020
Code Clone Matching: A Practical and Effective Approach to Find Code Snippets

Katsuro Inoue, Yuya Miyamoto, Daniel M. German et al.

Finding the same or similar code snippets in source code is one of fundamental activities in software maintenance. Text-based pattern matching tools such as grep is frequently used for such purpose, but making proper queries for the expected result is not easy. Code clone detectors could be used but their features and result are generally excessive. In this paper, we propose Code Clone matching (CC matching for short) that employs a combination of token-based clone detection and meta-patterns enhanced with meta-tokens. The user simply gives a query code snippet possibly with a few meta-tokens and then gets the resulting snippets, forming type 1, 2, or 3 code clone pairs between the query and result. By using a code snippet with meta-tokens as the query, the resulting matches are well controlled by the users. CC matching has been implemented as a practical and efficient tool named ccgrep, with grep-like user interface. The evaluation shows that ccgrep~ is a very effective to find various kinds of code snippets.

SESep 27, 2017
An Empirical Study on the Impact of Refactoring Activities on Evolving Client-Used APIs

Raula Gaikovina Kula, Ali Ouni, Daniel M. German et al.

Context: Refactoring is recognized as an effective practice to maintain evolving software systems. For software libraries, we study how library developers refactor their Application Programming Interfaces (APIs), especially when it impacts client users by breaking an API of the library. Objective: Our work aims to understand how clients that use a library API are affected by refactoring activities. We target popular libraries that potentially impact more library client users. Method: We distinguish between library APIs based on their client-usage (refereed to as client-used APIs) in order to understand the extent to which API breakages relate to refactorings. Our tool-based approach allows for a large-scale study across eight libraries (i.e., totaling 183 consecutive versions) with around 900 clients projects. Results: We find that library maintainers are less likely to break client-used API classes. Quantitatively, we find that refactoring activities break less than 37% of all client-used APIs. In a more qualitative analysis, we show two documented cases of where non-refactoring API breaking changes are motivated other maintenance issues (i.e., bug fix and new features) and involve more complex refactoring operations. Conclusion: Using our automated approach, we find that library developers are less likely to break APIs and tend to break client-used APIs when addressing these maintenance issues.

SESep 14, 2017
Do Developers Update Their Library Dependencies? An Empirical Study on the Impact of Security Advisories on Library Migration

Raula Gaikovina Kula, Daniel M. German, Ali Ouni et al.

Third-party library reuse has become common practice in contemporary software development, as it includes several benefits for developers. Library dependencies are constantly evolving, with newly added features and patches that fix bugs in older versions. To take full advantage of third-party reuse, developers should always keep up to date with the latest versions of their library dependencies. In this paper, we investigate the extent of which developers update their library dependencies. Specifically, we conducted an empirical study on library migration that covers over 4,600 GitHub software projects and 2,700 library dependencies. Results show that although many of these systems rely heavily on dependencies, 81.5% of the studied systems still keep their outdated dependencies. In the case of updating a vulnerable dependency, the study reveals that affected developers are not likely to respond to a security advisory. Surveying these developers, we find that 69% of the interviewees claim that they were unaware of their vulnerable dependencies. Furthermore, developers are not likely to prioritize library updates, citing it as extra effort and added responsibility. This study concludes that even though third-party reuse is commonplace, the practice of updating a dependency is not as common for many developers.

SEApr 27, 2017
Source File Set Search for Clone-and-Own Reuse Analysis

Takashi Ishio, Yusuke Sakaguchi, Kaoru Ito et al.

Clone-and-own approach is a natural way of source code reuse for software developers. To assess how known bugs and security vulnerabilities of a cloned component affect an application, developers and security analysts need to identify an original version of the component and understand how the cloned component is different from the original one. Although developers may record the original version information in a version control system and/or directory names, such information is often either unavailable or incomplete. In this research, we propose a code search method that takes as input a set of source files and extracts all the components including similar files from a software ecosystem (i.e., a collection of existing versions of software packages). Our method employs an efficient file similarity computation using b-bit minwise hashing technique. We use an aggregated file similarity for ranking components. To evaluate the effectiveness of this tool, we analyzed 75 cloned components in Firefox and Android source code. The tool took about two hours to report the original components from 10 million files in Debian GNU/Linux packages. Recall of the top-five components in the extracted lists is 0.907, while recall of a baseline using SHA-1 file hash is 0.773, according to the ground truth recorded in the source code repositories.