SEJan 24, 2022Code
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical StudyTatiana Castro Vélez, Raffi Khatchadourian, Mehdi Bagherzadeh et al.
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. While hybrid approaches aim for the "best of both worlds," the challenges in applying them in the real world are largely unknown. We conduct a data-driven analysis of challenges -- and resultant bugs -- involved in writing reliable yet performant imperative DL code by studying 250 open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually examined code patches and bug reports, respectively. The results indicate that hybridization: (i) is prone to API misuse, (ii) can result in performance degradation -- the opposite of its intention, and (iii) has limited application due to execution mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.
SEDec 6, 2021Code
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree of InterestYiming Tang, Allan Spektor, Raffi Khatchadourian et al.
Logging is a significant programming practice. Due to the highly transactional nature of modern software applications, massive amount of logs are generated every day, which may overwhelm developers. Logging information overload can be dangerous to software applications. Using log levels, developers can print the useful information while hiding the verbose logs during software runtime. As software evolves, the log levels of logging statements associated with the surrounding software feature implementation may also need to be altered. Maintaining log levels necessitates a significant amount of manual effort. In this paper, we demonstrate an automated approach that can rejuvenate feature log levels by matching the interest level of developers in the surrounding features. The approach is implemented as an open-source Eclipse plugin, using two external plug-ins (JGit and Mylyn). It was tested on 18 open-source Java projects consisting of ~3 million lines of code and ~4K log statements. Our tool successfully analyzes 99.22% of logging statements, increases log level distributions by ~20%, and increases the focus of logs in bug fix contexts ~83% of the time. For further details, interested readers can watch our demonstration video (https://www.youtube.com/watch?v=qIULoAXoDv4).
SEApr 15, 2021Code
Automated Evolution of Feature Logging Statement Levels Using Git Histories and Degree of InterestYiming Tang, Allan Spektor, Raffi Khatchadourian et al.
Logging -- used for system events and security breaches to describe more informational yet essential aspects of software features -- is pervasive. Given the high transactionality of today's software, logging effectiveness can be reduced by information overload. Log levels help alleviate this problem by correlating a priority to logs that can be later filtered. As software evolves, however, levels of logs documenting surrounding feature implementations may also require modification as features once deemed important may have decreased in urgency and vice-versa. We present an automated approach that assists developers in evolving levels of such (feature) logs. The approach, based on mining Git histories and manipulating a degree of interest (DOI) model, transforms source code to revitalize feature log levels based on the "interestingness" of the surrounding code. Built upon JGit and Mylyn, the approach is implemented as an Eclipse IDE plug-in and evaluated on 18 Java projects with $\sim$3 million lines of code and $\sim$4K log statements. Our tool successfully analyzes 99.22% of logging statements, increases log level distributions by $\sim$20%, and increases the focus of logs in bug fix contexts $\sim$83% of the time. Moreover, pull (patch) requests were integrated into large and popular open-source projects. The results indicate that the approach is promising in assisting developers in evolving feature log levels.
SEApr 7, 2025
Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph ExecutionRaffi Khatchadourian, Tatiana Castro Vélez, Mehdi Bagherzadeh et al.
Efficiency is essential to support ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code -- supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations. Our key insight is that, while DL programs typically execute sequentially, hybridizing imperative DL code resembles parallelizing sequential code in traditional systems. Inspired by this, we present an automated refactoring approach that assists developers in determining which otherwise eagerly-executed imperative DL functions could be effectively and efficiently executed as graphs. The approach features novel static imperative tensor and side-effect analyses for Python. Due to its inherent dynamism, analyzing Python may be unsound; however, the conservative approach leverages a speculative (keyword-based) analysis for resolving difficult cases that informs developers of any assumptions made. The approach is: (i) implemented as a plug-in to the PyDev Eclipse IDE that integrates the WALA Ariadne analysis framework and (ii) evaluated on nineteen DL projects consisting of 132 KLOC. The results show that 326 of 766 candidate functions (42.56%) were refactorable, and an average relative speedup of 2.16x on performance tests was observed with negligible differences in model accuracy. The results indicate that the approach is useful in optimizing imperative DL code to its full potential.
SESep 7, 2021
Interests, Difficulties, Sentiments, and Tool Usages of Concurrency Developers: A Large-Scale Study on Stack OverflowMehdi Bagherzadeh, Syed Ahmed, Srilakshmi Sripathi et al.
Context: Software developers are increasingly facing the challenges of writing code that is not only concurrent but also correct. Objective: To help these developers, it is necessary to understand concurrency topics they are interested in, their difficulty in finding answers for questions in these topics, their sentiment for these topics, and how they use concurrency tools and techniques to guarantee correctness. Method: We conduct a large-scale study on the entirety of Stack Overflow to understand interests, difficulties, sentiment, and tool usages of concurrency developers. We discuss the implications of our findings for the practice, research, and education of concurrent software development, and investigate the relation of our findings with the findings of the previous work. Results: A few findings of our study are: (1) questions that concurrency developers ask can be grouped into a hierarchy with 27 concurrency topics under 8 major categories, (2) thread safety is among the most popular concurrency topics and client-server concurrency is among the least popular, (3) irreproducible behavior is among the most difficult topics and memory consistency is among the least difficult, (4) data scraping is among the most positive concurrency topics and irreproducible behavior is among the most negative, (5) root cause identification has the most number of questions for usage of data race tools and alternative use has the least. Conclusion: The results of our study can not only help concurrency developers but also concurrency educators and researchers to better decide where to focus their efforts, by trading off one concurrency topic against another.