Hitesh Sajnani

h-index16

4papers

714citations

Novelty43%

AI Score27

Ranked #155,291 of 194,257 authors (top 80%)#1,810 in SE (top 60%)

4 Papers

6.9SEJan 22, 2019Code

Towards Predicting the Impact of Software Changes on Building Activities

Michele Tufano, Hitesh Sajnani, Kim Herzig

The pervasive adoption of Continuous Integration practices -- both in industry and open source projects -- has led software building to become a daily activity for thousands of developers around the world. Companies such as Microsoft have invested in in-house infrastructures with the goal of optimizing the build process. CloudBuild, a distributed and caching build service developed internally by Microsoft, runs the build process in parallel in the cloud and relies on caching to accelerate builds. This allows for agile development and rapid delivery of software even several times a day. However, moving towards faster builds requires not only improvements on the infrastructure side, but also attention to developers' changes in the software. Surely, architectural decisions and software changes, such as addition of dependencies, can lead to significant build time increase. Yet, estimating the impact of such changes on build time can be challenging when dealing with complex, distributed, and cached build systems. In this paper, we envision a predictive model able to preemptively alert developers on the extent to which their software changes may impact future building activities. In particular, we describe an approach that analyzes the developer's change and predicts (i) whether it impacts (any of) the Longest Critical Path; (ii) may lead to build time increase and its delta; and (iii) the percentage of future builds that might be affected by such change.

10.4SEMar 8, 2021

A Case Study of Onboarding in Software Teams: Tasks and Strategies

An Ju, Hitesh Sajnani, Scot Kelly et al.

Developers frequently move into new teams or environments across software companies. Their onboarding experience is correlated with productivity, job satisfaction, and other short-term and long-term outcomes. The majority of the onboarding process comprises engineering tasks such as fixing bugs or implementing small features. Nevertheless, we do not have a systematic view of how tasks influence onboarding. In this paper, we present a case study of Microsoft, where we interviewed 32 developers moving into a new team and 15 engineering managers onboarding a new developer into their team -- to understand and characterize developers' onboarding experience and expectations in relation to the tasks performed by them while onboarding. We present how tasks interact with new developers through three representative themes: learning, confidence building, and socialization. We also discuss three onboarding strategies as inferred from the interviews that managers commonly use unknowingly, and discuss their pros and cons and offer situational recommendations. Furthermore, we triangulate our interview findings with a developer survey ($N=189$) and a manager survey ($N=37$) and find that survey results suggest that our findings are representative and our recommendations are actionable. Practitioners could use our findings to improve their onboarding processes, while researchers could find new research directions from this study to advance the understanding of developer onboarding. Our research instruments and anonymous data are available at \url{https://zenodo.org/record/4455937#.YCOQCs_0lFd}

11.2SEMar 5, 2016

SourcererCC and SourcererCC-I: Tools to Detect Clones in Batch mode and During Software Development

Vaibhav Saini, Hitesh Sajnani, Jaewoo Kim et al.

Given the availability of large source-code repositories, there has been a large number of applications for large-scale clone detection. Unfortunately, despite a decade of active research, there is a marked lack in clone detectors that scale to big software systems or large repositories, specifically for detecting near-miss (Type 3) clones where significant editing activities may take place in the cloned code. This paper demonstrates: (i) SourcererCC, a token-based clone detector that targets the first three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. It uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone, and (ii) SourcererCC-I, an Eclipse plug-in, that uses SourcererCC's core engine to identify and navigate clones (both inter and intra project) in real-time during software development. In our experiments, comparing SourcererCC with the state-of-the-art tools, we found that it is the only clone detection tool to successfully scale to 250 MLOC on a standard workstation with 12 GB RAM and efficiently detect the first three types of clones (precision 86% and recall 86-100%). Link to the demo: https://youtu.be/l7F_9Qp-ks4

33.7SEDec 20, 2015

SourcererCC: Scaling Code Clone Detection to Big Code

Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko et al.

Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.