Lov Kumar

SE
6 papers
43 citations
Novelty 18%
AI Score 17

6 Papers

SE · Apr 14, 2017 · Code
Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction

Lov Kumar, Santanu Rath, Ashish Sureka

Software fault prediction models are employed to optimize testing resource allocation by identifying fault-prone classes before the testing phases. Several researchers have validated the use of different classification techniques to develop predictive models for fault prediction. The performance of these statistical models has been shown to be influenced by the training and testing datasets. Ensemble learning algorithms have been widely used because they combine the capabilities of their constituent models on a dataset to achieve potentially higher performance than the individual models (improved generalizability). In the study presented in this paper, three different ensemble methods are applied to develop a model for predicting fault proneness. The efficacy and usefulness of a fault prediction model also depend on the source code metrics that are used as input to the model. In this paper, we propose a framework to validate the source code metrics and select the right set of metrics with the objective of improving the performance of the fault prediction model. The fault prediction models are then validated using a cost evaluation framework. We conduct a series of experiments on 45 open source project datasets. Key conclusions from our experiments are: (1) Majority Voting Ensemble (MVE) methods outperformed other methods; (2) using the set of source code metrics selected by the suggested source code metrics validation framework as input achieves better results compared to all other metrics; (3) the fault prediction method is effective for software projects with a percentage of faulty classes lower than the threshold value (low - 54.82%, medium - 41.04%, high - 28.10%).
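
The majority voting idea named in conclusion (1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote` and the three base-classifier outputs are hypothetical, and the labels (1 = fault-prone, 0 = not fault-prone) are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        # Count the labels that the base models assign to sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical outputs of three base classifiers for four classes
# (1 = fault-prone, 0 = not fault-prone).
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)  # [1, 1, 1, 0]
```

With an odd number of base models and binary labels, ties cannot occur, which is one reason three-model ensembles are a common choice for this kind of sketch.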

SE · Aug 8, 2021
An Empirical Study on Predictability of Software Code Smell Using Deep Learning Models

Himanshu Gupta, Tanmay G. Kulkarni, Lov Kumar et al.

A code smell, like a bad smell, is a surface indication of something tainted, here in terms of software writing practices. It indicates a deeper problem that lies within the code and is associated with an issue that is apparent to experienced software developers with acceptable coding practices. Recent studies have observed that code containing code smells is often more likely to change during the software development cycle. In this paper, we develop code smell prediction models using features extracted from source code to predict eight types of code smell. Our work also presents the application of data sampling techniques to handle the class imbalance problem and feature selection techniques to find relevant feature sets. Previous studies have used techniques such as Naive Bayes and Random Forest but have not explored deep learning methods to predict code smells. A total of 576 distinct deep learning models were trained using the features and datasets mentioned above. The study concluded that the deep learning models that used data from the Synthetic Minority Oversampling Technique (SMOTE) gave better results in terms of accuracy and AUC, with the accuracy of some models improving from 88.47 to 96.84.
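
The oversampling step mentioned above can be sketched as follows. This is a simplified illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the paper's pipeline; the function name `smote`, its parameters, and the sample points are assumptions made for this example.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class feature vectors (e.g. metrics of smelly classes).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Each synthetic point lies on a line segment between two existing minority points, so the minority class grows without simply duplicating rows.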

SE · Aug 8, 2021
Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell

Himanshu Gupta, Abhiram Anand Gulanikar, Lov Kumar et al.

A code smell is a surface indicator of an inherent problem in the system, most often due to deviation from standard coding practices on the developers' part during the development phase. Studies observe that code smells make the code more likely to need modifications and corrections than code without code smells. Restructuring the code at an early stage of development saves the exponentially increasing amount of effort it would otherwise require to address the issues stemming from the presence of these code smells. Instead of using traditional features to detect code smells, we use user comments to manually construct features to predict code smells. We use three Extreme Learning Machine kernels over 629 packages to identify eight code smells by leveraging feature engineering and sampling techniques. Our findings indicate that the radial basis function kernel performs best of the three kernel methods, with a mean accuracy of 98.52.
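
For readers unfamiliar with the radial basis function kernel mentioned above, a minimal sketch follows. It shows only the kernel computation, k(x, y) = exp(-gamma * ||x - y||^2), that a kernel-based learner would evaluate over pairs of samples; it is not the paper's Extreme Learning Machine. The names `rbf_kernel` and `kernel_matrix`, the value of `gamma`, and the feature vectors are assumptions for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(samples, gamma=0.5):
    """Pairwise kernel values for all samples (symmetric, ones on the diagonal)."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]

# Hypothetical feature vectors built from comments for three packages.
features = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
K = kernel_matrix(features)
for row in K:
    print([round(v, 4) for v in row])
```

Samples that are close in feature space get kernel values near 1, and distant samples get values near 0, which is what lets a kernel method separate classes that are not linearly separable in the original features.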

SE · Dec 21, 2017
A Comparative Study of Different Source Code Metrics and Machine Learning Algorithms for Predicting Change Proneness of Object Oriented Systems

Lov Kumar, Ashish Sureka

Change-prone classes or modules are defined as software components in the source code that are likely to change in the future. Change-proneness prediction is useful to the maintenance team because they can optimize and focus their testing resources on the modules that have a higher likelihood of change. A change-proneness prediction model can be built by using source code metrics as predictors or features within a machine learning classification framework. In this paper, twenty-one source code metrics are computed to develop a statistical model for predicting change-prone modules. Since the performance of the change-proneness model depends on the source code metrics, they are used as independent variables or predictors for the change-proneness model. Eleven different feature selection techniques (including the usage of all 21 proposed source code metrics described in the paper) are used to remove irrelevant features and select the best set of features. The effectiveness of the set of source code metrics is evaluated using eighteen different classification techniques and three ensemble techniques. Experimental results demonstrate that the model based on the set of source code metrics selected by feature selection techniques achieves better results than the model using all source code metrics as predictors. Our experimental results reveal that the predictive model developed using LSSVM-RBF yields better results than the other classification techniques.

SE · Oct 30, 2016
A Bibliometric Study of Asia Pacific Software Engineering Conference from 2010 to 2015

Lov Kumar, Saikrishna Sripada, Ashish Sureka

The Asia-Pacific Software Engineering Conference (APSEC) is a reputed and long-running conference that has successfully completed more than two decades as of 2015. We conduct a bibliometric and scientific publication mining based study to examine how the conference has evolved over the past six years (2010 to 2015). Our objective is to perform an in-depth examination of the state of APSEC so that the APSEC community can identify strengths, areas for improvement and future directions for the conference. Our empirical analysis is based on various perspectives such as: paper submission acceptance rate trends, conference location, scholarly productivity and contributions from various countries, analysis of keynotes, workshops, conference organizers and sponsors, tutorials, identification of prolific authors, computation of citation impact of papers and contributing authors, internal and external collaboration, university and industry participation and collaboration, measurement of gender imbalance, topical analysis, yearly author churn and program committee characteristics.

SE · Sep 20, 2016
Thirteen Years of Mining Software Repositories (MSR) Conference - What is the Bibliography Data Telling Us?

Lov Kumar, Ashish Sureka

The Mining Software Repositories (MSR) conference is a reputed, long-running flagship conference in the area of Software Analytics that has successfully completed more than one decade as of 2016. We conduct a bibliometric and scientific publication mining based study to examine how the conference has evolved over the past 13 years (from 2004 to 2007 as a workshop and then from 2008 to 2016 as a conference). Our objective is to perform an examination of the state of MSR so that the MSR community can identify strengths, areas for improvement and future directions for the conference.