A Comparative Analysis of Classification Data Mining Techniques: Deriving Key Factors Useful for Predicting Students' Performance
This paper addresses high failure rates in Indian engineering education by identifying predictive factors, but its contribution is incremental because it applies standard data mining methods to a specific dataset.
The paper compares classification techniques such as Naïve Bayes and JRip for predicting student performance in engineering. It finds Naïve Bayes most accurate for failure prediction and JRip most accurate for grade prediction, with JRip also providing interpretable rules that expose the key factors.
The number of students opting for engineering as their discipline is increasing rapidly. However, owing to various factors, including inadequate primary education in India, failure rates are high. Many students are unable to excel in core engineering because its subjects are complex and mathematically demanding, and so they fail these subjects. Data mining techniques can predict student performance in terms of both grades and subject failure. This paper performs a comparative analysis of several classification techniques, namely Naïve Bayes, LibSVM, J48, Random Forest, and JRip, and seeks to identify the best among them. Based on the results obtained, Naïve Bayes is the most accurate method for predicting student failure, and JRip is the most accurate for predicting student grades. JRip's accuracy for failure prediction differs only marginally from that of Naïve Bayes, and JRip additionally yields a set of rules from which we derive the key factors influencing student performance. Finally, we suggest various ways to mitigate these factors. This study is limited to the Indian education system; however, the factors found may be helpful in other settings as well.
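The kind of comparison described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's tooling, dataset, or results. The records in `DATA` and the attribute names (attendance, maths background, tutorials) are hypothetical, invented purely for illustration. The rule learner is a simple single-attribute learner in the style of OneR, used here as a much simpler stand-in for JRip, and the Naïve Bayes model is a small categorical implementation with Laplace smoothing. Both are scored with the same k-fold cross-validation so that their accuracies are comparable.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (attendance, maths_background, tutorials) -> outcome.
# These values are invented for illustration; they are NOT from the study.
DATA = [
    (("high", "strong", "yes"), "pass"), (("high", "strong", "no"), "pass"),
    (("high", "weak", "yes"), "pass"),   (("high", "weak", "no"), "fail"),
    (("low", "strong", "yes"), "pass"),  (("low", "strong", "no"), "fail"),
    (("low", "weak", "yes"), "fail"),    (("low", "weak", "no"), "fail"),
    (("high", "strong", "yes"), "pass"), (("low", "weak", "no"), "fail"),
    (("high", "weak", "yes"), "pass"),   (("low", "strong", "no"), "fail"),
]

def train_naive_bayes(rows):
    """Categorical Naive Bayes with Laplace (add-one) smoothing."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    total = len(rows)

    def predict(features):
        best_label, best_score = None, -math.inf
        for label, n in class_counts.items():
            score = math.log(n / total)  # log prior
            for i, value in enumerate(features):
                numerator = value_counts[(i, label)][value] + 1
                denominator = n + len(values_seen[i])
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict

def train_one_rule(rows):
    """Single-attribute rule learner (OneR style); a simple stand-in for JRip."""
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    best_index, best_rule, best_correct = None, None, -1
    for i in range(len(rows[0][0])):
        by_value = defaultdict(Counter)
        for features, label in rows:
            by_value[features[i]][label] += 1
        # Map each attribute value to its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        if correct > best_correct:
            best_index, best_rule, best_correct = i, rule, correct

    def predict(features):
        return best_rule.get(features[best_index], default)

    predict.rule = (best_index, best_rule)  # exposed so the rule can be inspected
    return predict

def cross_val_accuracy(train, rows, k=4):
    """Accuracy over k interleaved folds; every row is tested exactly once."""
    correct = 0
    for fold in range(k):
        test = rows[fold::k]
        training = [r for j, r in enumerate(rows) if j % k != fold]
        model = train(training)
        correct += sum(model(features) == label for features, label in test)
    return correct / len(rows)

for name, trainer in [("Naive Bayes", train_naive_bayes), ("One rule", train_one_rule)]:
    print(f"{name}: accuracy = {cross_val_accuracy(trainer, DATA):.2f}")
```

The learned rule can be read from `train_one_rule(DATA).rule`, which mirrors, in a very reduced form, how a rule-based classifier lets key factors be derived from its output, whereas the Naïve Bayes model returns only a prediction.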