LG, AI, CV, NE, PF · Sep 26, 2022

Prayatul Matrix: A Direct Comparison Approach to Evaluate Performance of Supervised Machine Learning Models

arXiv:2209.12728v1 · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the need for more granular performance evaluation in machine learning by giving researchers and practitioners a way to compare models on a per-instance basis. It is rated incremental because it builds on existing evaluation frameworks.

The paper tackles the problem of comparing supervised machine learning models by proposing a direct comparison on individual instances instead of aggregate scores. It introduces the Prayatul Matrix and five new performance measures, then validates them on classification techniques and deep learning models across multiple datasets. The reported results indicate that these measures provide more insight than traditional confusion-matrix-based scores.

Performance comparison of supervised machine learning (ML) models is widely done in terms of different confusion-matrix-based scores obtained on test datasets. However, a dataset comprises several instances with different difficulty levels. Therefore, it is more logical to compare the effectiveness of ML models on individual instances instead of comparing scores obtained for the entire dataset. In this paper, an alternative approach is proposed for direct comparison of supervised ML models in terms of individual instances within the dataset. A direct comparison matrix called the Prayatul Matrix is introduced, which accounts for the comparative outcome of two ML algorithms on different instances of a dataset. Five different performance measures are designed based on the Prayatul Matrix. The efficacy of the proposed approach and of the designed measures is analyzed with four classification techniques on three datasets. It is also analyzed on four large-scale complex image datasets with four deep learning models, namely ResNet50V2, MobileNetV2, EfficientNet, and XceptionNet. The results show that the newly designed measures give more insight into the compared ML algorithms than was possible with existing confusion-matrix-based scores such as accuracy, precision, and recall.
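The abstract does not spell out how the Prayatul Matrix is laid out or how the five measures are defined, so the following is only a minimal sketch of the general idea of per-instance comparison. It assumes a 2x2 tabulation of whether each of two models classified each test instance correctly; the function name `pairwise_outcome_matrix`, the cell layout, and the toy labels are all hypothetical and may differ from the paper's definition.

```python
def pairwise_outcome_matrix(y_true, pred_a, pred_b):
    """Tabulate per-instance outcomes of two models (assumed layout, not the paper's).

    Rows index model A (0 = correct, 1 = wrong); columns index model B likewise.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    matrix = [[0, 0], [0, 0]]
    for truth, a, b in zip(y_true, pred_a, pred_b):
        row = 0 if a == truth else 1
        col = 0 if b == truth else 1
        matrix[row][col] += 1
    return matrix


# Toy labels, invented for illustration only.
y_true = [0, 1, 1, 0, 1, 0]
pred_a = [0, 1, 0, 0, 1, 1]
pred_b = [0, 0, 1, 0, 1, 1]

matrix = pairwise_outcome_matrix(y_true, pred_a, pred_b)
accuracy_a = sum(a == t for a, t in zip(pred_a, y_true)) / len(y_true)
accuracy_b = sum(b == t for b, t in zip(pred_b, y_true)) / len(y_true)
print(matrix, accuracy_a, accuracy_b)
```

In this toy case both models have the same accuracy, yet the off-diagonal cells show that they disagree on two instances, which is the kind of difference an aggregate score hides.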

Code Implementations: 1 repo
