LG | Jun 16, 2021

Comparison of Automated Machine Learning Tools for SMS Spam Message Filtering

arXiv:2106.08671v23.14 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This incremental work addresses spam filtering for mobile users by evaluating existing AutoML tools on a specific dataset.

The study compared three AutoML tools for SMS spam filtering, finding that ensemble models performed best, with H2O AutoML's Stacked Ensemble achieving a Log Loss of 0.8370 and improvements of 19.05% over TPOT and 5.56% over mljar-supervised.

Short Message Service (SMS) is a very popular communication service among mobile users. However, it can be abused to carry out illegal activities and create security risks. Many automatic machine learning (AutoML) tools now exist that help domain experts and lay users build high-quality ML models with little or no machine learning knowledge. In this work, the classification performance of three AutoML tools was compared for SMS spam message filtering: mljar-supervised AutoML, H2O AutoML, and Tree-based Pipeline Optimization Tool (TPOT) AutoML. Experimental results showed that ensemble models achieved the best classification performance. The Stacked Ensemble model built with H2O AutoML achieved the best performance in terms of Log Loss (0.8370), true positives (1088/1116), and true negatives (281/287). This corresponds to a 19.05% improvement in Log Loss over TPOT AutoML and a 5.56% improvement over mljar-supervised AutoML. The satisfactory filtering performance suggests that AutoML tools could be used to automatically select the best-performing ML model for SMS spam message filtering.
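To make the reported metrics concrete, the following is a minimal sketch of how a binary Log Loss and per-class correct-classification rates can be computed. It is not the authors' evaluation code. The function names `binary_log_loss` and `rate` and the toy labels and probabilities are hypothetical, introduced only for illustration; the only figures taken from the abstract are the counts 1088/1116 and 281/287. The abstract does not state which class (spam or ham) is treated as positive, so the sketch makes no assumption about that.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels (0/1) given predicted
    probabilities of class 1. Lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def rate(correct, total):
    """Fraction of instances of one class that were classified correctly."""
    return correct / total


# Counts reported in the abstract for the H2O AutoML Stacked Ensemble.
tp_rate = rate(1088, 1116)  # correctly classified positives
tn_rate = rate(281, 287)    # correctly classified negatives

# Hypothetical toy data, used only to exercise the Log Loss function.
toy_labels = [1, 0, 1, 1]
toy_probs = [0.9, 0.2, 0.6, 0.4]
toy_loss = binary_log_loss(toy_labels, toy_probs)

print(f"true positive rate: {tp_rate:.4f}")
print(f"true negative rate: {tn_rate:.4f}")
print(f"toy log loss:       {toy_loss:.4f}")
```

Because Log Loss penalizes confident wrong predictions more heavily than uncertain ones, it can rank models differently from the raw true positive and true negative counts, which is one reason to report both kinds of metric.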
