SpaML: a Bimodal Ensemble Learning Spam Detector based on NLP Techniques
This work addresses spam detection, a domain-specific problem, but appears incremental as it builds on existing NLP methods and ensemble strategies without introducing a new paradigm.
The authors tackled spam detection by developing SpaML, a bimodal ensemble learning tool that combines supervised and unsupervised classifiers with NLP techniques (BoW and TF-IDF), achieving interesting results in accuracy and precision.
In this paper, we present SpaML, a new spam detection tool that combines a set of supervised and unsupervised classifiers with two Natural Language Processing (NLP) techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). We first describe the NLP techniques used. We then present our classifiers and their performance with each technique. Next, we present our overall Ensemble Learning classifier and the strategy we use to combine the individual classifiers. Finally, we report the interesting results SpaML achieves in terms of accuracy and precision.
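To make the two text representations and the idea of combining classifiers concrete, the following is a minimal sketch using only the Python standard library. It is not SpaML's implementation: the function names (`tokenize`, `bag_of_words`, `tf_idf`, `majority_vote`) are hypothetical, the TF-IDF variant shown (term frequency times unsmoothed log inverse document frequency) is one common choice among several, and majority voting is assumed here only for illustration, since the combination strategy is not specified in this abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would usually
    # also strip punctuation, remove stop words, and so on.
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    weighted = []
    for vec in counts:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([(c / total) * idf[j] for j, c in enumerate(vec)])
    return vocab, weighted


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one message.

    Illustrative assumption: the most frequent label wins.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    messages = ["win cash now", "meeting at noon", "win a free prize now"]
    vocab, bow = bag_of_words(messages)
    _, weights = tf_idf(messages)
    print(vocab)
    print(bow[0])
    print(weights[0])
    # Hypothetical outputs of three classifiers for a single message.
    print(majority_vote(["spam", "ham", "spam"]))
```

In this sketch, BoW keeps only raw counts, whereas TF-IDF down-weights terms that occur in many messages, so a term unique to one message scores higher than one shared across several.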