Explainable Multi-Label Classification of MBTI Types
This work addresses the need for explainable classification of personality types from social media data, but it is incremental as it applies standard methods to a new dataset.
The study classifies Myers-Briggs Type Indicator (MBTI) types from Reddit and Kaggle data using multi-label classification. It finds that Multinomial Naive Bayes and k-Nearest Neighbour perform better when classes with Observer (S) traits are excluded, while Logistic Regression achieves its best results when every class has more than 550 entries.
In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle dataset. We apply multi-label classification using the Binary Relevance method, which trains one independent binary classifier per label. We use an Explainable Artificial Intelligence (XAI) approach to emphasize the transparency and understandability of both the process and the results. To achieve this, we experiment with glass-box learning models, i.e., models designed for simplicity, transparency, and interpretability. As glass-box models, we selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better if classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have more than 550 entries.
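The Binary Relevance setup described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each MBTI type is split into four binary labels (one per dichotomy), posts are tokenized on whitespace, and the toy posts are invented for illustration. `TinyMultinomialNB` and `BinaryRelevance` are hypothetical helper classes written for this example, not the authors' implementation.

```python
import math
from collections import Counter

# Assumption: one binary label per MBTI dichotomy; 1 means the first letter of the pair.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def encode(mbti):
    """Turn a type string such as 'INTJ' into four binary labels."""
    return [1 if letter == first else 0
            for letter, (first, _) in zip(mbti.upper(), AXES)]


def decode(bits):
    """Turn four binary labels back into a type string."""
    return "".join(first if bit else second
                   for bit, (first, second) in zip(bits, AXES))


class TinyMultinomialNB:
    """Hypothetical stand-in: multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.token_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for token in text.lower().split():
                if token in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[c][token] + 1)
                                      / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


class BinaryRelevance:
    """Train one independent binary classifier per label column."""

    def __init__(self, make_model, n_labels):
        self.models = [make_model() for _ in range(n_labels)]

    def fit(self, texts, label_rows):
        for j, model in enumerate(self.models):
            model.fit(texts, [row[j] for row in label_rows])
        return self

    def predict_one(self, text):
        return [model.predict_one(text) for model in self.models]


# Invented toy posts, for illustration only.
posts = [
    ("quiet evening reading alone with a detailed plan", "ISTJ"),
    ("party with friends tonight spontaneous fun", "ENFP"),
    ("alone thinking about abstract theory and a plan", "INTJ"),
    ("friends fun facts practical spontaneous trip", "ESFP"),
]
texts = [text for text, _ in posts]
label_rows = [encode(mbti) for _, mbti in posts]

model = BinaryRelevance(TinyMultinomialNB, n_labels=len(AXES)).fit(texts, label_rows)
print(decode(model.predict_one("alone reading quiet")))
```

Because each label has its own classifier, the per-class token counts of every model can be inspected separately, which fits the glass-box motivation. The trade-off is that Binary Relevance ignores any correlation between labels.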