LG AI CL | Oct 23, 2023

Meta learning with language models: Challenges and opportunities in the classification of imbalanced text

Apostol Vassilev, Honglan Jin, Munawar Hasan

arXiv:2310.15019v23.81 citations | h-index: 17 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of imbalanced text classification for content moderation, but it appears incremental as it builds on existing meta learning and threshold-moving methods.

The paper tackles the challenge of detecting out-of-policy speech (OOPS) content by proposing a meta learning technique combined with threshold-moving to improve performance on imbalanced datasets, showing statistically significant advantages.

Detecting out-of-policy speech (OOPS) content is important but difficult. While machine learning is a powerful tool for this task, it is hard to break the performance ceiling because of limits on the quantity and quality of training data and inconsistencies in OOPS definitions and data labeling. To realize the full potential of the limited resources available, we propose a meta learning technique (MLT) that combines individual models built with different text representations. We show analytically that the resulting technique is numerically stable and produces reasonable combining weights. We combine the MLT with a threshold-moving (TM) technique to further improve the performance of the combined predictor on highly imbalanced in-distribution and out-of-distribution datasets. We also provide computational results that show the statistically significant advantages of the proposed MLT approach.

All authors contributed equally to this work.
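The general idea of combining several predictors and then moving the decision threshold can be illustrated with a minimal sketch. This is not the paper's MLT: the combining weights below are hypothetical fixed values rather than learned ones, the toy probabilities are made up for illustration, and choosing the threshold by maximizing F1 on validation data is an assumed selection criterion. The function names `combine`, `f1`, and `best_threshold` are likewise illustrative.

```python
import numpy as np

def combine(probs, weights):
    """Weighted average of per-model positive-class probabilities.

    probs has shape (n_models, n_samples); weights are normalized to sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ probs

def f1(y_true, y_pred):
    """F1 score for the positive class (label 1); 0.0 if undefined."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def best_threshold(y_true, scores, grid=None):
    """Pick the threshold on a grid that maximizes F1 (assumed criterion)."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    f1s = [f1(y_true, (scores >= t).astype(int)) for t in grid]
    i = int(np.argmax(f1s))  # first maximum wins on ties
    return float(grid[i]), float(f1s[i])

# Toy imbalanced validation set: 8 negatives, 2 positives (made-up numbers).
y_val = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
model_probs = np.array([
    [0.10, 0.20, 0.05, 0.30, 0.15, 0.10, 0.25, 0.20, 0.40, 0.35],
    [0.05, 0.10, 0.10, 0.20, 0.30, 0.05, 0.15, 0.10, 0.45, 0.30],
])
weights = [0.6, 0.4]  # hypothetical combining weights, not learned

scores = combine(model_probs, weights)
f1_default = f1(y_val, (scores >= 0.5).astype(int))
threshold, f1_moved = best_threshold(y_val, scores)
print(f"F1 at 0.5: {f1_default:.2f}; F1 at {threshold:.2f}: {f1_moved:.2f}")
```

In this toy setup no combined score reaches the default 0.5 cutoff, so every positive is missed, while a lower threshold separates the two classes. On real data the threshold would be chosen on held-out data and checked on a separate test set.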
