A Simple yet Brisk and Efficient Active Learning Platform for Text Classification
This work provides a practical tool that lets business users quickly deploy text classification models, though its contribution appears incremental, since it mainly combines existing methods.
The authors tackle the problem of enabling business users to build and deploy text classification models without extensive data science involvement by creating a fully managed active learning platform. Their approach achieves brisk and efficient labeling by combining GPT2 text representations with fast incremental linear models, and it is validated on publicly available and real-life insurance datasets.
In this work, we propose a fully managed machine learning service that uses active learning to build models directly from unstructured data. With this tool, business users can quickly and easily build machine learning models and then deploy them directly into a production-ready hosted environment with little involvement from data scientists. Our approach leverages state-of-the-art text representations such as OpenAI's GPT2 and a fast implementation of the active learning workflow that relies on a simple construction of incremental learning using linear models, thus providing a brisk and efficient labeling experience for users. Experiments on both publicly available and real-life insurance datasets empirically show why our choices of simple and fast classification algorithms are well suited to the task at hand.
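The workflow described above (fixed text representations, a linear model updated incrementally, and a labeling loop) can be illustrated with a minimal sketch. This is not the platform's implementation: the abstract does not name a query strategy, so uncertainty sampling is assumed here, the 2-D vectors are toy stand-ins for GPT2 text representations, and all names (`IncrementalLinearModel`, `run_active_learning`, `oracle`) are hypothetical.

```python
import math


class IncrementalLinearModel:
    """Binary logistic regression updated one labeled example at a time."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        # One stochastic gradient step on the log loss; no retraining from scratch.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def run_active_learning(pool, oracle, budget, lr=0.5):
    """Query `budget` labels, always picking the example closest to p = 0.5."""
    model = IncrementalLinearModel(dim=len(pool[0]), lr=lr)
    unlabeled = list(range(len(pool)))
    labeled = []
    for _ in range(budget):
        # Uncertainty sampling (assumed strategy): smallest |p - 0.5| wins.
        idx = min(unlabeled, key=lambda i: abs(model.predict_proba(pool[i]) - 0.5))
        y = oracle(idx)  # in a real tool, a human annotator supplies this label
        model.partial_fit(pool[idx], y)
        unlabeled.remove(idx)
        labeled.append((idx, y))
    return model, labeled


# Toy pool: 2-D vectors standing in for text embeddings (assumption).
positives = [(1.0 + 0.2 * i, 3.0 - 0.2 * i) for i in range(10)]
negatives = [(-a, -b) for a, b in positives]
pool = positives + negatives
true_labels = [1] * len(positives) + [0] * len(negatives)

model, labeled = run_active_learning(pool, lambda i: true_labels[i], budget=6)

predictions = [1 if model.predict_proba(x) > 0.5 else 0 for x in pool]
accuracy = sum(p == t for p, t in zip(predictions, true_labels)) / len(pool)
print(f"labels used: {len(labeled)}, pool accuracy: {accuracy:.2f}")
```

Because each new label triggers only a single cheap update rather than a full retrain, the model can be refreshed between annotations without making the user wait, which is the property the abstract attributes to simple incremental linear models.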