LG CL ML

Jun 21, 2019

Meta-learning of textual representations

Jorge Madrid, Hugo Jair Escalante, Eduardo Morales

arXiv:1906.08934v23.410 citations

Has Code

Originality: Incremental advance

AI Analysis

This work addresses the limitation of existing AutoML methods that only handle tabular data, providing a step towards automation for non-experts in text mining tasks.

The paper tackles the automation of supervised learning for text mining by introducing a meta-learning methodology that automatically obtains textual representations from raw text. Experiments covering 60 representations and more than 80 datasets suggest that it is a promising route to effective off-the-shelf text classification pipelines.

Recent progress in AutoML has led to state-of-the-art methods (e.g., AutoSKLearn) that can be readily used by non-experts to approach any supervised learning problem. Although these methods are quite effective, they are still limited in that they work only for tabular (matrix-formatted) data. This paper describes one step forward in automating the design of supervised learning methods in the context of text mining. We introduce a meta-learning methodology for automatically obtaining a representation for text mining tasks starting from raw text. We report experiments considering 60 different textual representations and more than 80 text mining datasets associated with a wide variety of tasks. Experimental results show that the proposed methodology is a promising solution for obtaining highly effective off-the-shelf text classification pipelines.
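The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea behind meta-learning a representation choice, not the authors' method. It assumes a nearest-neighbour recommender: describe each corpus with a few dataset-level meta-features computed from raw text, then reuse the representation that worked best on the most similar previously seen corpus. The functions `meta_features` and `recommend`, the three meta-features, the toy corpora, and the representation names are all hypothetical placeholders.

```python
import math
from collections import Counter


def meta_features(docs):
    """Hypothetical dataset-level descriptors computed from raw text."""
    tokens_per_doc = [d.lower().split() for d in docs]
    lengths = [len(toks) for toks in tokens_per_doc]
    vocab = Counter(tok for toks in tokens_per_doc for tok in toks)
    total_tokens = sum(lengths)
    return [
        math.log1p(len(docs)),               # corpus size (log scale)
        total_tokens / len(docs),            # mean document length in tokens
        len(vocab) / max(total_tokens, 1),   # type/token ratio
    ]


def recommend(new_docs, knowledge_base):
    """Return the representation stored for the closest prior dataset."""
    target = meta_features(new_docs)
    closest = min(knowledge_base, key=lambda e: math.dist(target, e["meta"]))
    return closest["best_representation"]


# Toy "prior experience": two corpora and the representation assumed
# (for illustration only) to have performed best on each.
short_corpus = ["good movie", "bad movie", "great film", "poor film"]
long_corpus = [
    "the committee approved the budget after a long debate on the budget",
    "the court rejected the appeal after a long hearing on the appeal",
]
knowledge_base = [
    {"meta": meta_features(short_corpus), "best_representation": "char_ngrams"},
    {"meta": meta_features(long_corpus), "best_representation": "tfidf_words"},
]

print(recommend(["nice song", "dull song", "fine tune"], knowledge_base))
```

In this sketch the meta-features are left unscaled, so mean document length dominates the distance; a realistic system would normalise them and would fill the knowledge base with measured results rather than hand-written labels.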
