ML CL LG Mar 6, 2017

Generative and Discriminative Text Classification with Recurrent Neural Networks

Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

arXiv:1703.01898v2 21.4214 citations

Originality: Incremental advance

AI Analysis

This work addresses the robustness of text classifiers in zero-shot and continual learning settings. It is rated incremental because it extends to RNN models a pattern that Ng & Jordan (2001) proved for linear classifiers.

The paper compares generative and discriminative LSTM models for text classification. It finds that generative models have higher asymptotic error rates but approach those rates more rapidly, and that they are more robust to shifts in the data distribution, substantially outperforming discriminative models in zero-shot and continual learning settings.

We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts, the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.
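To make the generative decision rule concrete, here is a minimal sketch of the bag-of-words "ancestor" the abstract mentions, not the paper's LSTM model. It fits a class prior p(y) and a class-conditional unigram model p(w | y), then predicts the label that maximizes log p(y) + Σ log p(w | y). The function names, the add-one smoothing, and the toy documents are assumptions made for illustration only. A discriminative model would instead fit p(y | x) directly.

```python
import math
from collections import Counter, defaultdict

def train_generative(docs):
    """Collect the counts needed for p(y) and a unigram p(w | y).

    docs: list of (tokens, label) pairs. Hypothetical helper, not from the paper.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def log_joint(tokens, label, model):
    """Return log p(y) + sum_w log p(w | y) under add-one smoothing (an assumption)."""
    label_counts, word_counts, vocab = model
    score = math.log(label_counts[label] / sum(label_counts.values()))
    # The extra +1 reserves probability mass for words never seen in training.
    denom = sum(word_counts[label].values()) + len(vocab) + 1
    for w in tokens:
        score += math.log((word_counts[label][w] + 1) / denom)
    return score

def classify(tokens, model):
    """Generative rule: pick the label with the highest joint log-probability."""
    return max(model[0], key=lambda y: log_joint(tokens, y, model))

# Toy training data, invented for this example.
train = [
    ("the team won the match".split(), "sports"),
    ("a late goal won the game".split(), "sports"),
    ("the market fell on weak earnings".split(), "finance"),
    ("shares fell as the bank cut rates".split(), "finance"),
]
model = train_generative(train)
prediction = classify("the bank shares fell".split(), model)
print(prediction)
```

Because each class has its own model of the text, a new class can in principle be added by fitting one more class-conditional model without retraining the others. That property is one intuition for why generative classifiers might suit continual learning, though this sketch does not reproduce the paper's experiments.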
