LG ML, Mar 4, 2020

Contrastive estimation reveals topic posterior information to linear models

Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu

arXiv:2003.02234v122.167 citations

Originality: Incremental advance

AI Analysis

This work targets semi-supervised document classification, giving researchers and practitioners a way to improve performance when labeled data is limited. It is an incremental advance because it builds on existing contrastive learning and topic modeling techniques.

The paper addresses document classification under topic modeling assumptions. It proves that contrastive learning can recover document representations that reveal topic posterior information to linear models. It also shows empirically that linear classifiers trained on these representations perform well with very few training examples, with competitive results on benchmark datasets.

Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data. In the context of document classification under topic modeling assumptions, we prove that contrastive learning is capable of recovering a representation of documents that reveals their underlying topic posterior information to linear models. We apply this procedure in a semi-supervised setup and demonstrate empirically that linear classifiers with these representations perform well in document classification tasks with very few training examples.
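To make the idea of "similar and dissimilar pairs" concrete, here is a minimal sketch of how such pairs might be built from tokenized documents. It assumes that two halves of the same document count as a similar pair and that halves taken from different documents count as a dissimilar pair. The function name `make_contrastive_pairs` and the half-split rule are illustrative assumptions, not details stated in the abstract.

```python
import random


def make_contrastive_pairs(docs, seed=0):
    """Build (first_half, second_half, label) triples from tokenized documents.

    Hypothetical helper: label 1 marks a similar pair (same document),
    label 0 marks a dissimilar pair (halves from different documents).
    """
    if len(docs) < 2:
        raise ValueError("need at least two documents to form dissimilar pairs")
    rng = random.Random(seed)
    # Assumption: split every document at its midpoint.
    halves = [(d[: len(d) // 2], d[len(d) // 2:]) for d in docs]
    pairs = []
    for i, (first, second) in enumerate(halves):
        # Similar pair: both halves come from the same document.
        pairs.append((first, second, 1))
        # Dissimilar pair: second half from a different, randomly chosen document.
        j = rng.choice([k for k in range(len(halves)) if k != i])
        pairs.append((first, halves[j][1], 0))
    return pairs


docs = [
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h"],
    ["i", "j", "k", "l"],
]
pairs = make_contrastive_pairs(docs)
```

A binary classifier trained to separate the two labels would then supply the learned representation; that training step is not shown here.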
