CL LG ML Sep 28, 2017

Structured Embedding Models for Grouped Data

Maja Rudolph, Francisco Ruiz, Susan Athey, David Blei

arXiv:1709.10367v15.437 citations Has Code

Originality: Incremental advance

AI Analysis

This work targets researchers and practitioners in fields such as political science, text mining, and retail analytics who need group-specific analysis from embedding models. It is an incremental advance over prior exponential family embeddings.

The authors address the problem of discovering embeddings that vary across related groups of data, such as word usage in political speeches or shopping patterns across seasons. They develop structured exponential family embeddings (S-EFE), in which the groups share statistical information through one of two strategies: hierarchical modeling or amortization. S-EFE enables group-specific interpretation and outperforms EFE in predicting held-out data.

Word embeddings are a powerful approach for analyzing language, and exponential family embeddings (EFE) extend them to other types of data. Here we develop structured exponential family embeddings (S-EFE), a method for discovering embeddings that vary across related groups of data. We study how the word usage of U.S. Congressional speeches varies across states and party affiliation, how words are used differently across sections of the ArXiv, and how the co-purchase patterns of groceries can vary across seasons. Key to the success of our method is that the groups share statistical information. We develop two sharing strategies: hierarchical modeling and amortization. We demonstrate the benefits of this approach in empirical studies of speeches, abstracts, and shopping baskets. We show how S-EFE enables group-specific interpretation of word usage, and outperforms EFE in predicting held-out data.
