Ruey-Cheng Chen

0.8 CL Jul 20, 2016

Incremental Learning for Fully Unsupervised Word Segmentation Using Penalized Likelihood and Model Selection

Ruey-Cheng Chen

We present a novel incremental learning approach for unsupervised word segmentation that combines features from probabilistic modeling and model selection. These include super-additive penalties that address the cognitive burden imposed by long word formation, and new model selection criteria based on higher-order generative assumptions. Our approach is fully unsupervised: it relies on a small number of parameters, which permits flexible modeling, and on a mechanism that automatically learns those parameters from the data. Through experimentation, we show that this design achieves top-tier performance in both phonemic and orthographic word segmentation.

1.6 CL Jul 20, 2016

An Adaptation of Topic Modeling to Sentences

Ruey-Cheng Chen, Reid Swanson, Andrew S. Gordon

Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this paper, we adapt latent Dirichlet allocation to include an additional layer that incorporates information about the sentence boundaries in documents. We show that adding this minimal information about document structure improves the perplexity of a trained model.
