CL, Jul 20, 2016

Incremental Learning for Fully Unsupervised Word Segmentation Using Penalized Likelihood and Model Selection

arXiv:1607.05822v2

Originality: Incremental advance

AI Analysis

This paper addresses the problem of word segmentation for natural language processing. The advance is incremental because it builds on existing probabilistic and model selection methods.

The paper tackles unsupervised word segmentation by introducing an incremental learning approach that combines probabilistic modeling with model selection, achieving top-tier performance in both phonemic and orthographic segmentation.

We present a novel incremental learning approach for unsupervised word segmentation that combines features from probabilistic modeling and model selection. This includes super-additive penalties for addressing the cognitive burden imposed by long word formation, and new model selection criteria based on higher-order generative assumptions. Our approach is fully unsupervised; it relies on a small number of parameters that permits flexible modeling and a mechanism that automatically learns parameters from the data. Through experimentation, we show that this intricate design has led to top-tier performance in both phonemic and orthographic word segmentation.
