MLLGFeb 13, 2019

Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with Double Power-law Behavior

arXiv:1902.04714v216 citations
Originality Incremental advance
AI Analysis

This work addresses the need for better statistical models in fields like natural language processing and network analysis where data exhibit complex two-regime power-law patterns, representing an incremental improvement over existing methods.

The authors tackled the problem of modeling datasets with double power-law behavior by introducing a new class of completely random measures, which outperform the Pitman-Yor process in fitting various datasets.

Bayesian nonparametric approaches, in particular the Pitman-Yor process and the associated two-parameter Chinese Restaurant process, have been successfully used in applications where the data exhibit a power-law behavior. Examples include natural language processing, natural images or networks. There is also growing empirical evidence that some datasets exhibit a two-regime power-law behavior: one regime for small frequencies, and a second regime, with a different exponent, for high frequencies. In this paper, we introduce a class of completely random measures which are doubly regularly-varying. Contrary to the Pitman-Yor process, we show that when completely random measures in this class are normalized to obtain random probability measures and associated random partitions, such partitions exhibit a double power-law behavior. We discuss in particular three models within this class: the beta prime process (Broderick et al. (2015, 2018), a novel process called generalized BFRY process, and a mixture construction. We derive efficient Markov chain Monte Carlo algorithms to estimate the parameters of these models. Finally, we show that the proposed models provide a better fit than the Pitman-Yor process on various datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes