ML LG, Feb 13, 2019

Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with Double Power-law Behavior

arXiv:1902.04714

Originality: Incremental advance

AI Analysis

This work addresses the need for better statistical models in fields like natural language processing and network analysis where data exhibit complex two-regime power-law patterns, representing an incremental improvement over existing methods.

The authors tackled the problem of modeling datasets with double power-law behavior by introducing a new class of completely random measures, which outperform the Pitman-Yor process in fitting various datasets.

Bayesian nonparametric approaches, in particular the Pitman-Yor process and the associated two-parameter Chinese Restaurant process, have been successfully used in applications where the data exhibit power-law behavior. Examples include natural language processing, natural images, and networks. There is also growing empirical evidence that some datasets exhibit a two-regime power-law behavior: one regime for small frequencies, and a second regime, with a different exponent, for high frequencies. In this paper, we introduce a class of completely random measures which are doubly regularly-varying. We show that when completely random measures in this class are normalized to obtain random probability measures and associated random partitions, such partitions exhibit a double power-law behavior, in contrast to those obtained from the Pitman-Yor process. We discuss in particular three models within this class: the beta prime process (Broderick et al., 2015, 2018), a novel process called the generalized BFRY process, and a mixture construction. We derive efficient Markov chain Monte Carlo algorithms to estimate the parameters of these models. Finally, we show that the proposed models provide a better fit than the Pitman-Yor process on various datasets.
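For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of the standard two-parameter Chinese Restaurant process, not of the paper's doubly regularly-varying models. It assumes the usual seating rule: with i customers already seated at k tables, a new table opens with probability (theta + k*sigma) / (theta + i), and an existing table of size m is joined with probability (m - sigma) / (theta + i). The function names `sample_two_parameter_crp` and `frequency_of_frequencies` are illustrative and do not come from the paper or its code.

```python
import random
from collections import Counter


def sample_two_parameter_crp(n, sigma, theta, seed=0):
    """Seat n customers and return the list of table (cluster) sizes.

    Assumes the standard parameter range 0 <= sigma < 1 and theta > -sigma.
    sigma = 0 gives the one-parameter Chinese Restaurant process.
    """
    if not (0.0 <= sigma < 1.0 and theta > -sigma):
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    rng = random.Random(seed)
    sizes = []
    for i in range(n):  # i customers are already seated
        k = len(sizes)
        new_table_weight = theta + k * sigma
        u = rng.random() * (theta + i)
        if i == 0 or u < new_table_weight:
            sizes.append(1)  # open a new table
            continue
        u -= new_table_weight
        # Join existing table j with weight sizes[j] - sigma.
        for j, m in enumerate(sizes):
            u -= m - sigma
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


def frequency_of_frequencies(sizes):
    """Map each cluster size m to the number of clusters of that size."""
    return dict(Counter(sizes))


if __name__ == "__main__":
    n = 2000
    for sigma in (0.0, 0.5):
        sizes = sample_two_parameter_crp(n, sigma=sigma, theta=1.0, seed=1)
        counts = frequency_of_frequencies(sizes)
        print(f"sigma={sigma}: {len(sizes)} clusters, "
              f"{counts.get(1, 0)} singletons")
```

With sigma > 0 the number of clusters grows polynomially in n and the cluster-size counts follow a single power law, which is the one-regime behavior the abstract contrasts with the two-regime behavior of the proposed models.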
