ME LG ML
Feb 23, 2021

A Goodness-of-fit Test on the Number of Biclusters in a Relational Data Matrix

arXiv:2102.11658v3

Originality: Incremental advance

AI Analysis

This paper addresses a gap in biclustering analysis for relational data by offering a more flexible method for model selection, though it appears incremental because it builds on a prior statistical test.

The paper tackles the problem of determining the appropriate number of biclusters in relational data matrices, proposing a new statistical test that does not require a regular-grid assumption and demonstrating its effectiveness on synthetic and practical data.

Biclustering is a method for detecting homogeneous submatrices in a given observed matrix, and it is an effective tool for relational data analysis. Although there are many studies that estimate the underlying bicluster structure of a matrix, few have enabled us to determine the appropriate number of biclusters in an observed matrix. Recently, a statistical test on the number of biclusters has been proposed for a regular-grid bicluster structure, where we assume that the latent bicluster structure can be represented by row-column clustering. However, when the latent bicluster structure does not satisfy such a regular-grid assumption, the previous test requires a larger number of biclusters than necessary (i.e., a finer bicluster structure than necessary) for the null hypothesis to be accepted, which is not desirable in terms of interpreting the accepted bicluster structure. In this study, we propose a new statistical test on the number of biclusters that does not require the regular-grid assumption and derive the asymptotic behavior of the proposed test statistic in both null and alternative cases. We illustrate the effectiveness of the proposed method by applying it to both synthetic and practical relational data matrices.
