Learning the Dimensionality of Hidden Variables
This paper addresses a key challenge in probabilistic modeling for researchers and practitioners who work with hidden variables, although the contribution appears incremental because it builds on existing Bayesian network frameworks.
The paper tackles the problem of determining the number of states of hidden variables in Bayesian networks. It proposes a score-based agglomerative state-clustering approach that efficiently evaluates models across a range of cardinalities and extends to multiple interacting hidden variables. The learned models generalize better and have better structure than those produced by previous approaches.
A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that uses score-based agglomerative state clustering. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches.
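To make the idea of score-based agglomerative state clustering concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single hidden variable H that is the sole parent of every observed variable (a naive Bayes structure), a hard assignment of each instance to one hidden state, and a BIC score swapped in for whatever score the paper uses. The names `bic_score` and `agglomerate` are hypothetical. The sketch starts with one hidden state per distinct observed configuration, greedily merges the pair of states that yields the best score, and records the score at every cardinality along the way.

```python
import math
from collections import Counter
from itertools import combinations


def bic_score(clusters, arities, n_total):
    """BIC of a naive Bayes model H -> X_1..X_n under a hard assignment.

    Each cluster is the list of rows assigned to one hidden state.
    (Assumption: BIC stands in for the paper's scoring function.)
    """
    loglik = 0.0
    for rows in clusters:
        n_h = len(rows)
        # Contribution of P(H = h) at its maximum-likelihood estimate.
        loglik += n_h * math.log(n_h / n_total)
        # Contribution of P(X_i = x | H = h) for every observed variable.
        for i in range(len(arities)):
            for count in Counter(r[i] for r in rows).values():
                loglik += count * math.log(count / n_h)
    k = len(clusters)
    n_params = (k - 1) + k * sum(a - 1 for a in arities)
    return loglik - 0.5 * math.log(n_total) * n_params


def agglomerate(data):
    """Greedily merge hidden states; return (best cardinality, score per cardinality)."""
    n = len(data)
    arities = [len({r[i] for r in data}) for i in range(len(data[0]))]
    # Start with one hidden state per distinct observed configuration.
    groups = {}
    for r in data:
        groups.setdefault(tuple(r), []).append(tuple(r))
    clusters = list(groups.values())
    scores = {len(clusters): bic_score(clusters, arities, n)}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            merged = [c for j, c in enumerate(clusters) if j not in (a, b)]
            merged.append(clusters[a] + clusters[b])
            s = bic_score(merged, arities, n)
            if best is None or s > best[0]:
                best = (s, merged)
        clusters = best[1]
        scores[len(clusters)] = best[0]
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: two noisy groups over three binary observed variables.
data = (
    [(0, 0, 0)] * 40 + [(0, 0, 1)] * 5 + [(0, 1, 0)] * 5
    + [(1, 1, 1)] * 40 + [(1, 1, 0)] * 5 + [(1, 0, 1)] * 5
)
best_k, scores = agglomerate(data)
print(best_k, scores)
```

A single agglomeration pass yields a score for every cardinality from the initial number of states down to one, which is what lets a range of cardinalities be compared without training a separate model for each. For simplicity this sketch recomputes the full score for every candidate merge; because the score decomposes over hidden states, a more careful version would update only the terms of the two states being merged.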