Identification of Probabilities
This paper addresses a foundational problem in psychology, neuroscience, and AI: whether probabilistic inference from limited data is feasible. The results bear on our understanding of brain function and machine learning, although the contribution is theoretical and builds incrementally on existing mathematical tools.
The paper asks whether it is possible, even in principle, to infer a probabilistic model from a finite sample, a question more fundamental than the practical limits on data and computation. Its results are positive. For broad classes of computable distributions and Markov chains, it specifies algorithms that almost surely identify the probability distribution in the limit. For dependent sequences, it specifies an algorithm that identifies in the limit a computable measure for which the sequence is typical.
Within psychology, neuroscience and artificial intelligence, there has been increasing interest in the proposal that the brain builds probabilistic models of sensory and linguistic input: that is, that it infers a probabilistic model from a sample. The practical problems of such inference are substantial: the brain has limited data and restricted computational resources. But there is a more fundamental question: is the problem of inferring a probabilistic model from a sample possible even in principle? We explore this question and find some surprisingly positive and general results. First, for a broad class of probability distributions characterised by computability restrictions, we specify a learning algorithm that will almost surely identify a probability distribution in the limit given a finite i.i.d. sample of sufficient but unknown length. The same is shown to hold for sequences generated by a broad class of Markov chains, subject to computability assumptions. The technical tool is the strong law of large numbers. Second, for a large class of dependent sequences, we specify an algorithm which identifies in the limit a computable measure for which the sequence is typical in the sense of Martin-Löf (there may be more than one such measure). The technical tool is the theory of Kolmogorov complexity. We analyse the associated predictions in both cases. We also briefly consider special cases, including language learning, and wider theoretical implications for psychology.
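To make the first result concrete, the following is a minimal Python sketch of identification in the limit by enumeration for the i.i.d. case. It is an illustration under stated assumptions, not the algorithm specified in the paper: the candidate class is a short hypothetical list `CANDIDATES` standing in for an effective enumeration of computable distributions, and the tolerance schedule `n ** -0.25` is an assumed choice. The idea is that, by the strong law of large numbers, the empirical frequencies converge almost surely to the true probabilities, so a tolerance that shrinks to zero slowly enough eventually rejects every wrong candidate listed before the true one while continuing to accept the true one.

```python
import random
from collections import Counter

# Hypothetical candidate class: a finite stand-in for an enumeration
# of computable distributions over the outcomes {0, 1}.
CANDIDATES = [
    {0: 0.5, 1: 0.5},
    {0: 0.7, 1: 0.3},
    {0: 0.2, 1: 0.8},
]

def identify(sample, candidates):
    """Return the index of the first candidate whose probabilities all lie
    within a shrinking tolerance of the empirical frequencies, or None."""
    n = len(sample)
    if n == 0:
        return None
    counts = Counter(sample)
    # Assumed schedule: tends to 0, but more slowly than the sampling error.
    tolerance = n ** -0.25
    for k, dist in enumerate(candidates):
        outcomes = set(dist) | set(counts)
        gap = max(abs(dist.get(a, 0.0) - counts[a] / n) for a in outcomes)
        if gap < tolerance:
            return k
    return None

# Draw an i.i.d. sample from the second candidate and watch the guesses
# made on growing prefixes of that sample.
rng = random.Random(0)
sample = rng.choices([0, 1], weights=[0.7, 0.3], k=20000)
for n in (10, 100, 1000, 20000):
    print(n, identify(sample[:n], CANDIDATES))
```

Guesses on short prefixes can be wrong, because the tolerance is still loose enough to accept an earlier candidate. The learner never announces that it has converged, which matches the abstract's "sufficient but unknown length": the guess simply stops changing once the sample is long enough.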