Unsupervised Learning via Total Correlation Explanation
This paper addresses the problem of automating unsupervised learning, a concern for researchers and practitioners in fields like AI and data science, though the contribution appears incremental because it builds on existing information-theoretic ideas.
The paper tackles the challenge of unsupervised learning by proposing the principle of Total Correlation Explanation (CorEx), which learns representations to explain dependence in data, and reports successes in diverse domains such as human behavior, biology, and language.
Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Cor-relation Ex-planation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
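To make the notion of dependence concrete, the following is a minimal sketch, not the authors' implementation. It assumes discrete variables and plug-in (empirical) entropy estimates, and uses the standard definition of total correlation, TC(X) = sum_i H(X_i) - H(X_1, ..., X_n). The conditional form TC(X | Y), and the difference TC(X) - TC(X | Y) as the amount of dependence "explained" by a representation Y, is one common way to formalize the principle and is an assumption here, since the abstract does not spell it out. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a sequence of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X), estimated from rows of discrete values."""
    columns = list(zip(*rows))
    marginal_sum = sum(entropy(col) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal_sum - joint


def conditional_total_correlation(rows, labels):
    """TC(X | Y): total correlation within each value of Y, weighted by p(y)."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[y].append(row)
    n = len(rows)
    return sum(len(g) / n * total_correlation(g) for g in groups.values())


# Three perfect copies of one fair bit: highly redundant data.
copies = [(0, 0, 0), (1, 1, 1)] * 50
# Three independent fair bits: every combination equally likely.
independent = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 10

tc_copies = total_correlation(copies)
tc_independent = total_correlation(independent)

# A hypothetical representation Y that simply records the shared bit.
y = [row[0] for row in copies]
explained = tc_copies - conditional_total_correlation(copies, y)

print(tc_copies, tc_independent, explained)
```

In this toy setting the copied bits carry 2 bits of total correlation, the independent bits carry none, and a single latent factor that records the shared bit accounts for all of the dependence in the copies.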