Properties of Minimizing Entropy
This work addresses the challenge of designing better compactness measures for machine learning, but it appears incremental: it builds on the existing concepts of entropy and cardinality rather than introducing a new paradigm.
The paper tackles the problem of improving generalization through compact data representations. It illustrates the relationship between entropy and cardinality, proposes expected cardinality as a compromise between the two, and shows that minimizing entropy also minimizes expected cardinality.
Compact data representations are one approach to improving the generalization of learned functions. We explicitly illustrate the relationship between entropy and cardinality, both measures of compactness, including how gradient descent on the former reduces the latter. Whereas entropy is distribution-sensitive, cardinality is not. We propose a third compactness measure that is a compromise between the two: expected cardinality, defined as the expected number of unique states in any finite number of draws. It is more meaningful than standard cardinality because it discounts states with negligible probability mass. We show that minimizing entropy also minimizes expected cardinality.
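The three measures can be compared with a minimal sketch, assuming a discrete distribution and independent, identically distributed draws. Under that assumption, linearity of expectation gives the expected number of unique states in n draws as the sum over states of 1 - (1 - p_i)^n. This closed form, the function names, and the two example distributions are illustrative assumptions and are not taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(q * math.log(q) for q in p if q > 0)


def cardinality(p):
    """Standard cardinality: number of states with nonzero probability."""
    return sum(1 for q in p if q > 0)


def expected_cardinality(p, n):
    """Expected number of distinct states seen in n i.i.d. draws from p.

    Assumption: draws are independent, so state i is seen at least once
    with probability 1 - (1 - p_i)**n.
    """
    return sum(1 - (1 - q) ** n for q in p)


# Two hypothetical distributions over the same four states.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]
n_draws = 10

for name, p in [("uniform", uniform), ("skewed", skewed)]:
    print(
        name,
        "cardinality:", cardinality(p),
        "entropy:", round(entropy(p), 3),
        "expected cardinality:", round(expected_cardinality(p, n_draws), 3),
    )
```

Both distributions have the same standard cardinality, yet the skewed one has lower entropy and lower expected cardinality, because its three rare states are unlikely to appear in ten draws. This is the sense in which expected cardinality discounts states with negligible probability mass.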