Universal Approximation of Functions on Sets
This work addresses a theoretical limitation of set-learning methods and is relevant to machine learning researchers, though it is incremental in that it builds on existing paradigms.
The paper analyzes the universal approximation property of Deep Sets for permutation-invariant functions and shows that the property holds only when the latent space is sufficiently high-dimensional; otherwise, there exist piecewise-affine functions for which the worst-case error is no better than that of a constant baseline.
Modelling functions of sets, or equivalently, permutation-invariant functions, is a long-standing challenge in machine learning. Deep Sets is a popular method which is known to be a universal approximator for continuous set functions. We provide a theoretical analysis of Deep Sets which shows that this universal approximation property is only guaranteed if the model's latent space is sufficiently high-dimensional. If the latent space is even one dimension lower than necessary, there exist piecewise-affine functions for which Deep Sets performs no better than a naïve constant baseline, as judged by worst-case error. Deep Sets may be viewed as the most efficient incarnation of the Janossy pooling paradigm. We identify this paradigm as encompassing most currently popular set-learning methods. Based on this connection, we discuss the implications of our results for set learning more broadly, and identify some open questions on the universality of Janossy pooling in general.
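For readers unfamiliar with the architecture, the following is a minimal sketch of the sum-decomposition form commonly associated with Deep Sets, f(X) = rho(sum of phi(x) over x in X), where the output dimension of phi is the latent dimension discussed above. It is an illustration only, not the paper's construction: the helper name `make_deep_sets`, the layer sizes, and the random untrained weights are all assumptions made for the example, and scalar set elements are assumed for simplicity.

```python
import numpy as np

def make_deep_sets(latent_dim, hidden=16, seed=0):
    """Build a toy sum-decomposition model with random (untrained) weights.

    Hypothetical helper for illustration; set elements are assumed to be scalars.
    """
    rng = np.random.default_rng(seed)
    # Parameters of phi: scalar element -> latent vector of size latent_dim
    W1 = rng.normal(size=(1, hidden))
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, latent_dim))
    # Parameters of rho: pooled latent vector -> scalar output
    V1 = rng.normal(size=(latent_dim, hidden))
    c1 = rng.normal(size=hidden)
    V2 = rng.normal(size=(hidden, 1))

    def phi(x):
        # x has shape (n,); the result has shape (n, latent_dim)
        h = np.maximum(x[:, None] @ W1 + b1, 0.0)  # ReLU hidden layer
        return h @ W2

    def rho(z):
        # z has shape (latent_dim,); the result is a Python float
        h = np.maximum(z @ V1 + c1, 0.0)  # ReLU hidden layer
        return float((h @ V2)[0])

    def f(x):
        x = np.asarray(x, dtype=float)
        # Sum pooling over the elements makes f independent of their order
        return rho(phi(x).sum(axis=0))

    return f

model = make_deep_sets(latent_dim=3)
x = np.array([0.3, -1.2, 2.0, 0.7])
# Reordering the elements should leave the output unchanged up to rounding
print(np.isclose(model(x), model(x[::-1])))
```

Because the elements interact only through the pooled sum, everything the model knows about the set must pass through a vector of size `latent_dim`, which is why the size of that vector is the quantity the analysis focuses on.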