On Deep Set Learning and the Choice of Aggregations
This work addresses a core architectural issue in set learning that is relevant to researchers, but its contribution is incremental: it optimizes existing methods rather than introducing a new paradigm.
The paper tackles the sensitivity of Deep Set networks to the choice of aggregation function, showing that learnable aggregations improve performance, reduce hyper-parameter sensitivity, and generalize better to out-of-distribution input sizes.
Recently, it has been shown that many functions on sets can be represented by sum decompositions. These decompositions lend themselves easily to neural approximation, extending the applicability of neural nets to set-valued inputs, an approach known as Deep Set learning. This work investigates a core component of the Deep Set architecture: the aggregation function. We suggest and examine alternatives to commonly used aggregation functions, including learnable recurrent aggregation functions. Empirically, we show that Deep Set networks are highly sensitive to the choice of aggregation function: beyond improved performance, we find that learnable aggregations lower hyper-parameter sensitivity and generalize better to out-of-distribution input sizes.
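To make the role of the aggregation concrete, the following is a minimal sketch of a sum-decomposition model of the form rho(agg(phi(x_1), ..., phi(x_n))) with three fixed aggregations. It is an illustration under stated assumptions, not the architecture evaluated in this work: the weights `W_phi` and `W_rho`, the layer sizes, and the function `deep_set` are hypothetical, and the learnable recurrent aggregations are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy weights (assumption): 3-d elements, 8-d embeddings.
W_phi = rng.normal(size=(3, 8))   # per-element encoder phi
W_rho = rng.normal(size=(8, 1))   # linear decoder rho (no bias)

# Fixed, permutation-invariant aggregations over the element axis.
AGGREGATIONS = {
    "sum": lambda h: h.sum(axis=0),
    "mean": lambda h: h.mean(axis=0),
    "max": lambda h: h.max(axis=0),
}

def deep_set(x, agg="sum"):
    """Apply rho(agg(phi(x_i))) to a set stored as an (n, 3) array."""
    h = np.tanh(x @ W_phi)          # phi applied to each element: (n, 8)
    pooled = AGGREGATIONS[agg](h)   # aggregate over the set: (8,)
    return pooled @ W_rho           # rho on the pooled embedding: (1,)

x = rng.normal(size=(5, 3))
perm = rng.permutation(len(x))

# Every aggregation here ignores element order.
for name in AGGREGATIONS:
    assert np.allclose(deep_set(x, name), deep_set(x[perm], name))

# The aggregations react differently to set size: duplicating every
# element doubles the sum-pooled output (rho is linear here) but
# leaves the mean- and max-pooled outputs unchanged.
x_doubled = np.concatenate([x, x])
for name in AGGREGATIONS:
    print(name, deep_set(x, name), deep_set(x_doubled, name))
```

The last loop hints at why input size matters: in this toy model the sum-pooled embedding scales with the number of elements, whereas the mean- and max-pooled embeddings do not, so the choice of aggregation changes what the decoder sees when set sizes differ from those seen in training.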