Approximating invariant functions with the sorting trick is theoretically justified
This work addresses a theoretical gap in machine learning for researchers and practitioners using group invariance, though it is incremental as it builds on existing canonicalization methods.
The paper tackles the problem of approximating invariant functions using canonicalization, such as sorting, by establishing an approximation theory that bounds point-wise and L^2 errors and eigenvalue decay rates for reproducing kernels. The result provides theoretical justification for canonicalization, which is computationally efficient but previously lacked performance analysis.
Many machine learning models leverage group invariance which is enjoyed with a wide-range of applications. For exploiting an invariance structure, one common approach is known as \emph{frame averaging}. One popular example of frame averaging is the \emph{group averaging}, where the entire group is used to symmetrize a function. Another example is the \emph{canonicalization}, where a frame at each point consists of a single group element which transforms the point to its orbit representative, for example, sorting. Compared to group averaging, canonicalization is more efficient computationally. However, it results in non-differentiablity or discontinuity of the canonicalized function. As a result, the theoretical performance of canonicalization has not been given much attention. In this work, we establish an approximation theory for canonicalization. Specifically, we bound the point-wise and $L^2(\mathbb{P})$ approximation errors as well as the eigenvalue decay rates associated with a canonicalization trick applied to reproducing kernels. We discuss two key insights from our theoretical analyses and why they point to an interesting future research direction on how one can choose a design to fully leverage canonicalization in practice.