On the Benefits of Active Data Collection in Operator Learning
This work shows that, for learning linear operators, active data collection provably outperforms passive (i.i.d.) data collection, a result relevant to methods that rely on learned operators.
The paper compares data collection strategies for operator learning. Active strategies achieve error convergence rates governed by the eigenvalue decay of the covariance kernel, and these rates can be arbitrarily fast when the decay is sufficiently rapid. Passive strategies never converge faster than $\sim n^{-1}$ and, in the setting studied, incur a non-vanishing lower bound on the error.
We study active data collection strategies for operator learning when the target operator is linear and the input functions are drawn from a mean-zero stochastic process with a continuous covariance kernel. With an active data collection strategy, we establish an error convergence rate in terms of the decay rate of the eigenvalues of the covariance kernel. Consequently, arbitrarily fast error convergence rates are achievable when the eigenvalues of the covariance kernel decay sufficiently rapidly. This contrasts with passive (i.i.d.) data collection strategies, where the convergence rate is never faster than $\sim n^{-1}$. In fact, for our setting, we show a \emph{non-vanishing} lower bound for any passive data collection strategy, regardless of the eigenvalue decay rate of the covariance kernel. Overall, our results show the benefit of active data collection strategies in operator learning over their passive counterparts.
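To make the dependence on eigenvalue decay concrete, the following is a minimal numerical sketch under stated assumptions, not the paper's estimator or proof. It assumes a finite-dimensional surrogate with a hypothetical diagonal linear operator $F\varphi_j = g_j \psi_j$, inputs $v = \sum_j \sqrt{\lambda_j}\,\xi_j \varphi_j$ with uncorrelated unit-variance coefficients $\xi_j$, and an active learner that queries $F$ noiselessly on the first $n$ eigenfunctions $\varphi_1, \dots, \varphi_n$. Under these assumptions the mean-squared error of the truncated estimator is exactly the tail $\sum_{j>n} \lambda_j g_j^2$, which is at most $\|F\|_{\mathrm{op}}^2 \sum_{j>n} \lambda_j$. The names `eigvals`, `op_gains`, and the truncation level `D` are illustrative choices.

```python
import math


def active_truncation_error(eigvals, op_gains, n):
    """Mean-squared error after querying the first n eigenfunctions.

    Assumed model (illustrative only): F(phi_j) = op_gains[j] * psi_j for
    orthonormal families {phi_j}, {psi_j}, and inputs whose coefficient on
    phi_j has variance eigvals[j]. Queries are assumed noiseless, so only
    the unqueried tail contributes to the error.
    """
    return sum(lam * g * g for lam, g in zip(eigvals[n:], op_gains[n:]))


def tail_bound(eigvals, op_gains, n):
    """Upper bound ||F||_op^2 * sum_{j>n} lambda_j for the same model."""
    op_norm_sq = max(g * g for g in op_gains)
    return op_norm_sq * sum(eigvals[n:])


D = 2000  # finite truncation of the spectrum (an assumption of this sketch)
poly = [j ** -2.0 for j in range(1, D + 1)]      # polynomial decay
expo = [math.exp(-j) for j in range(1, D + 1)]   # exponential decay
gains = [1.0 / (1 + (j % 3)) for j in range(D)]  # hypothetical bounded gains

for n in (5, 10, 20, 40):
    print(
        n,
        active_truncation_error(poly, gains, n),
        active_truncation_error(expo, gains, n),
    )
```

In this toy model the error under exponential eigenvalue decay shrinks far faster in $n$ than under polynomial decay, which mirrors the qualitative claim that faster eigenvalue decay permits faster error convergence for an active learner.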