Carlos Esteve

h-index 7

3 papers

51 citations

Novelty 48%

AI Score 45

Ranked #41,079 of 194,257 authors (top 21%); #9,562 in LG (top 24%)

3 Papers

8.1 | SP | Mar 30

Sample Complexity Analysis of Multi-Target Detection via Markovian and Hard-Core Multi-Reference Alignment

Kweku Abraham, Amnon Balanov, Tamir Bendory et al.

Motivated by single-particle cryo-electron microscopy, we study the sample complexity of the multi-target detection (MTD) problem, in which an unknown signal appears multiple times at unknown locations within a long, noisy observation. We propose a patching scheme that reduces MTD to a non-i.i.d. multi-reference alignment (MRA) model. In the one-dimensional setting, the latent group elements form a Markov chain, and we show that the convergence rate of any estimator matches that of the corresponding i.i.d. MRA model, up to a logarithmic factor in the number of patches. Moreover, for estimators based on empirical averaging, such as the method of moments, the convergence rates are identical in both settings. We further establish an analogous result in two dimensions, where the latent structure arises from an exponentially mixing random field generated by a hard-core placement model. As a consequence, if the signal in the corresponding i.i.d. MRA model is determined by moments up to order $n_{\min}$, then in the low-SNR regime the number of patches required to estimate the signal in the MTD model scales as $\sigma^{2n_{\min}}$, where $\sigma^2$ denotes the noise variance.
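To make the MTD observation model concrete, the following is a minimal one-dimensional sketch, not the authors' construction. It assumes a simple slot-based placement rule that keeps copies from overlapping (the abstract's Markovian and hard-core models are not reproduced), and it cuts the observation into plain consecutive windows as a stand-in for the paper's patching scheme, whose details are not given here. The names `simulate_mtd` and `cut_patches` are hypothetical.

```python
import numpy as np

def simulate_mtd(signal, n_slots, n_copies, sigma, rng):
    """Place n_copies of `signal` at random non-overlapping positions, then add noise.

    Assumption: the observation is split into slots of width 2*L, and each chosen
    slot holds one copy at a random offset. This is only an illustrative rule.
    """
    L = len(signal)
    slot = 2 * L
    clean = np.zeros(n_slots * slot)
    chosen = rng.choice(n_slots, size=n_copies, replace=False)
    for s in chosen:
        offset = rng.integers(0, L + 1)      # offset in 0..L keeps the copy inside its slot
        start = s * slot + offset
        clean[start:start + L] += signal
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    return noisy, clean

def cut_patches(y, L):
    """Cut the observation into consecutive length-L windows (a naive patching)."""
    n = len(y) // L
    return y[:n * L].reshape(n, L)

rng = np.random.default_rng(0)
signal = np.array([1.0, -2.0, 3.0])
noisy, clean = simulate_mtd(signal, n_slots=50, n_copies=10, sigma=0.0, rng=rng)
patches = cut_patches(noisy, len(signal))
```

With this naive windowing, a patch may contain a whole copy, part of a copy, or only noise, which is one reason the resulting patches are not i.i.d. samples.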

8.4 | LG | Feb 26, 2021 | Code

Sparsity in long-time control of neural ODEs

Carlos Esteve-Yagüe, Borjan Geshkovski

We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters are all $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property}, for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.
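The ordered sparsity pattern can be illustrated with a small residual network in which every layer beyond some index has zero parameters. The sketch below assumes a forward-Euler discretization with the vector field `W @ tanh(x) + b` and a step size `h = 0.1`. These choices are illustrative and are not taken from the paper. With such dynamics a zero-parameter layer is the identity map, so truncating the network after the last active layer leaves the output unchanged.

```python
import numpy as np

def resnet_forward(x, layers, h=0.1):
    """Residual network x_{k+1} = x_k + h * (W_k tanh(x_k) + b_k)."""
    for W, b in layers:
        x = x + h * (W @ np.tanh(x) + b)
    return x

def last_active_layer(layers):
    """Number of layers up to and including the last one with a nonzero parameter."""
    active = [k for k, (W, b) in enumerate(layers) if np.any(W) or np.any(b)]
    return active[-1] + 1 if active else 0

rng = np.random.default_rng(0)
d = 4
# Hypothetical parameters: three active layers followed by five all-zero layers.
active = [(rng.standard_normal((d, d)), rng.standard_normal(d)) for _ in range(3)]
idle = [(np.zeros((d, d)), np.zeros(d)) for _ in range(5)]
layers = active + idle

x0 = rng.standard_normal(d)
full = resnet_forward(x0, layers)
truncated = resnet_forward(x0, layers[:last_active_layer(layers)])
```

Here `full` and `truncated` agree, which is the discrete counterpart of an optimal control that vanishes beyond a stopping time.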

15.8 | OC | Aug 6, 2020 | Code

Large-time asymptotics in deep learning

Carlos Esteve, Borjan Geshkovski, Dario Pighin et al.

We consider the neural ODE perspective of supervised learning and study the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training. For the classical $L^2$--regularized empirical risk minimization problem, whenever the neural ODE dynamics are homogeneous with respect to the parameters, we show that the training error is at most of the order $\mathcal{O}\left(\frac{1}{T}\right)$. Furthermore, if the loss inducing the empirical risk attains its minimum, the optimal parameters converge to minimal $L^2$--norm parameters which interpolate the dataset. By a natural scaling between $T$ and the regularization hyperparameter $λ$ we obtain the same results when $λ\searrow0$ and $T$ is fixed. This allows us to stipulate generalization properties in the overparametrized regime, now seen from the large depth, neural ODE perspective. To enhance the polynomial decay, inspired by turnpike theory in optimal control, we propose a learning problem with an additional integral regularization term of the neural ODE trajectory over $[0,T]$. In the setting of $\ell^p$--distance losses, we prove that both the training error and the optimal parameters are at most of the order $\mathcal{O}\left(e^{-μt}\right)$ in any $t\in[0,T]$. The aforementioned stability estimates are also shown for continuous space-time neural networks, taking the form of nonlinear integro-differential equations. By using a time-dependent moving grid for discretizing the spatial variable, we demonstrate that these equations provide a framework for addressing ResNets with variable widths.
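The role of homogeneity in relating the time horizon $T$ to a rescaling of the parameters can be checked numerically on a discretized model. The sketch below is an illustration under assumptions and not the paper's argument: it uses forward Euler and the vector field `W @ tanh(x) + b`, which is 1-homogeneous in `(W, b)`. For such dynamics, running over $[0,T]$ with parameters $\theta$ gives the same endpoint as running over $[0,1]$ with parameters $T\theta$.

```python
import numpy as np

def vector_field(x, W, b):
    """Assumed dynamics f(x, (W, b)) = W tanh(x) + b, 1-homogeneous in (W, b)."""
    return W @ np.tanh(x) + b

def euler_flow(x0, params, T):
    """Forward Euler with one step per layer, so depth = len(params) and h = T / depth."""
    h = T / len(params)
    x = x0
    for W, b in params:
        x = x + h * vector_field(x, W, b)
    return x

rng = np.random.default_rng(0)
d, n_layers, T = 3, 20, 5.0
params = [(rng.standard_normal((d, d)), rng.standard_normal(d)) for _ in range(n_layers)]
x0 = rng.standard_normal(d)

# Horizon T with parameters theta ...
x_T = euler_flow(x0, params, T)
# ... versus horizon 1 with parameters T * theta.
scaled = [(T * W, T * b) for W, b in params]
x_1 = euler_flow(x0, scaled, 1.0)
```

The two endpoints `x_T` and `x_1` coincide up to floating-point rounding. Under this rescaling, an $L^2$ penalty on the parameters is multiplied by a power of $T$, which is consistent with the trade-off between $T$ and $λ$ mentioned in the abstract.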