LGJul 13, 2023
Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networksJohan S. Wind, Vegard Antun, Anders C. Hansen
Understanding the implicit regularization imposed by neural network architectures and gradient based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, potentially surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is well-known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem -- which is a contradiction -- thus implying sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.
LGAug 14, 2024
Achieving Data Efficient Neural Networks with Hybrid Concept-based ModelsTobias A. Opsahl, Vegard Antun
Most datasets used for supervised machine learning consist of a single label per data point. However, in cases where more information than just the class label is available, would it be possible to train models more efficiently? We introduce two novel model architectures, which we call hybrid concept-based models, that train using both class labels and additional information in the dataset referred to as concepts. In order to thoroughly assess their performance, we introduce ConceptShapes, an open and flexible class of datasets with concept labels. We show that the hybrid concept-based models outperform standard computer vision models and previously proposed concept-based models with respect to accuracy, especially in sparse data settings. We also introduce an algorithm for performing adversarial concept attacks, where an image is perturbed in a way that does not change a concept-based model's concept predictions, but changes the class prediction. The existence of such adversarial examples raises questions about the interpretable qualities promised by concept-based models.
LGJan 20, 2021
Can stable and accurate neural networks be computed? -- On the barriers of deep learning and Smale's 18th problemMatthew J. Colbrook, Vegard Antun, Anders C. Hansen
Deep learning (DL) has had unprecedented success and is now entering scientific computing with full force. However, current DL methods typically suffer from instability, even when universal approximation properties guarantee the existence of stable neural networks (NNs). We address this paradox by demonstrating basic well-conditioned problems in scientific computing where one can prove the existence of NNs with great approximation qualities, however, there does not exist any algorithm, even randomised, that can train (or compute) such a NN. For any positive integers $K > 2$ and $L$, there are cases where simultaneously: (a) no randomised training algorithm can compute a NN correct to $K$ digits with probability greater than $1/2$, (b) there exists a deterministic training algorithm that computes a NN with $K-1$ correct digits, but any such (even randomised) algorithm needs arbitrarily many training data, (c) there exists a deterministic training algorithm that computes a NN with $K-2$ correct digits using no more than $L$ training samples. These results imply a classification theory describing conditions under which (stable) NNs with a given accuracy can be computed by an algorithm. We begin this theory by establishing sufficient conditions for the existence of algorithms that compute stable NNs in inverse problems. We introduce Fast Iterative REstarted NETworks (FIRENETs), which we both prove and numerically verify are stable. Moreover, we prove that only $\mathcal{O}(|\log(ε)|)$ layers are needed for an $ε$-accurate solution to the inverse problem.
LGJan 5, 2020
The troublesome kernel -- On hallucinations, no free lunches and the accuracy-stability trade-off in inverse problemsNina M. Gottschling, Vegard Antun, Anders C. Hansen et al.
Methods inspired by Artificial Intelligence (AI) are starting to fundamentally change computational science and engineering through breakthrough performances on challenging problems. However, reliability and trustworthiness of such techniques is a major concern. In inverse problems in imaging, the focus of this paper, there is increasing empirical evidence that methods may suffer from hallucinations, i.e., false, but realistic-looking artifacts; instability, i.e., sensitivity to perturbations in the data; and unpredictable generalization, i.e., excellent performance on some images, but significant deterioration on others. This paper provides a theoretical foundation for these phenomena. We give mathematical explanations for how and when such effects arise in arbitrary reconstruction methods, with several of our results taking the form of `no free lunch' theorems. Specifically, we show that (i) methods that overperform on a single image can wrongly transfer details from one image to another, creating a hallucination, (ii) methods that overperform on two or more images can hallucinate or be unstable, (iii) optimizing the accuracy-stability trade-off is generally difficult, (iv) hallucinations and instabilities, if they occur, are not rare events, and may be encouraged by standard training, (v) it may be impossible to construct optimal reconstruction maps for certain problems. Our results trace these effects to the kernel of the forward operator whenever it is nontrivial, but also apply to the case when the forward operator is ill-conditioned. Based on these insights, our work aims to spur research into new ways to develop robust and reliable AI-based methods for inverse problems in imaging.
MLJun 4, 2019
What do AI algorithms actually learn? - On false structures in deep learningLaura Thesing, Vegard Antun, Anders C. Hansen
There are two big unsolved mathematical questions in artificial intelligence (AI): (1) Why is deep learning so successful in classification problems and (2) why are neural nets based on deep learning at the same time universally unstable, where the instabilities make the networks vulnerable to adversarial attacks. We present a solution to these questions that can be summed up in two words; false structures. Indeed, deep learning does not learn the original structures that humans use when recognising images (cats have whiskers, paws, fur, pointy ears, etc), but rather different false structures that correlate with the original structure and hence yield the success. However, the false structure, unlike the original structure, is unstable. The false structure is simpler than the original structure, hence easier to learn with less data and the numerical algorithm used in the training will more easily converge to the neural network that captures the false structure. We formally define the concept of false structures and formulate the solution as a conjecture. Given that trained neural networks always are computed with approximations, this conjecture can only be established through a combination of theoretical and computational results similar to how one establishes a postulate in theoretical physics (e.g. the speed of light is constant). Establishing the conjecture fully will require a vast research program characterising the false structures. We provide the foundations for such a program establishing the existence of the false structures in practice. Finally, we discuss the far reaching consequences the existence of the false structures has on state-of-the-art AI and Smale's 18th problem.
CVFeb 14, 2019
On instabilities of deep learning in image reconstruction - Does AI come at a cost?Vegard Antun, Francesco Renna, Clarice Poon et al.
Deep learning, due to its unprecedented success in tasks such as image classification, has emerged as a new tool in image reconstruction with potential to change the field. In this paper we demonstrate a crucial phenomenon: deep learning typically yields unstablemethods for image reconstruction. The instabilities usually occur in several forms: (1) tiny, almost undetectable perturbations, both in the image and sampling domain, may result in severe artefacts in the reconstruction, (2) a small structural change, for example a tumour, may not be captured in the reconstructed image and (3) (a counterintuitive type of instability) more samples may yield poorer performance. Our new stability test with algorithms and easy to use software detects the instability phenomena. The test is aimed at researchers to test their networks for instabilities and for government agencies, such as the Food and Drug Administration (FDA), to secure safe use of deep learning methods.