A New Angle on L2 Regularization
This paper addresses a theoretical problem in machine learning, but it appears incremental because it focuses on explaining an existing phenomenon.
The paper investigates the relationship between L2 regularization and the angle between cluster centroids and hyperplane normals in linear classification, but does not report specific results or numbers.
Imagine two high-dimensional clusters and a hyperplane separating them. Consider in particular the angle between the direction joining the two clusters' centroids and the normal to the hyperplane. In linear classification, this angle depends on the level of L2 regularization used. Can you explain why?
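The dependence of this angle on regularization can be checked numerically. The sketch below is only an illustration under stated assumptions, not the paper's method: it swaps in a ridge-regularized least-squares classifier (the text does not name a specific model), uses synthetic anisotropic Gaussian clusters, and introduces the hypothetical names `tilt_angle_deg`, `lam`, and `scales`. For this particular classifier with balanced classes, the weight vector is proportional to (XᵀX + λI)⁻¹ times the centroid difference, so very large λ pulls the normal toward the centroid direction.

```python
import numpy as np

def tilt_angle_deg(X, y, lam):
    """Angle in degrees between the centroid-difference direction and the
    normal w of a ridge-regularized least-squares classifier.
    (Hypothetical helper; the classifier choice is an assumption.)"""
    Xc = X - X.mean(axis=0)                      # center the features
    d = X.shape[1]
    # Closed-form ridge solution: w = (Xc^T Xc + lam * I)^{-1} Xc^T y
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ y)
    # Direction joining the two cluster centroids
    delta = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    cos = (w @ delta) / (np.linalg.norm(w) * np.linalg.norm(delta))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Synthetic data (assumed for illustration): two anisotropic Gaussian clusters.
rng = np.random.default_rng(0)
n, d = 200, 50
scales = np.linspace(0.2, 3.0, d)                # per-dimension standard deviations
mu = np.ones(d)                                  # clusters centered at +mu and -mu
X = np.vstack([rng.normal(size=(n, d)) * scales + mu,
               rng.normal(size=(n, d)) * scales - mu])
y = np.concatenate([np.ones(n), -np.ones(n)])

lams = [1e-2, 1e0, 1e2, 1e4, 1e6]
angles = [tilt_angle_deg(X, y, lam) for lam in lams]
for lam, angle in zip(lams, angles):
    print(f"lambda = {lam:g}: angle = {angle:.2f} degrees")
```

With weak regularization the normal is tilted toward the low-variance directions of the data, while with strong regularization it aligns with the line joining the centroids. Other linear classifiers (for example logistic regression or a linear SVM) may show quantitatively different behaviour.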