Dropping Convexity for More Efficient and Scalable Online Multiview Learning
This work addresses scalability issues in multiview learning for data analysis and machine learning applications. The contribution is incremental, since it builds on existing heuristic nonconvex approaches.
The paper tackles the computational inefficiency of convex optimization in multiview representation learning by proposing a nonconvex formulation solved with stochastic gradient descent. It provides theoretical justification for convergence to the global optima and empirical evidence of efficiency.
Multiview representation learning is widely used for latent factor analysis. It arises naturally in many data analysis, machine learning, and information retrieval applications, where it models dependent structures among multiple data sources. For computational convenience, existing approaches usually formulate multiview representation learning as a convex optimization problem, whose global optima can be obtained by certain algorithms in polynomial time. However, considerable empirical evidence indicates that heuristic nonconvex approaches also perform well computationally and converge to the global optima, although theoretical justification is lacking. This gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning that can be solved efficiently by a simple stochastic gradient descent (SGD) algorithm. We first illustrate the geometry of the nonconvex formulation. Then, we establish asymptotic global rates of convergence to the global optima by diffusion approximations. Numerical experiments are provided to support our theory.
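To make the idea of a nonconvex multiview objective solved by SGD concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a rank-one partial least squares style objective, maximizing u^T E[x y^T] v subject to ||u|| = ||v|| = 1, which the text above does not specify. The function name sgd_rank_one_pls, the step size, and the epoch count are hypothetical choices made for illustration.

```python
import numpy as np


def sgd_rank_one_pls(X, Y, step=0.01, epochs=5, seed=0):
    """Projected stochastic gradient ascent on u^T x y^T v over unit spheres.

    Assumed objective (not taken from the source text):
        maximize  u^T E[x y^T] v   subject to  ||u|| = ||v|| = 1.
    X has shape (n, dx) and Y has shape (n, dy); row i of each is one paired sample.
    """
    rng = np.random.default_rng(seed)
    n, dx = X.shape
    dy = Y.shape[1]

    # Random unit-norm initialization for both views.
    u = rng.standard_normal(dx)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(dy)
    v /= np.linalg.norm(v)

    for _ in range(epochs):
        for i in rng.permutation(n):
            x, y = X[i], Y[i]
            # Single-sample gradients of u^T x y^T v with respect to u and v,
            # both evaluated at the current iterate before either is updated.
            grad_u = x * (y @ v)
            grad_v = y * (x @ u)
            u = u + step * grad_u
            v = v + step * grad_v
            # Project back onto the unit spheres to keep the constraints.
            u /= np.linalg.norm(u)
            v /= np.linalg.norm(v)
    return u, v
```

Under these assumptions, each update touches only one paired sample and costs O(dx + dy), which is the kind of per-step cost that makes an online method attractive. On synthetic data with a single shared latent factor, the value u^T C v for the empirical cross-covariance C = X^T Y / n can be compared against the top singular value of C as a sanity check.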