ML · LG · Nov 13, 2014

A Randomized Algorithm for CCA

arXiv:1411.3409v1 · 21 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses efficient CCA computation on large-scale data in distributed systems such as Hadoop. The contribution is incremental, since it builds on existing randomized and iterative techniques.

The paper tackles canonical correlation analysis (CCA) on large datasets stored out of core or on a distributed file system, where each iteration over the data is costly. It introduces RandomizedCCA, which achieves accurate results in as few as two data passes and also serves as an effective initializer for iterative methods.
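For reference, this is the standard textbook CCA problem. It is not reproduced from this page. The covariance blocks Σ and the optional ridge terms λ_x, λ_y are our notation; the page does not say whether the paper regularizes.

```latex
\max_{a,\,b}\; a^\top \Sigma_{xy}\, b
\quad\text{subject to}\quad
a^\top (\Sigma_{xx} + \lambda_x I)\, a = 1,\qquad
b^\top (\Sigma_{yy} + \lambda_y I)\, b = 1,

\rho_1 \ge \dots \ge \rho_k = \text{top-}k \text{ singular values of }
(\Sigma_{xx} + \lambda_x I)^{-1/2}\, \Sigma_{xy}\, (\Sigma_{yy} + \lambda_y I)^{-1/2}.
```

Forming Σ_xx, Σ_yy and Σ_xy on large data requires scanning every row, which is why the number of passes over the data is the cost that matters.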

We present RandomizedCCA, a randomized algorithm for computing canonical analysis, suitable for large datasets stored either out of core or on a distributed file system. Accurate results can be obtained in as few as two data passes, which is relevant for distributed processing frameworks in which iteration is expensive (e.g., Hadoop). The strategy also provides an excellent initializer for standard iterative solutions.

