Sparse CCA via Precision Adjusted Iterative Thresholding
This work addresses a theoretical gap in high-dimensional data analysis and is of interest to researchers in statistics and bioinformatics. It is incremental in that it builds on existing sparse CCA methods.
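The paper tackles the lack of theoretical foundation for sparse Canonical Correlation Analysis (CCA) in high-dimensional settings. It introduces a necessary and sufficient characterization of when the CCA solution is sparse, proposes the CAPIT procedure for estimating the canonical directions, and proves that the procedure is rate-optimal under various assumptions on nuisance parameters. It applies the method to a breast cancer dataset from The Cancer Genome Atlas project, identifying methylation probes associated with genes previously characterized as prognosis signatures of breast cancer metastasis.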
Sparse Canonical Correlation Analysis (CCA) has received considerable attention in high-dimensional data analysis as a tool for studying the relationship between two sets of random variables. However, there has been remarkably little theoretical statistical foundation for sparse CCA in high-dimensional settings, despite active methodological and applied research. In this paper, we introduce an elementary necessary and sufficient characterization of when the solution of CCA is indeed sparse, propose a computationally efficient procedure, called CAPIT, to estimate the canonical directions, and show that the procedure is rate-optimal under various assumptions on nuisance parameters. The procedure is applied to a breast cancer dataset from The Cancer Genome Atlas project. We identify methylation probes that are associated with genes that have previously been characterized as prognosis signatures of the metastasis of breast cancer.
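To make the idea of iterative thresholding concrete, the following is a minimal sketch of a generic thresholded power iteration for a sparse leading singular pair. It is not the CAPIT procedure itself, whose details are not given here. The matrix A is assumed to stand in for an estimated cross-covariance that has already been adjusted by the precision matrices of the two variable sets (an assumption suggested by the procedure's name), and the function names, threshold levels, and toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def hard_threshold(vec, level):
    """Return a copy of vec with entries below `level` in magnitude set to zero."""
    out = vec.copy()
    out[np.abs(out) < level] = 0.0
    return out

def sparse_leading_pair(A, level_u, level_v, n_iter=50):
    """Thresholded power iteration for a sparse leading singular pair of A.

    Generic illustration only; the thresholds and the SVD initialisation
    are assumptions, not choices taken from the paper.
    """
    # Start from the ordinary (dense) leading singular vectors.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        # Update the left direction, zero out small entries, renormalise.
        u_new = hard_threshold(A @ v, level_u)
        if not u_new.any():
            break  # threshold removed everything; keep the last iterate
        u = u_new / np.linalg.norm(u_new)
        # Same step for the right direction.
        v_new = hard_threshold(A.T @ u, level_v)
        if not v_new.any():
            break
        v = v_new / np.linalg.norm(v_new)
    return u, v

# Toy input: a sparse rank-one signal plus small noise (hypothetical data).
rng = np.random.default_rng(0)
p, q = 30, 20
u_true = np.zeros(p)
u_true[:3] = 1.0 / np.sqrt(3.0)
v_true = np.zeros(q)
v_true[:2] = 1.0 / np.sqrt(2.0)
A = 3.0 * np.outer(u_true, v_true) + 0.01 * rng.standard_normal((p, q))

u_hat, v_hat = sparse_leading_pair(A, level_u=0.5, level_v=0.5)
print(np.flatnonzero(u_hat), np.flatnonzero(v_hat))
```

On this toy input the nonzero entries of the estimated directions mark which variables in each set carry the shared signal, which is the role the selected methylation probes and genes play in the application. In practice the threshold levels would need to be tied to the noise level and sample size rather than fixed by hand.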