A Fast Screening Approach for High-dimensional Outcomes and High-dimensional Predictors

arXiv:2606.0301857.0

AI Analysis

For researchers analyzing high-dimensional multimodal data (e.g., genomics), GIDS offers a computationally efficient screening method that reduces both outcome and predictor dimensions, improving tractability and interpretability.

GIDS simultaneously reduces dimensionality of both predictors and outcomes in high-dimensional cross-modal analyses, outperforming existing methods that screen only predictors. On ADNI data, it reduced 865K methylation and 49K transcriptomic features to ~9K CpGs and ~2K transcripts, revealing interpretable blockwise interaction structures.

Modeling interactions among multimodal, high-dimensional data is intrinsically challenging because of ultra-high dimensionality and a complex dependence structure with high noise levels. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union of selected predictors remains large and the response dimension is unchanged, which limits the practical benefit of screening and leads to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation variables and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.
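The contrast between predictor-only screening and dual screening can be illustrated with a small simulation. The sketch below is not the GIDS algorithm, which the abstract does not specify; it assumes a plain marginal-correlation threshold as a stand-in. All function names, the threshold `tau`, the per-outcome budget `k`, and the simulated block structure are hypothetical choices made for illustration only.

```python
import numpy as np

def cross_correlation(X, Y):
    """Pearson correlations between columns of X (n x p) and Y (n x q)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc /= np.linalg.norm(Xc, axis=0)
    Yc /= np.linalg.norm(Yc, axis=0)
    return Xc.T @ Yc  # shape (p, q)

def predictor_only_screen(R, k):
    """Keep the top-k predictors per outcome; return the union over outcomes."""
    top = np.argsort(-np.abs(R), axis=0)[:k, :]
    return np.unique(top)

def dual_screen(R, tau):
    """Stand-in dual screen (an assumption, not GIDS): keep predictors and
    outcomes that have at least one |correlation| above tau."""
    strong = np.abs(R) > tau
    keep_pred = np.flatnonzero(strong.any(axis=1))
    keep_out = np.flatnonzero(strong.any(axis=0))
    return keep_pred, keep_out

# Simulated data: one interacting block (10 predictors, 5 outcomes)
# driven by a shared latent factor; everything else is pure noise.
rng = np.random.default_rng(0)
n, p, q = 200, 300, 100
z = rng.standard_normal((n, 1))
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :10] = z + 0.5 * X[:, :10]
Y[:, :5] = z + 0.5 * Y[:, :5]

R = cross_correlation(X, Y)
union = predictor_only_screen(R, k=10)
keep_pred, keep_out = dual_screen(R, tau=0.5)

print("predictor-only union size:", union.size, "outcomes kept:", q)
print("dual screen keeps", keep_pred.size, "predictors and", keep_out.size, "outcomes")
```

In this toy setting the per-outcome union is inflated by the many outcomes that have no real signal, while the thresholded cross-correlation matrix isolates a single predictor-by-outcome block. A fixed threshold is only a convenience here; a real analysis would need a principled cutoff.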
