Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices
This work addresses the challenge of combining noisy, incomplete matrices from diverse sources, such as EHR and medical text, for biomedical applications, and represents an incremental improvement in matrix completion techniques.
The paper tackles the problem of integrating multi-source biomedical data with block-wise missingness by proposing the BONMI method, which aligns eigenspaces and completes the missing blocks. BONMI shows advantages over existing methods in two tasks: integrating EHR and text data, and machine translation of medical concepts.
Matrix completion has attracted attention in many fields, including statistics, applied mathematics, and electrical engineering. Most existing work focuses on independent sampling models, under which the observed entries are sampled independently. Motivated by applications in integrating knowledge graphs derived from multi-source biomedical data, such as Electronic Health Records (EHR) and biomedical text, we propose Block-wise Overlapping Noisy Matrix Integration (BONMI) to handle block-wise missingness in symmetric matrices that represent relatedness between entity pairs. Our idea is to solve an orthogonal Procrustes problem to align the eigenspaces of the two sub-matrices, and then to complete the missing blocks by the inner product of the two low-rank components. In addition, we prove a statistical rate for estimating the eigenspace of the underlying matrix that is comparable to the rate under the independent-missingness assumption. Simulation studies show that the method performs well under a variety of configurations. In the real data analysis, the method is applied to two tasks: (i) integrating several pointwise mutual information matrices built from English EHR and Chinese medical text data, and (ii) machine translation between English and Chinese medical concepts. Our method shows an advantage over existing methods.
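To make the align-then-complete idea concrete, here is a minimal two-source sketch in Python with NumPy. It assumes both observed sub-matrices are symmetric positive semidefinite with rank r (exactly or approximately), that rows are ordered as [only-in-source-1, shared] and [shared, only-in-source-2], and that at least r entities are shared. The function names `low_rank_embedding`, `procrustes_rotation`, and `complete_missing_block` are hypothetical. This illustrates the general technique only and is not the BONMI estimator itself, which may weight, rescale, or handle indefinite matrices differently.

```python
import numpy as np


def low_rank_embedding(W, r):
    """Rank-r embedding X with W ~= X @ X.T (assumes W is symmetric PSD)."""
    vals, vecs = np.linalg.eigh(W)          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:r]        # indices of the r largest eigenvalues
    vals = np.clip(vals[idx], 0.0, None)    # guard against small negatives from noise
    return vecs[:, idx] * np.sqrt(vals)     # scale each eigenvector column


def procrustes_rotation(A, B):
    """Orthogonal Q minimizing ||A @ Q - B||_F (orthogonal Procrustes problem)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def complete_missing_block(W1, W2, n_only1, n_only2, r):
    """Hypothetical two-source completion sketch (not the paper's exact estimator).

    W1 covers entities [only-in-1, shared]; W2 covers [shared, only-in-2].
    Returns the estimated (n_only1 x n_only2) cross block that no source observes.
    """
    n_shared = W1.shape[0] - n_only1
    assert W2.shape[0] == n_shared + n_only2, "row layout does not match"
    X1 = low_rank_embedding(W1, r)
    X2 = low_rank_embedding(W2, r)
    # The shared entities appear in both embeddings, each in its own rotated basis.
    Q = procrustes_rotation(X2[:n_shared], X1[n_only1:])
    X2_aligned = X2 @ Q                     # express source 2 in source-1 coordinates
    # Missing block = inner products of the two aligned low-rank components.
    return X1[:n_only1] @ X2_aligned[n_shared:].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((15, 3))        # 15 entities with rank-3 latent factors
    W = X @ X.T                             # full relatedness matrix (never observed whole)
    W1, W2 = W[:11, :11], W[5:, 5:]         # entities 0-10 and 5-14; 5-10 are shared
    est = complete_missing_block(W1, W2, n_only1=5, n_only2=4, r=3)
    print(np.max(np.abs(est - W[:5, 11:])))  # should be tiny for exact rank-3 input
```

With exact rank-r inputs and at least r shared entities, the Procrustes step recovers the relative rotation between the two eigenspace embeddings, so the completed block matches the true cross block up to floating-point error. With noisy inputs, the shared overlap serves as the anchor on which the alignment is fit.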