Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model
This work addresses high-dimensional network analysis for researchers and practitioners in fields such as healthcare. It provides a method for combining multiple data sources to learn a consensus knowledge graph, though it appears incremental because it builds on existing low-rank and multi-view techniques.
The paper tackles the challenge of accurately identifying node-node interactions in large networks. It proposes a unified multi-view sparse low-rank block model (msLBM) framework that performs grouping and connectivity analysis simultaneously using multiple data sources. In applications to electronic health record data, the method reveals network structure more reliably.
Network analysis has been a powerful tool for unveiling relationships and interactions among a large number of objects. Yet its effectiveness in accurately identifying important node-node interactions is challenged by rapidly growing network size, as data are collected at an unprecedented granularity and scale. A common strategy for overcoming such high dimensionality is to collapse nodes into smaller groups and conduct connectivity analysis at the group level. Splitting the analysis into two phases inevitably opens a gap in consistency and reduces efficiency. Consensus learning has emerged as a new normal for common knowledge discovery when multiple data sources are available. In this paper, we propose a unified multi-view sparse low-rank block model (msLBM) framework, which enables simultaneous grouping and connectivity analysis by combining multiple data sources. The msLBM framework efficiently represents overlapping information across large-scale concepts and accommodates different types of heterogeneity across sources. Both features are desirable when analyzing high-dimensional electronic health record (EHR) datasets from multiple health systems. An estimation procedure based on the alternating minimization algorithm is proposed. Our theoretical results demonstrate that a consensus knowledge graph can be learned more accurately by leveraging multi-source datasets, and that statistically optimal rates can be achieved under mild conditions. Applications to real-world EHR data suggest that our proposed msLBM algorithm can more reliably reveal network structure among clinical concepts by effectively combining summary-level EHR data from multiple health systems.
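To give a concrete sense of what an alternating minimization over multiple views can look like, the following is a minimal sketch and not the paper's msLBM estimator. It assumes, purely for illustration, that each view W_s is approximated as U C_s U^T, with a shared orthonormal factor U and a view-specific core C_s; the abstract does not specify this form. The function name `fit_shared_low_rank`, the penalty parameter `lam`, and the update rules are all hypothetical choices.

```python
import numpy as np


def soft_threshold(x, lam):
    """Entrywise soft-thresholding; the identity map when lam == 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def fit_shared_low_rank(views, rank, lam=0.0, n_iter=50):
    """Illustrative alternating scheme for W_s ~ U C_s U^T (assumed model).

    views : list of (n, n) arrays, one per data source
    rank  : number of columns of the shared factor U
    lam   : hypothetical sparsity threshold applied to each core C_s
    """
    # Spectral initialization: top eigenvectors of sum_s W_s W_s^T.
    agg = sum(W @ W.T for W in views)
    _, vecs = np.linalg.eigh(agg)  # eigenvalues are returned in ascending order
    U = vecs[:, -rank:]

    for _ in range(n_iter):
        # Step 1: with U fixed and orthonormal, the least-squares core is
        # U^T W_s U; soft-thresholding adds an l1 penalty on its entries.
        cores = [soft_threshold(U.T @ W @ U, lam) for W in views]

        # Step 2: with the cores fixed, take a power-iteration-style step
        # and re-orthonormalize by QR. This is a heuristic update, not an
        # exact minimizer over U.
        M = sum(W @ U @ C.T + W.T @ U @ C for W, C in zip(views, cores))
        U, _ = np.linalg.qr(M)

    cores = [soft_threshold(U.T @ W @ U, lam) for W in views]
    return U, cores
```

Calling `fit_shared_low_rank(views, rank=3)` on a list of square matrices returns the shared factor and one core per view. Sharing U across views is what pools information from all sources in this sketch, while the per-view cores leave room for source-level heterogeneity.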