Jeremy Fineman

LG · Sep 1, 2022

Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax

Hao-Ren Yao, Nairen Cao, Katina Russell et al.

Learning representations of Electronic Health Records (EHRs) is an important yet under-explored research topic. Such representations benefit various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision over vectorized sequential EHRs, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems. However, complex temporality often degrades its performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graph representation of EHRs, to overcome these problems. Unlike state-of-the-art methods, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting node and graph representations across those two manifold views using commonly used contrastive objectives. Empirically, on publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state of the art. Theoretically, varying the distance metric naturally creates different views that act as data augmentation without changing the graph structure.
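To make "contrasting representations on two views" concrete, here is a minimal sketch of one commonly used contrastive objective, an InfoNCE-style loss. The abstract does not say which objective the paper uses, so the choice of InfoNCE, the function names `cosine` and `info_nce`, the `temperature` value, and the toy embeddings are all assumptions made for illustration; this is not the Graph Kernel Infomax implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss between two views of the same n nodes.

    view_a[i] and view_b[i] are embeddings of node i in each view
    (the positive pair); every view_b[j] with j != i is a negative.
    The temperature default is an arbitrary illustrative value.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


# Toy embeddings (made up): two nodes, two views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]      # node i in view B resembles node i in view A
misaligned = [[0.1, 1.0], [1.0, 0.1]]   # positives swapped

loss_aligned = info_nce(view_a, aligned)
loss_misaligned = info_nce(view_a, misaligned)
```

The loss is lower when each node's two views agree with each other more than with other nodes' views, which is the property a contrastive objective rewards during training.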

80.7 · DS · Apr 8

Parallel Batch-Dynamic Maximal Independent Set

Guy Blelloch, Andrew Brady, Laxman Dhulipala et al.

We develop the first theoretically-efficient algorithm for maintaining a maximal independent set (MIS) of a graph in the parallel batch-dynamic setting. In this setting, a graph is updated with batches of edge insertions/deletions, and for each batch a parallel algorithm updates the maximal independent set to agree with the new graph. A batch-dynamic algorithm is considered efficient if it is work efficient (i.e., does no more asymptotic work than applying the updates sequentially) and has polylogarithmic depth (parallel time). In the sequential setting, the best known dynamic algorithms for MIS, by Chechik and Zhang (CZ) [FOCS19] and Behnezhad et al. (BDHSS) [FOCS19], take $O(\log^4 n)$ time per update in expectation. For a batch of $b$ updates, our algorithm has $O(b \log^3 n)$ expected work and polylogarithmic depth with high probability (whp). It therefore outperforms the best known algorithms even in the sequential dynamic case ($b = 1$). As with the sequential dynamic MIS algorithms of CZ and BDHSS, our solution maintains a lexicographically first MIS based on a random ordering of the vertices. Their analysis relied on a result of Censor-Hillel, Haramaty and Karnin [PODC16] that bounded the "influence set" for a single update, but surprisingly, the influence of a batch is not simply the union of the influence of each update therein. We therefore develop a new approach to analyze the influence set for a batch of updates. Our construction of the batch influence set is natural and leads to an arguably simpler analysis than prior work. We then instrument this construction to bound the work of our algorithm. To argue that our depth is polylogarithmic, we prove that the number of subrounds our algorithm takes matches the depth bounds for parallel static MIS.
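As a point of reference for the object being maintained, the following sketch computes the lexicographically first MIS under a random vertex ordering and handles a batch of edge updates by recomputing from scratch. This is a naive sequential baseline, not the paper's batch-dynamic algorithm; the function names (`lex_first_mis`, `random_priorities`, `apply_batch`, `is_maximal_independent`) and the adjacency-set representation are assumptions for illustration.

```python
import random


def random_priorities(vertices, seed=0):
    """Assign each vertex a distinct rank from a random permutation."""
    order = list(vertices)
    random.Random(seed).shuffle(order)
    return {v: rank for rank, v in enumerate(order)}


def lex_first_mis(adj, priority):
    """Greedy MIS: visit vertices by increasing priority and keep a
    vertex iff none of its neighbors has already been kept."""
    mis = set()
    for v in sorted(adj, key=lambda u: priority[u]):
        if all(u not in mis for u in adj[v]):
            mis.add(v)
    return mis


def apply_batch(adj, insertions=(), deletions=()):
    """Apply a batch of undirected edge insertions and deletions in place."""
    for u, v in insertions:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in deletions:
        adj[u].discard(v)
        adj[v].discard(u)


def is_maximal_independent(adj, mis):
    """Check independence (no edge inside mis) and maximality
    (every vertex outside mis has a neighbor in mis)."""
    independent = all(u not in mis for v in mis for u in adj[v])
    maximal = all(v in mis or any(u in mis for u in adj[v]) for v in adj)
    return independent and maximal


# Path graph 0-1-2-3 with a fixed ordering 0 < 1 < 2 < 3 for readability.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
priority = {v: v for v in adj}
before = lex_first_mis(adj, priority)

# One batch: insert edge (0, 2) and delete edge (1, 2), then recompute.
apply_batch(adj, insertions=[(0, 2)], deletions=[(1, 2)])
after = lex_first_mis(adj, priority)
```

Because the ordering is fixed, the MIS is uniquely determined by the graph, so a dynamic algorithm only needs to repair the vertices whose membership changes after a batch. Recomputing everything, as done here, costs work proportional to the whole graph per batch, which is the cost the batch-dynamic approach is designed to avoid.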
