Sep 10, 2025

Memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE

arXiv:2509.08632
h-index: 15

For R users performing big data analytics, memshare reduces memory overhead and improves speed for parallel computations, addressing a practical bottleneck in memory-constrained multicore processing.

Memshare enables shared memory multicore computation in R, achieving a 2x speedup over SharedObject with no additional resident memory in a column-wise apply benchmark, and demonstrates utility for feature selection by mutual information on large RNA-seq data.
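The idea behind the column-wise apply benchmark, workers reading one shared buffer instead of each holding a private copy of the matrix, can be sketched outside R. The following is a minimal Python analogue built on the standard-library `multiprocessing.shared_memory` module. It is not memshare's API and does not involve ALTREP. The helper names `create_shared_matrix`, `column_sum`, and `parallel_column_sums` are hypothetical, and the column-major layout is an assumption chosen to mirror how R stores matrices.

```python
from multiprocessing import Pool, shared_memory


def create_shared_matrix(columns):
    """Copy equal-length float columns into one shared, column-major buffer."""
    nrow, ncol = len(columns[0]), len(columns)
    shm = shared_memory.SharedMemory(create=True, size=nrow * ncol * 8)
    # View the raw bytes as doubles; the view is released when the block exits.
    with shm.buf.cast("d") as view:
        for j, col in enumerate(columns):
            for i, x in enumerate(col):
                view[j * nrow + i] = x
    return shm, nrow, ncol


def column_sum(name, nrow, j):
    """Attach to the segment by name and sum column j without copying the matrix."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with shm.buf.cast("d") as view:
            return sum(view[i] for i in range(j * nrow, (j + 1) * nrow))
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()


def parallel_column_sums(name, nrow, ncol, workers=2):
    """Column-wise apply: each worker receives only the segment name and a column index."""
    with Pool(workers) as pool:
        return pool.starmap(column_sum, [(name, nrow, j) for j in range(ncol)])
```

In a script, `parallel_column_sums(shm.name, nrow, ncol)` would be called under an `if __name__ == "__main__":` guard, followed by `shm.close()` and `shm.unlink()` in the creating process. Only the segment name and the dimensions are sent to the workers, so the matrix itself is never serialized per worker.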

We present memshare (published as a CRAN package at https://CRAN.R-project.org/package=memshare), a package that enables shared-memory multicore computation in R by allocating buffers in C++ shared memory and exposing them to R through ALTREP views. We compare memshare to SharedObject (Bioconductor), discuss semantics and safety, and report a 2x speedup over SharedObject with no additional resident memory in a column-wise apply benchmark. Finally, we illustrate a downstream analytics use case: feature selection by mutual information, in which densities are estimated per feature via Pareto Density Estimation (PDE). The use case is an RNA-seq dataset of N=10,446 cases and d=19,637 gene expressions, which requires roughly n_threads * 10 GB of memory when processed with parallel R sessions. Use cases of this size and larger are common in big data analytics and can make R feel limiting, a constraint that the package presented in this work mitigates.
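To make the feature-selection step concrete, the sketch below scores each feature by its mutual information with a class label and ranks the features by that score. It rests on several assumptions that the abstract does not confirm: the target is taken to be a discrete class label, the per-feature density is approximated with an equal-width histogram rather than Pareto Density Estimation, and the function names `mutual_information` and `rank_features` are hypothetical. Scores are in nats (natural logarithm).

```python
import math
from collections import Counter


def mutual_information(feature, labels, bins=4):
    """Histogram estimate of I(X; Y) for a numeric feature X and discrete labels Y."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # constant feature: fall back to a single bin
    binned = [min(int((x - lo) / width), bins - 1) for x in feature]
    n = len(feature)
    joint = Counter(zip(binned, labels))  # counts for each (bin, label) pair
    n_bin = Counter(binned)               # marginal counts per bin
    n_lab = Counter(labels)               # marginal counts per label
    # Sum p(b, y) * log(p(b, y) / (p(b) * p(y))) over the observed pairs.
    return sum(
        (c / n) * math.log((c * n) / (n_bin[b] * n_lab[y]))
        for (b, y), c in joint.items()
    )


def rank_features(columns, labels, bins=4):
    """Return feature indices ordered from most to least informative."""
    scores = [mutual_information(col, labels, bins) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

Each feature is scored independently of the others, so the loop in `rank_features` is the kind of column-wise apply that parallelizes over features when every worker can read the same matrix.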
