CR, May 31, 2025
dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation
Sofiane Mahiou, Amir Dizche, Reza Nazari et al.
We propose dpmm, an open-source library for synthetic data generation with Differentially Private (DP) guarantees. It includes three popular marginal models (PrivBayes, MST, and AIM) that achieve superior utility and offer richer functionality compared to alternative implementations. Additionally, we adopt best practices to provide end-to-end DP guarantees and address well-known DP-related vulnerabilities. Our goal is to accommodate a wide audience with easy-to-install, highly customizable, and robust model implementations. Our codebase is available from https://github.com/sassoftware/dpmm.
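For readers unfamiliar with marginal models, the basic building block is a noisy marginal: a contingency table over a few columns, with calibrated noise added to each cell. The sketch below is a generic illustration using the Laplace mechanism in plain Python. It is not dpmm's API, and the function name `noisy_marginal` and its parameters are hypothetical. It assumes add/remove-one-record neighbouring datasets (so each count has sensitivity 1) and takes the cell domain as an explicit argument so that the set of released cells does not depend on the data. It does not address floating-point issues in noise sampling.

```python
import random
from collections import Counter


def noisy_marginal(records, columns, domain, epsilon, rng=None):
    """Return a Laplace-noised contingency table over `columns`.

    records : list of dicts, one per row
    columns : tuple of column names defining the marginal
    domain  : iterable of all possible cells (tuples), fixed in advance
    epsilon : privacy budget spent on this single marginal
    """
    rng = rng or random.Random()
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    # Assumption: adding or removing one record changes one cell by 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    scale = 1.0 / epsilon
    noisy = {}
    for cell in domain:
        # The difference of two exponentials with mean `scale`
        # is a Laplace sample with scale `scale`.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[cell] = counts.get(cell, 0) + noise
    return noisy
```

Models such as PrivBayes, MST, and AIM differ mainly in which marginals they choose to measure and how they combine the noisy measurements into a generative model; the sketch covers only the measurement step.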
IV, Jul 17, 2019
An AI-Augmented Lesion Detection Framework For Liver Metastases With Model Interpretability
Xin J. Hunt, Ralph Abbey, Ricky Tharrington et al.
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide. Most CRC deaths are the result of progression of metastases. The assessment of metastases is done using the RECIST criterion, which is time-consuming and subjective, as clinicians need to manually measure anatomical tumor sizes. AI has had many successes in image object detection, but it often suffers because the models used are not interpretable, leading to issues of trust and implementation in the clinical setting. We propose a framework for an AI-augmented system in which an interactive AI system assists clinicians in the metastasis assessment. We include model interpretability to explain the reasoning of the underlying models.
LG, May 1, 2019
High-Performance Support Vector Machines and Its Applications
Taiping He, Tao Wang, Ralph Abbey et al.
The support vector machine (SVM) algorithm is a popular classification technique in data mining and machine learning. In this paper, we propose a distributed SVM algorithm and demonstrate its use in a number of applications. The algorithm is named high-performance support vector machines (HPSVM). The major contribution of HPSVM is two-fold. First, HPSVM provides a new way to distribute computations to the machines in the cloud without shuffling the data. Second, HPSVM minimizes inter-machine communication in order to maximize performance. We apply HPSVM to some real-world classification problems and compare it with the state-of-the-art SVM technique implemented in R on several public data sets. HPSVM achieves similar or better results.
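The abstract does not describe how HPSVM distributes its computation, so the following is only a generic illustration of the two stated goals (data stays where it is; little is sent between machines), not the HPSVM algorithm. It trains a linear SVM by subgradient descent on the regularized hinge loss: each partition computes a local subgradient, and only one vector of length `dim` per partition is combined per step. The function names, the hyperparameters, and the use of in-process lists to stand in for worker machines are all assumptions made for this sketch.

```python
def local_hinge_subgradient(w, partition):
    """Subgradient of the summed hinge loss over one data partition.

    Runs where the data lives; only the returned vector is communicated.
    """
    grad = [0.0] * len(w)
    for x, y in partition:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # point violates the margin
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    return grad


def train_linear_svm(partitions, dim, lam=0.01, lr=0.1, epochs=100):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n = sum(len(p) for p in partitions)
    w = [0.0] * dim
    for _ in range(epochs):
        # In a real cluster each call would run on a separate worker.
        partials = [local_hinge_subgradient(w, p) for p in partitions]
        # The coordinator only sums one small vector per partition.
        total = [sum(g[j] for g in partials) for j in range(dim)]
        w = [wj - lr * (lam * wj + total[j] / n) for j, wj in enumerate(w)]
    return w
```

Because the hinge-loss subgradient is a sum over records, splitting the records across partitions changes only where the partial sums are computed, not the result, which is why no record ever has to leave its partition in this scheme.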