Vishal Rana

LG
h-index3
3papers
46citations
Novelty52%
AI Score33

3 Papers

LGOct 28, 2022Code
Machine Unlearning of Federated Clusters

Chao Pan, Jin Sima, Saurav Prakash et al.

Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for FC methods has become of significant importance. We introduce, for the first time, the problem of machine unlearning for FC, and propose an efficient unlearning mechanism for a customized secure FC framework. Our FC framework utilizes special initialization procedures that we show are well-suited for unlearning. To protect client data privacy, we develop the secure compressed multiset aggregation (SCMA) framework that addresses sparse secure federated learning (FL) problems encountered during clustering as well as more general problems. To simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into our SCMA pipeline, and prove that the client communication cost is logarithmic in the vector dimension. Additionally, to demonstrate the benefits of our unlearning mechanism over complete retraining, we provide a theoretical analysis for the unlearning performance of our approach. Simulation results show that the new FC framework exhibits superior clustering performance compared to previously reported FC baselines when the cluster sizes are highly imbalanced. Compared to completely retraining K-means++ locally and globally for each removal request, our unlearning procedure offers an average speed-up of roughly 84x across seven datasets. Our implementation for the proposed method is available at https://github.com/thupchnsky/mufc.

LGSep 1, 2024
Federated Aggregation of Mallows Rankings: A Comparative Analysis of Borda and Lehmer Coding

Jin Sima, Vishal Rana, Olgica Milenkovic

Rank aggregation combines multiple ranked lists into a consensus ranking. In fields like biomedical data sharing, rankings may be distributed and require privacy. This motivates the need for federated rank aggregation protocols, which support distributed, private, and communication-efficient learning across multiple clients with local data. We present the first known federated rank aggregation methods using Borda scoring and Lehmer codes, focusing on the sample complexity for federated algorithms on Mallows distributions with a known scaling factor $φ$ and an unknown centroid permutation $σ_0$. Federated Borda approach involves local client scoring, nontrivial quantization, and privacy-preserving protocols. We show that for $φ\in [0,1)$, and arbitrary $σ_0$ of length $N$, it suffices for each of the $L$ clients to locally aggregate $\max\{C_1(φ), C_2(φ)\frac{1}{L}\log \frac{N}δ\}$ rankings, where $C_1(φ)$ and $C_2(φ)$ are constants, quantize the result, and send it to the server who can then recover $σ_0$ with probability $\geq 1-δ$. Communication complexity scales as $NL \log N$. Our results represent the first rigorous analysis of Borda's method in centralized and distributed settings under the Mallows model. Federated Lehmer coding approach creates a local Lehmer code for each client, using a coordinate-majority aggregation approach with specialized quantization methods for efficiency and privacy. We show that for $φ+φ^2<1+φ^N$, and arbitrary $σ_0$ of length $N$, it suffices for each of the $L$ clients to locally aggregate $\max\{C_3(φ), C_4(φ)\frac{1}{L}\log \frac{N}δ\}$ rankings, where $C_3(φ)$ and $C_4(φ)$ are constants. Clients send truncated Lehmer coordinate histograms to the server, which can recover $σ_0$ with probability $\geq 1-δ$. Communication complexity is $\sim O(N\log NL\log L)$.

LGApr 8, 2025
DMol: A Highly Efficient and Chemical Motif-Preserving Molecule Generation Platform

Peizhi Niu, Yu-Hsiang Wang, Vishal Rana et al.

We introduce a new graph diffusion model for small molecule generation, DMol, which outperforms the state-of-the-art DiGress model in terms of validity by roughly 1.5% across all benchmarking datasets while reducing the number of diffusion steps by at least 10-fold, and the running time to roughly one half. The performance improvements are a result of a careful change in the objective function and a graph noise scheduling approach which, at each diffusion step, allows one to only change a subset of nodes of varying size in the molecule graph. Another relevant property of the method is that it can be easily combined with junction-tree-like graph representations that arise by compressing a collection of relevant ring structures into supernodes. Unlike classical junction-tree techniques that involve VAEs and require complicated reconstruction steps, compressed DMol directly performs graph diffusion on a graph that compresses only a carefully selected set of frequent carbon rings into supernodes, which results in straightforward sample generation. This compressed DMol method offers additional validity improvements over generic DMol of roughly 2%, increases the novelty of the method, and further improves the running time due to reductions in the graph size.