DCMar 22
CALVO: Improve Serving Efficiency for LLM Inferences with Intense Network DemandsWeiye Wang, Chen Chen, Junxue Zhang et al.
Distributed prefix caching has become a core technique for efficient LLM serving. However, for long-context requests with high cache hit ratios, retrieving reusable KVCache blocks from remote servers has emerged as a new performance bottleneck. Such network-intensive LLM inference is expected to become increasingly common as agentic AI workloads continue to grow. However, existing LLM inference engines remain largely compute-centric: they treat KVCache loading as a subordinate phase to GPU execution and often fail to account for its delay explicitly during scheduling. We present CALVO, an LLM serving engine that treats KVCache loading as a first-class concern. CALVO decouples KVCache loading and GPU computation into independently managed, asynchronously progressing stages, enabling better utilization of network, PCIe, and computation resources. In addition, CALVO incorporates KVCache loading delay as an explicit component of per-request service cost, leading to more accurate scheduling decisions. Experiments on a real testbed with diverse long-context workloads show that CALVO substantially improves the efficiency of network-intensive LLM inference, achieving up to 61.67% higher SLO attainment than the baseline.
ITJan 28, 2022
Communication Cost of Two-Database Symmetric Private Information Retrieval: A Conditional Disclosure of Multiple Secrets PerspectiveZhusheng Wang, Sennur Ulukus
We consider the total (upload plus download) communication cost of two-database symmetric private information retrieval (SPIR) through its relationship to conditional disclosure of secrets (CDS). In SPIR, a user wishes to retrieve a message out of $K$ messages from $N$ non-colluding and replicated databases without learning anything beyond the retrieved message, while no individual database learns the retrieved message index. In CDS, two parties each holding an individual input and sharing a common secret wish to disclose this secret to an external party in an efficient manner if and only if their inputs satisfy a public deterministic function. As a natural extension of CDS, we introduce conditional disclosure of multiple secrets (CDMS) where two parties share multiple i.i.d.~common secrets rather than a single common secret as in CDS. We show that a special configuration of CDMS is equivalent to two-database SPIR. Inspired by this equivalence, we design download cost efficient SPIR schemes using bipartite graph representation of CDS and CDMS, and determine the exact minimum total communication cost of $N=2$ database SPIR for $K=3$ messages.
ITMay 12, 2021
Symmetric Private Information Retrieval with User-Side Common RandomnessZhusheng Wang, Sennur Ulukus
We consider the problem of symmetric private information retrieval (SPIR) with user-side common randomness. In SPIR, a user retrieves a message out of $K$ messages from $N$ non-colluding and replicated databases in such a way that no single database knows the retrieved message index (user privacy), and the user gets to know nothing further than the retrieved message (database privacy). SPIR has a capacity smaller than the PIR capacity which requires only user privacy, is infeasible in the case of a single database, and requires shared common randomness among the databases. We introduce a new variant of SPIR where the user is provided with a random subset of the shared database common randomness, which is unknown to the databases. We determine the exact capacity region of the triple $(d, ρ_S, ρ_U)$, where $d$ is the download cost, $ρ_S$ is the amount of shared database (server) common randomness, and $ρ_U$ is the amount of available user-side common randomness. We show that with a suitable amount of $ρ_U$, this new SPIR achieves the capacity of conventional PIR. As a corollary, single-database SPIR becomes feasible. Further, the presence of user-side $ρ_U$ reduces the amount of required server-side $ρ_S$.
ITAug 17, 2020
Multi-Party Private Set Intersection: An Information-Theoretic ApproachZhusheng Wang, Karim Banawan, Sennur Ulukus
We investigate the problem of multi-party private set intersection (MP-PSI). In MP-PSI, there are $M$ parties, each storing a data set $\mathcal{p}_i$ over $N_i$ replicated and non-colluding databases, and we want to calculate the intersection of the data sets $\cap_{i=1}^M \mathcal{p}_i$ without leaking any information beyond the set intersection to any of the parties. We consider a specific communication protocol where one of the parties, called the leader party, initiates the MP-PSI protocol by sending queries to the remaining parties which are called client parties. The client parties are not allowed to communicate with each other. We propose an information-theoretic scheme that privately calculates the intersection $\cap_{i=1}^M \mathcal{p}_i$ with a download cost of $D = \min_{t \in \{1, \cdots, M\}} \sum_{i \in \{1, \cdots M\}\setminus {t}} \left\lceil \frac{|\mathcal{p}_t|N_i}{N_i-1}\right\rceil$. Similar to the 2-party PSI problem, our scheme builds on the connection between the PSI problem and the multi-message symmetric private information retrieval (MM-SPIR) problem. Our scheme is a non-trivial generalization of the 2-party PSI scheme as it needs an intricate design of the shared common randomness. Interestingly, in terms of the download cost, our scheme does not incur any penalty due to the more stringent privacy constraints in the MP-PSI problem compared to the 2-party PSI problem.
ITDec 31, 2019
Private Set Intersection: A Multi-Message Symmetric Private Information Retrieval PerspectiveZhusheng Wang, Karim Banawan, Sennur Ulukus
We study the problem of private set intersection (PSI). In this problem, there are two entities $E_i$, for $i=1, 2$, each storing a set $\mathcal{P}_i$, whose elements are picked from a finite field $\mathbb{F}_K$, on $N_i$ replicated and non-colluding databases. It is required to determine the set intersection $\mathcal{P}_1 \cap \mathcal{P}_2$ without leaking any information about the remaining elements to the other entity with the least amount of downloaded bits. We first show that the PSI problem can be recast as a multi-message symmetric private information retrieval (MM-SPIR) problem. Next, as a stand-alone result, we derive the information-theoretic sum capacity of MM-SPIR, $C_{MM-SPIR}$. We show that with $K$ messages, $N$ databases, and the size of the desired message set $P$, the exact capacity of MM-SPIR is $C_{MM-SPIR} = 1 - \frac{1}{N}$ when $P \leq K-1$, provided that the entropy of the common randomness $S$ satisfies $H(S) \geq \frac{P}{N-1}$ per desired symbol. This result implies that there is no gain for MM-SPIR over successive single-message SPIR (SM-SPIR). For the MM-SPIR problem, we present a novel capacity-achieving scheme that builds on the near-optimal scheme of Banawan-Ulukus originally proposed for the multi-message PIR (MM-PIR) problem without database privacy constraints. Surprisingly, our scheme here is exactly optimal for the MM-SPIR problem for any $P$, in contrast to the scheme for the MM-PIR problem, which was proved only to be near-optimal. Our scheme is an alternative to the SM-SPIR scheme of Sun-Jafar. Based on this capacity result for MM-SPIR, and after addressing the added requirements in its conversion to the PSI problem, we show that the optimal download cost for the PSI problem is $\min\left\{\left\lceil\frac{P_1 N_2}{N_2-1}\right\rceil, \left\lceil\frac{P_2 N_1}{N_1-1}\right\rceil\right\}$, where $P_i$ is the cardinality of set $\mathcal{P}_i$