ITCRIRJun 5, 2017

The Capacity of Private Information Retrieval from Byzantine and Colluding Databases

arXiv:1706.01442v1204 citations
Originality Highly original
AI Analysis

This provides a foundational result for secure data retrieval in distributed systems with potential adversarial nodes, impacting privacy-preserving technologies.

The paper tackles the problem of private information retrieval from databases that may be outdated or adversarial (Byzantine), deriving the exact information-theoretic capacity for single-round retrieval with collusion constraints, showing it is C = (N-2B)/N * (1 - T/(N-2B)) / (1 - (T/(N-2B))^M) when 2B+T < N.

We consider the problem of single-round private information retrieval (PIR) from $N$ replicated databases. We consider the case when $B$ databases are outdated (unsynchronized), or even worse, adversarial (Byzantine), and therefore, can return incorrect answers. In the PIR problem with Byzantine databases (BPIR), a user wishes to retrieve a specific message from a set of $M$ messages with zero-error, irrespective of the actions performed by the Byzantine databases. We consider the $T$-privacy constraint in this paper, where any $T$ databases can collude, and exchange the queries submitted by the user. We derive the information-theoretic capacity of this problem, which is the maximum number of \emph{correct symbols} that can be retrieved privately (under the $T$-privacy constraint) for every symbol of the downloaded data. We determine the exact BPIR capacity to be $C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-(\frac{T}{N-2B})^M}$, if $2B+T < N$. This capacity expression shows that the effect of Byzantine databases on the retrieval rate is equivalent to removing $2B$ databases from the system, with a penalty factor of $\frac{N-2B}{N}$, which signifies that even though the number of databases needed for PIR is effectively $N-2B$, the user still needs to access the entire $N$ databases. The result shows that for the unsynchronized PIR problem, if the user does not have any knowledge about the fraction of the messages that are mis-synchronized, the single-round capacity is the same as the BPIR capacity. Our achievable scheme extends the optimal achievable scheme for the robust PIR (RPIR) problem to correct the \emph{errors} introduced by the Byzantine databases as opposed to \emph{erasures} in the RPIR problem. Our converse proof uses the idea of the cut-set bound in the network coding problem against adversarial nodes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes