LGJan 21
Multimodal Rumor Detection Enhanced by External Evidence and Forgery FeaturesHan Li, Hua Sun
Social media increasingly disseminates information through mixed image text posts, but rumors often exploit subtle inconsistencies and forged content, making detection based solely on post content difficult. Deep semantic mismatch rumors, which superficially align images and texts, pose particular challenges and threaten online public opinion. Existing multimodal rumor detection methods improve cross modal modeling but suffer from limited feature extraction, noisy alignment, and inflexible fusion strategies, while ignoring external factual evidence necessary for verifying complex rumors. To address these limitations, we propose a multimodal rumor detection model enhanced with external evidence and forgery features. The model uses a ResNet34 visual encoder, a BERT text encoder, and a forgery feature module extracting frequency-domain traces and compression artifacts via Fourier transformation. BLIP-generated image descriptions bridge image and text semantic spaces. A dual contrastive learning module computes contrastive losses between text image and text description pairs, improving detection of semantic inconsistencies. A gated adaptive feature-scaling fusion mechanism dynamically adjusts multimodal fusion and reduces redundancy. Experiments on Weibo and Twitter datasets demonstrate that our model outperforms mainstream baselines in macro accuracy, recall, and F1 score.
ITMar 6, 2025
Fundamental Limits of Hierarchical Secure Aggregation with Cyclic User AssociationXiang Zhang, Zhou Li, Kai Wan et al.
Secure aggregation is motivated by federated learning (FL) where a cloud server aims to compute an averaged model (i.e., weights of deep neural networks) of the locally-trained models of numerous clients, while adhering to data security requirements. Hierarchical secure aggregation (HSA) extends this concept to a three-layer hierarchical network, where clustered users communicate with the server through an intermediate layer of relays. In HSA, beyond conventional server security, relay security is also enforced to ensure that the relays remain oblivious to the users' inputs (an abstraction of the local models in FL). Existing study on HSA assumes that each user is associated with only one relay, limiting opportunities for coding across inter-cluster users to achieve efficient communication and key generation. In this paper, we consider HSA with a cyclic association pattern where each user is connected to $B$ consecutive relays in a wrap-around manner. We propose an efficient aggregation scheme which includes a message design for the inputs inspired by gradient coding-a well-known technique for efficient communication in distributed computing-along with a highly non-trivial security key design. We also derive novel converse bounds on the minimum achievable communication and key rates using information-theoretic arguments.
ITFeb 2, 2021
A New Design of Cache-aided Multiuser Private Information Retrieval with Uncoded PrefetchingXiang Zhang, Kai Wan, Hua Sun et al.
In the problem of cache-aided multiuser private information retrieval (MuPIR), a set of $K_{\rm u}$ cache-equipped users wish to privately download a set of messages from $N$ distributed databases each holding a library of $K$ messages. The system works in two phases: {\it cache placement (prefetching) phase} in which the users fill up their cache memory, and {\it private delivery phase} in which the users' demands are revealed and they download an answer from each database so that the their desired messages can be recovered while each individual database learns nothing about the identities of the requested messages. The goal is to design the placement and the private delivery phases such that the \emph{load}, which is defined as the total number of downloaded bits normalized by the message size, is minimized given any user memory size. This paper considers the MuPIR problem with two messages, arbitrary number of users and databases where uncoded prefetching is assumed, i.e., the users directly copy some bits from the library as their cached contents. We propose a novel MuPIR scheme inspired by the Maddah-Ali and Niesen (MAN) coded caching scheme. The proposed scheme achieves lower load than any existing schemes, especially the product design (PD), and is shown to be optimal within a factor of $8$ in general and exactly optimal at very high or low memory regime.
ITJan 19, 2021
Information Theoretic Secure Aggregation with User DropoutsYizhou Zhao, Hua Sun
In the robust secure aggregation problem, a server wishes to learn and only learn the sum of the inputs of a number of users while some users may drop out (i.e., may not respond). The identity of the dropped users is not known a priori and the server needs to securely recover the sum of the remaining surviving users. We consider the following minimal two-round model of secure aggregation. Over the first round, any set of no fewer than $U$ users out of $K$ users respond to the server and the server wants to learn the sum of the inputs of all responding users. The remaining users are viewed as dropped. Over the second round, any set of no fewer than $U$ users of the surviving users respond (i.e., dropouts are still possible over the second round) and from the information obtained from the surviving users over the two rounds, the server can decode the desired sum. The security constraint is that even if the server colludes with any $T$ users and the messages from the dropped users are received by the server (e.g., delayed packets), the server is not able to infer any additional information beyond the sum in the information theoretic sense. For this information theoretic secure aggregation problem, we characterize the optimal communication cost. When $U \leq T$, secure aggregation is not feasible, and when $U > T$, to securely compute one symbol of the sum, the minimum number of symbols sent from each user to the server is $1$ over the first round, and $1/(U-T)$ over the second round.
ITOct 13, 2020
On the Fundamental Limits of Cache-aided Multiuser Private Information RetrievalXiang Zhang, Kai Wan, Hua Sun et al.
We consider the problem of cache-aided Multiuser Private Information Retrieval (MuPIR) which is an extension of the single-user cache-aided PIR problem to the case of multiple users. In MuPIR, each of the $K_{\rm u}$ cache-equipped users wishes to privately retrieve a message out of $K$ messages from $N$ databases each having access to the entire message library. The privacy constraint requires that any individual database learns nothing about the demands of all users. The users are connected to each database via an error-free shared-link. In this paper, we aim to characterize the optimal trade-off between users' memory and communication load for such systems. Based on the proposed novel approach of \emph{cache-aided interference alignment (CIA)}, first, for the MuPIR problem with $K=2$ messages, $K_{\rm u}=2$ users and $N\ge 2$ databases, we propose achievable retrieval schemes for both uncoded and general cache placement. The CIA approach is optimal when the cache placement is uncoded. For general cache placement, the CIA approach is optimal when $N=2$ and $3$ verified by the computer-aided approach. Second, when $K,K_{\rm u}$ and $N$ are general, we propose a new \emph{product design} (PD) which incorporates the PIR code into the linear caching code. The product design is shown to be order optimal within a multiplicative factor of 8 and is exactly optimal when the user cache memory size is large.
ITApr 30, 2020
Compound Secure Groupcast: Key Assignment for Selected BroadcastingHua Sun
The compound secure groupcast problem is considered, where the key variables at $K$ receivers are designed so that a transmitter can securely groupcast a message to any $N$ out of the $K$ receivers through a noiseless broadcast channel. The metric is the information theoretic tradeoff between key storage $α$, i.e., the number of bits of the key variable per message bit, and broadcast bandwidth $β$, i.e., the number of bits of the broadcast information per message bit. We have three main results. First, when broadcast bandwidth is minimized, i.e., when $β= 1$, we show that the minimum key storage is $α= N$. Second, when key storage is minimized, i.e., when $α= 1$, we show that broadcast bandwidth $β= \min(N, K-N+1)$ is achievable and is optimal (minimum) if $N=2$ or $K-1$. Third, when $N=2$, the optimal key storage and broadcast bandwidth tradeoff is characterized as $α+β\geq 3, α\geq 1, β\geq 1$.
ITMar 26, 2020
Secure Groupcast with Shared KeysHua Sun
We consider a transmitter and $K$ receivers, each of which shares a key variable with the transmitter. Through a noiseless broadcast channel, the transmitter wishes to send a common message $W$ securely to $N$ out of the $K$ receivers while the remaining $K-N$ receivers learn no information about $W$. We are interested in the maximum message rate, i.e., the maximum number of bits of $W$ that can be securely groupcast to the legitimate receivers per key block and the minimum broadcast bandwidth, i.e., the minimum number of bits of the broadcast information required to securely groupcast the message bits. We focus on the setting of combinatorial keys, where every subset of the $K$ receivers share an independent key of arbitrary size. Under this combinatorial key setting, the maximum message rate is characterized for the following scenarios - 1) $N=1$ or $N=K-1$, i.e., secure unicast to 1 receiver with $K-1$ eavesdroppers or secure groupcast to $K-1$ receivers with $1$ eavesdropper, 2) $N=2, K=4$, i.e., secure groupcast to $2$ out of 4 receivers, and 3) the symmetric setting where the key size for any subset of the same cardinality is equal for any $N,K$. Further, for the latter two cases, the minimum broadcast bandwidth for the maximum message rate is characterized.
ITFeb 13, 2020
Conditional Disclosure of Secrets: A Noise and Signal Alignment ApproachZhou Li, Hua Sun
In the conditional disclosure of secrets (CDS) problem, Alice and Bob (each holds an input and a common secret) wish to disclose, as efficiently as possible, the secret to Carol if and only if their inputs satisfy some function. The capacity of CDS is the maximum number of bits of the secret that can be securely disclosed per bit of total communication. We characterize the necessary and sufficient condition for the extreme case where the capacity of CDS is the highest and is equal to 1/2. For the simplest instance where the capacity is smaller than 1/2, we show that the linear capacity is 2/5.
ITJan 2, 2020
Expand-and-Randomize: An Algebraic Approach to Secure ComputationYizhou Zhao, Hua Sun
We consider the secure computation problem in a minimal model, where Alice and Bob each holds an input and wish to securely compute a function of their inputs at Carol without revealing any additional information about the inputs. For this minimal secure computation problem, we propose a novel coding scheme built from two steps. First, the function to be computed is expanded such that it can be recovered while additional information might be leaked. Second, a randomization step is applied to the expanded function such that the leaked information is protected. We implement this expand-and-randomize coding scheme with two algebraic structures - the finite field and the modulo ring of integers, where the expansion step is realized with the addition operation and the randomization step is realized with the multiplication operation over the respective algebraic structures.
ITAug 22, 2018
Capacity-Achieving Private Information Retrieval Codes with Optimal Message Size and Upload CostChao Tian, Hua Sun, Jun Chen
We propose a new capacity-achieving code for the private information retrieval (PIR) problem, and show that it has the minimum message size (being one less than the number of servers) and the minimum upload cost (being roughly linear in the number of messages) among a general class of capacity-achieving codes, and in particular, among all capacity-achieving linear codes. Different from existing code constructions, the proposed code is asymmetric, and this asymmetry appears to be the key factor leading to the optimal message size and the optimal upload cost. The converse results on the message size and the upload cost are obtained by a strategic analysis of the information theoretic proof of the PIR capacity, from which a set of critical properties of any capacity-achieving code in the code class of interest is extracted. The symmetry structure of the PIR problem is then analyzed, which allows us to construct symmetric codes from asymmetric ones, yielding a meaningful bridge between the proposed code and existing ones in the literature.
ITJun 14, 2018
Private Information DeliveryHua Sun
We introduce the problem of private information delivery (PID), comprised of $K$ messages, a user, and $N$ servers (each holds $M\leq K$ messages) that wish to deliver one out of $K$ messages to the user privately, i.e., without revealing the delivered message index to the user. The information theoretic capacity of PID, $C$, is defined as the maximum number of bits of the desired message that can be privately delivered per bit of total communication to the user. For the PID problem with $K$ messages, $N$ servers, $M$ messages stored per server, and $N \geq \lceil \frac{K}{M} \rceil$, we provide an achievable scheme of rate $1/\lceil \frac{K}{M} \rceil$ and an information theoretic converse of rate $M/K$, i.e., the PID capacity satisfies $1/\lceil \frac{K}{M} \rceil \leq C \leq M/K$. This settles the capacity of PID when $\frac{K}{M}$ is an integer. When $\frac{K}{M}$ is not an integer, we show that the converse rate of $M/K$ is achievable if $N \geq \frac{K}{\gcd(K,M)} - (\frac{M}{\gcd(K,M)}-1)(\lfloor \frac{K}{M} \rfloor -1)$, and the achievable rate of $1/\lceil \frac{K}{M} \rceil$ is optimal if $N = \lceil \frac{K}{M} \rceil$. Otherwise if $\lceil \frac{K}{M} \rceil < N < \frac{K}{\gcd(K,M)} - (\frac{M}{\gcd(K,M)}-1)(\lfloor \frac{K}{M} \rfloor -1)$, we give an improved achievable scheme and prove its optimality for several small settings.
ITApr 26, 2018
The Capacity of Private Information Retrieval with EavesdroppersQiwen Wang, Hua Sun, Mikael Skoglund
We consider the problem of private information retrieval (PIR) with colluding servers and eavesdroppers (abbreviated as ETPIR). The ETPIR problem is comprised of $K$ messages, $N$ servers where each server stores all $K$ messages, a user who wants to retrieve one of the $K$ messages without revealing the desired message index to any set of $T$ colluding servers, and an eavesdropper who can listen to the queries and answers of any $E$ servers but is prevented from learning any information about the messages. The information theoretic capacity of ETPIR is defined to be the maximum number of desired message symbols retrieved privately per information symbol downloaded. We show that the capacity of ETPIR is $C = \left( 1- \frac{E}{N} \right) \left(1 + \frac{T-E}{N-E} + \cdots + \left( \frac{T-E}{N-E} \right)^{K-1} \right)^{-1}$ when $E < T$, and $C = \left( 1 - \frac{E}{N} \right)$ when $E \geq T$. To achieve the capacity, the servers need to share a common random variable (independent of the messages), and its size must be at least $\frac{E}{N} \cdot \frac{1}{C}$ symbols per message symbol. Otherwise, with less amount of shared common randomness, ETPIR is not feasible and the capacity reduces to zero. An interesting observation is that the ETPIR capacity expression takes different forms in two regimes. When $E < T$, the capacity equals the inverse of a sum of a geometric series with $K$ terms and decreases with $K$; this form is typical for capacity expressions of PIR. When $E \geq T$, the capacity does not depend on $K$, a typical form for capacity expressions of SPIR (symmetric PIR, which further requires data-privacy, {\it i.e.,} the user learns no information about other undesired messages); the capacity does not depend on $T$ either. In addition, the ETPIR capacity result includes multiple previous PIR and SPIR capacity results as special cases.
ITJan 26, 2017
Private Information Retrieval from MDS Coded Data with Colluding Servers: Settling a Conjecture by Freij-Hollanti et al.Hua Sun, Syed A. Jafar
A $(K, N, T, K_c)$ instance of the MDS-TPIR problem is comprised of $K$ messages and $N$ distributed servers. Each message is separately encoded through a $(K_c, N)$ MDS storage code. A user wishes to retrieve one message, as efficiently as possible, while revealing no information about the desired message index to any colluding set of up to $T$ servers. The fundamental limit on the efficiency of retrieval, i.e., the capacity of MDS-TPIR is known only at the extremes where either $T$ or $K_c$ belongs to $\{1,N\}$. The focus of this work is a recent conjecture by Freij-Hollanti, Gnilke, Hollanti and Karpuk which offers a general capacity expression for MDS-TPIR. We prove that the conjecture is false by presenting as a counterexample a PIR scheme for the setting $(K, N, T, K_c) = (2,4,2,2)$, which achieves the rate $3/5$, exceeding the conjectured capacity, $4/7$. Insights from the counterexample lead us to capacity characterizations for various instances of MDS-TPIR including all cases with $(K, N, T, K_c) = (2,N,T,N-1)$, where $N$ and $T$ can be arbitrary.
ITNov 7, 2016
Multiround Private Information Retrieval: Capacity and Storage OverheadHua Sun, Syed A. Jafar
The capacity has recently been characterized for the private information retrieval (PIR) problem as well as several of its variants. In every case it is assumed that all the queries are generated by the user simultaneously. Here we consider multiround PIR, where the queries in each round are allowed to depend on the answers received in previous rounds. We show that the capacity of multiround PIR is the same as the capacity of single-round PIR (the result is generalized to also include $T$-privacy constraints). Combined with previous results, this shows that there is no capacity advantage from multiround over single-round schemes, non-linear over linear schemes or from $ε$-error over zero-error schemes. However, we show through an example that there is an advantage in terms of storage overhead. We provide an example of a multiround, non-linear, $ε$-error PIR scheme that requires a strictly smaller storage overhead than the best possible with single-round, linear, zero-error PIR schemes.
ITOct 10, 2016
Optimal Download Cost of Private Information Retrieval for Arbitrary Message LengthHua Sun, Syed A. Jafar
A private information retrieval scheme is a mechanism that allows a user to retrieve any one out of $K$ messages from $N$ non-communicating replicated databases, each of which stores all $K$ messages, without revealing anything about the identity of the desired message index to any individual database. If the size of each message is $L$ bits and the total download required by a PIR scheme from all $N$ databases is $D$ bits, then $D$ is called the download cost and the ratio $L/D$ is called an achievable rate. For fixed $K,N\in\mathbb{N}$, the capacity of PIR, denoted by $C$, is the supremum of achievable rates over all PIR schemes and over all message sizes, and was recently shown to be $C=(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. In this work, for arbitrary $K, N$, we explore the minimum download cost $D_L$ across all PIR schemes (not restricted to linear schemes) for arbitrary message lengths $L$ under arbitrary choices of alphabet (not restricted to finite fields) for the message and download symbols. If the same $M$-ary alphabet is used for the message and download symbols, then we show that the optimal download cost in $M$-ary symbols is $D_L=\lceil\frac{L}{C}\rceil$. If the message symbols are in $M$-ary alphabet and the downloaded symbols are in $M'$-ary alphabet, then we show that the optimal download cost in $M'$-ary symbols, $D_L\in\left\{\left\lceil \frac{L'}{C}\right\rceil,\left\lceil \frac{L'}{C}\right\rceil-1,\left\lceil \frac{L'}{C}\right\rceil-2\right\}$, where $L'= \lceil L \log_{M'} M\rceil$.
ITJun 28, 2016
The Capacity of Symmetric Private Information RetrievalHua Sun, Syed A. Jafar
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. Symmetric PIR (SPIR) is a generalization of PIR to include the requirement that beyond the desired message, the user learns nothing about the other $K-1$ messages. The information theoretic capacity of SPIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. We show that the capacity of SPIR is $1-1/N$ regardless of the number of messages $K$, if the databases have access to common randomness (not available to the user) that is independent of the messages, in the amount that is at least $1/(N-1)$ bits per desired message bit, and zero otherwise. Extensions to the capacity region of SPIR and the capacity of finite length SPIR are provided.
ITMay 2, 2016
The Capacity of Robust Private Information Retrieval with Colluding DatabasesHua Sun, Syed A. Jafar
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. The information theoretic capacity of PIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. $T$-private PIR is a generalization of PIR to include the requirement that even if any $T$ of the $N$ databases collude, the identity of the retrieved message remains completely unknown to them. Robust PIR is another generalization that refers to the scenario where we have $M \geq N$ databases, out of which any $M - N$ may fail to respond. For $K$ messages and $M\geq N$ databases out of which at least some $N$ must respond, we show that the capacity of $T$-private and Robust PIR is $\left(1+T/N+T^2/N^2+\cdots+T^{K-1}/N^{K-1}\right)^{-1}$. The result includes as special cases the capacity of PIR without robustness ($M=N$) or $T$-privacy constraints ($T=1$).
ITFeb 29, 2016
The Capacity of Private Information RetrievalHua Sun, Syed A. Jafar
In the private information retrieval (PIR) problem a user wishes to retrieve, as efficiently as possible, one out of $K$ messages from $N$ non-communicating databases (each holds all $K$ messages) while revealing nothing about the identity of the desired message index to any individual database. The information theoretic capacity of PIR is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. For $K$ messages and $N$ databases, we show that the PIR capacity is $(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. A remarkable feature of the capacity achieving scheme is that if we eliminate any subset of messages (by setting the message symbols to zero), the resulting scheme also achieves the PIR capacity for the remaining subset of messages.
ITJan 28, 2016
Blind Interference Alignment for Private Information RetrievalHua Sun, Syed A. Jafar
Blind interference alignment (BIA) refers to interference alignment schemes that are designed only based on channel coherence pattern knowledge at the transmitters (the "blind" transmitters do not know the exact channel values). Private information retrieval (PIR) refers to the problem where a user retrieves one out of K messages from N non-communicating databases (each holds all K messages) without revealing anything about the identity of the desired message index to any individual database. In this paper, we identify an intriguing connection between PIR and BIA. Inspired by this connection, we characterize the information theoretic optimal download cost of PIR, when we have K = 2 messages and the number of databases, N, is arbitrary.