CRApr 28
ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic SteganographyYaofei Wang, Rui Wang, Weilong Pang et al.
Generative linguistic steganography (GLS) enables covert communication by embedding secret messages into the natural language generation process. In practical deployment, however, GLS is vulnerable to tokenization ambiguity: the same surface text may be re-tokenized into a different token sequence at the receiver, breaking the shared decoding state between the communicating parties so that a single local mismatch can propagate into complete extraction failure. Existing solutions either remove ambiguous tokens -- distorting the generation distribution and compromising security -- or preserve the distribution at the cost of substantially reduced embedding capacity or prohibitive runtime overhead. To address this issue, we propose ReTokSync (Re-Tokenization Synchronization), a self-synchronizing disambiguation framework that monitors the receiver-view tokenization during generation and triggers a corrective reset only when ambiguity actually occurs. By confining the effect of tokenization ambiguity to sparse residual bit errors rather than global desynchronization, ReTokSync leaves ambiguity-free positions entirely untouched and remains compatible with the underlying steganographic algorithm. Experiments on both English and Chinese settings show that ReTokSync stays closest to the steganographic baseline in distributional security (zero KL divergence), text quality, embedding capacity, and runtime, while achieving extraction accuracy above 99.7\%. Building on this property, we further develop a two-channel covert communication mechanism in which ReTokSync serves as the primary channel and a reliable auxiliary channel corrects the remaining errors, achieving 100\% end-to-end recovery across all evaluated configurations.
LGAug 1, 2022
DeFL: Decentralized Weight Aggregation for Cross-silo Federated LearningJialiang Han, Yudong Han, Gang Huang et al.
Federated learning (FL) is an emerging promising paradigm of privacy-preserving machine learning (ML). An important type of FL is cross-silo FL, which enables a small scale of organizations to cooperatively train a shared model by keeping confidential data locally and aggregating weights on a central parameter server. However, the central server may be vulnerable to malicious attacks or software failures in practice. To address this issue, in this paper, we propose DeFL, a novel decentralized weight aggregation framework for cross-silo FL. DeFL eliminates the central server by aggregating weights on each participating node and weights of only the current training round are maintained and synchronized among all nodes. We use Multi-Krum to enable aggregating correct weights from honest nodes and use HotStuff to ensure the consistency of the training round number and weights among all nodes. Besides, we theoretically analyze the Byzantine fault tolerance, convergence, and complexity of DeFL. We conduct extensive experiments over two widely-adopted public datasets, i.e. CIFAR-10 and Sentiment140, to evaluate the performance of DeFL. Results show that DeFL defends against common threat models with minimal accuracy loss, and achieves up to 100x reduction in storage overhead and up to 12x reduction in network overhead, compared to state-of-the-art decentralized FL approaches.
LGJan 14, 2022
Demystifying Swarm Learning: A New Paradigm of Blockchain-based Decentralized Federated LearningJialiang Han, Yun Ma, Yudong Han
Federated learning (FL) is an emerging promising privacy-preserving machine learning paradigm and has raised more and more attention from researchers and developers. FL keeps users' private data on devices and exchanges the gradients of local models to cooperatively train a shared Deep Learning (DL) model on central custodians. However, the security and fault tolerance of FL have been increasingly discussed, because its central custodian mechanism or star-shaped architecture can be vulnerable to malicious attacks or software failures. To address these problems, Swarm Learning (SL) introduces a permissioned blockchain to securely onboard members and dynamically elect the leader, which allows performing DL in an extremely decentralized manner. Compared with tremendous attention to SL, there are few empirical studies on SL or blockchain-based decentralized FL, which provide comprehensive knowledge of best practices and precautions of deploying SL in real-world scenarios. Therefore, we conduct the first comprehensive study of SL to date, to fill the knowledge gap between SL deployment and developers, as far as we are concerned. In this paper, we conduct various experiments on 3 public datasets of 5 research questions, present interesting findings, quantitatively analyze the reasons behind these findings, and provide developers and researchers with practical suggestions. The findings have evidenced that SL is supposed to be suitable for most application scenarios, no matter whether the dataset is balanced, polluted, or biased over irrelevant features.
IRJan 13, 2021
$C^3DRec$: Cloud-Client Cooperative Deep Learning for Temporal Recommendation in the Post-GDPR EraJialiang Han, Yun Ma
Mobile devices enable users to retrieve information at any time and any place. Considering the occasional requirements and fragmentation usage pattern of mobile users, temporal recommendation techniques are proposed to improve the efficiency of information retrieval on mobile devices by means of accurately recommending items via learning temporal interests with short-term user interaction behaviors. However, the enforcement of privacy-preserving laws and regulations, such as GDPR, may overshadow the successful practice of temporal recommendation. The reason is that state-of-the-art recommendation systems require to gather and process the user data in centralized servers but the interaction behaviors data used for temporal recommendation are usually non-transactional data that are not allowed to gather without the explicit permission of users according to GDPR. As a result, if users do not permit services to gather their interaction behaviors data, the temporal recommendation fails to work. To realize the temporal recommendation in the post-GDPR era, this paper proposes $C^3DRec$, a cloud-client cooperative deep learning framework of mining interaction behaviors for recommendation while preserving user privacy. $C^3DRec$ constructs a global recommendation model on centralized servers using data collected before GDPR and fine-tunes the model directly on individual local devices using data collected after GDPR. We design two modes to accomplish the recommendation, i.e. pull mode where candidate items are pulled down onto the devices and fed into the local model to get recommended items, and push mode where the output of the local model is pushed onto the server and combined with candidate items to get recommended ones. Evaluation results show that $C^3DRec$ achieves comparable recommendation accuracy to the centralized approaches, with minimal privacy concern.