34.8ARMay 25
Architectural Limits of Cloud TPUs in Finite-Field CryptographyHung Dang, Xuan Phu Dang, Tue Nguyen
We empirically characterise the cost-efficiency deficit between cloud Tensor Processing Units and GPUs for finite-field cryptography. Against A100 GPU baselines (cuZK), we measure a $[5{,}558\times, 6{,}908\times]$ deficit across v5p and v4 architectures under an FP32-mantissa staging discipline, and a $\sim$$4{,}693\times$ deficit using v5p's native \texttt{int32} accumulator. We analytically project this deficit into a fundamental arithmetic penalty (lacking wide-integer ALUs) and a spatial penalty. We demonstrate that evaluating concurrent multi-tenant deployments, where strict separation forces eager Montgomery reduction, yields a projected $5.19\times$ spatial collapse; relaxing this constraint theoretically recovers these spatial cycles, yet the underlying arithmetic penalty remains. To facilitate this characterisation, we deploy \codename as a measurement vehicle. By mapping low-degree polynomials onto matrix-form Number Theoretic Transforms, the scheduler stacks heterogeneous polynomials into dense 2D matrices, achieving $\sim$$100\%$ K-dimension column occupancy on uniform workloads ($>$$92\%$ on mixed-degree traces). However, despite optimal K-dimension packing, severe M-dimension under-utilisation (e.g., $6.25\%$ on v4) combined with overwhelming VPU-bound Montgomery reduction stalls mathematically starve the systolic arrays. A post-hoc HLO validator ensures these measurements remain structurally isolated against the XLA fusion engine. Our findings empirically demonstrate the structural inadequacy of AI-optimised systolic arrays for exact, high-throughput field arithmetic.
27.8CRApr 21
CHRONOS: A Hardware-Assisted Phase-Decoupled Framework for Secure Federated Learning in IoTHung Dang
We propose CHRONOS, a hardware-assisted framework that decouples the cryptographic setup required for private gradient aggregation from the active training phase. CHRONOS executes a once-per-epoch server-relayed Diffie-Hellman key exchange during a device's idle window. It generates ephemeral keypairs and derives PRG keys entirely within an ARM TrustZone enclave, ensuring private keys never exist in Normal World memory. Pairwise secrets are sealed in the enclave, and Shamir secret shares of the ephemeral private key are distributed to peers. During training, clients mask gradients with a single stream-cipher evaluation and transmit them in one communication round. A hardware-backed round counter enforces single-use freshness. If clients drop out mid-round, the server reconstructs their masks from peer-held Shamir shares, preserving correct aggregation without repeating the round. Evaluation on Rock Pi 4 devices using OP-TEE demonstrates that CHRONOS achieves OS-level compromise resistance and thwarts state-of-the-art gradient inversion attacks. It reduces active-phase aggregation latency by up to 74% compared to synchronous secure aggregation for 20 clients. The system maintains a persistent Secure World storage footprint of fewer than 700 bytes per device, scaling independently of model dimension.
11.4CRMay 10
Enforcing Attestable Workflows across Untrusted NetworksHung Dang, Tue Nguyen
Confidential high-performance computing orchestrates workloads across federated domains, yet existing frameworks rely on high-overhead user-space library operating systems or assume single-host execution. We propose \codename, an architecture federating Trusted Execution Environments via a split Trusted Computing Base (TCB) design. It couples a hardware-isolated Control Plane executing Mutually Attested Key Exchange (\make) with a measured guest-resident extended Berkeley Packet Filter (eBPF) Data Plane. By anchoring cryptographic key release to hardware measurements and executing enforcement in the kernel, \codename\ achieves native-speed encrypted routing. Empirical evaluation demonstrates a steady-state enforcement cost of $6\,μ$s per packet, imposing a $13$--$15\,μ$s absolute latency overhead. On distributed pipelines, \codename\ incurs just a $6.1\%$ execution penalty over plaintext baselines, bypassing the $62\%$ penalty of user-space counterparts. The system initializes a 100-node cluster in under 1.5 seconds, providing an efficient confidential interconnect for long-running workflows.
65.3CRApr 29
Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI AgentsHung Dang
Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename\ compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an $O(1)$ state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename\ achieves a 5.6\% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2\%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8\%. \codename\ achieves 0\% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4\% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95\% CI [0\%, 23.2\%]). \codename\ introduces just 2.2~ms of per-call latency (a 3.7$\times$ speedup over \textsc{Aegis}) while maintaining a 2.0\% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18\% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.
CRNov 21, 2019
Self-Expiring Data Capsule using Trusted Execution EnvironmentHung Dang, Ee-Chien Chang
Data privacy is unarguably of extreme importance. Nonetheless, there exist various daunting challenges to safe-guarding data privacy. These challenges stem from the fact that data owners have little control over their data once it has transgressed their local storage and been managed by third parties whose trustworthiness is questionable at times. Our work seeks to enhance data privacy by constructing a self-expiring data capsule. Sensitive data is encapsulated into a capsule which is associated with an access policy an expiring condition. The former indicates eligibility of functions that can access the data, and the latter dictates when the data should become inaccessible to anyone, including the previously eligible functions. Access to the data capsule, as well as its dismantling once the expiring condition is met, are governed by a committee of independent and mutually distrusting nodes. The pivotal contribution of our work is an integration of hardware primitive, state machine replication and threshold secret sharing in the design of the self-expiring data encapsulation framework. We implement the proposed framework in a system called TEEKAP. Our empirical experiments conducted on a realistic deployment setting with the access control committee spanning across four geographical regions reveal that TEEKAP can process access requests at scale with sub-second latency.
CRJun 14, 2019
Effectiveness of Distillation Attack and Countermeasure on Neural Network WatermarkingZiqi Yang, Hung Dang, Ee-Chien Chang
The rise of machine learning as a service and model sharing platforms has raised the need of traitor-tracing the models and proof of authorship. Watermarking technique is the main component of existing methods for protecting copyright of models. In this paper, we show that distillation, a widely used transformation technique, is a quite effective attack to remove watermark embedded by existing algorithms. The fragility is due to the fact that distillation does not retain the watermark embedded in the model that is redundant and independent to the main learning task. We design ingrain in response to the destructive distillation. It regularizes a neural network with an ingrainer model, which contains the watermark, and forces the model to also represent the knowledge of the ingrainer. Our extensive evaluations show that ingrain is more robust to distillation attack and its robustness against other widely used transformation techniques is comparable to existing methods.
DCMay 15, 2019
Autonomous Membership Service for Enclave ApplicationsHung Dang, Ee-Chien Chang
Trusted Execution Environment, or enclave, promises to protect data confidentiality and execution integrity of an outsourced computation on an untrusted host. Extending the protection to distributed applications that run on physically separated hosts, however, remains non-trivial. For instance, the current enclave provisioning model hinders elasticity of cloud applications. Furthermore, it remains unclear how an enclave process could verify if there exists another concurrently running enclave process instantiated using the same codebase, or count a number of such processes. In this paper, we seek an autonomous membership service for enclave applications. The application owner only needs to partake in instantiating the very first process of the application, whereas all subsequent process commission and decommission will be administered by existing and active processes of that very application. To achieve both safety and liveness, our protocol design admits unjust excommunication of a non-faulty process from the membership group. We implement the proposed membership service in a system called AMES. Our experimental study shows that AMES incurs an overhead of 5% - 16% compared to vanilla enclave execution.
CRAug 29, 2018
Fair Marketplace for Secure Outsourced ComputationsHung Dang, Dat Le Tien, Ee-Chien Chang
The cloud computing paradigm offers clients ubiquitous and on demand access to a shared pool of computing resources, enabling the clients to provision scalable services with minimal management effort. Such a pool of resources, however, is typically owned and controlled by a single service provider, making it a single-point-of-failure. This paper presents Kosto - a framework that provisions a fair marketplace for secure outsourced computations, wherein the pool of computing resources aggregates resources offered by a large cohort of independent compute nodes. Kosto protects the confidentiality of clients' inputs as well as the integrity of the outsourced computations and their results using trusted hardware's enclave execution, in particular Intel SGX. Furthermore, Kosto warrants fair exchanges between the clients' payments for the execution of an outsourced computations and the compute nodes' work in servicing the clients' requests. Empirical evaluation on the prototype implementation of Kosto shows that performance overhead incurred by enclave execution is as small as 3% for computation-intensive operations, and 1.5x for IO-intensive operations.
DCApr 2, 2018
Towards Scaling Blockchain Systems via ShardingHung Dang, Tien Tuan Anh Dinh, Dumitrel Loghin et al.
Existing blockchain systems scale poorly because of their distributed consensus protocols. Current attempts at improving blockchain scalability are limited to cryptocurrency. Scaling blockchain systems under general workloads (i.e., non-cryptocurrency applications) remains an open question. In this work, we take a principled approach to apply sharding, which is a well-studied and proven technique to scale out databases, to blockchain systems in order to improve their transaction throughput at scale. This is challenging, however, due to the fundamental difference in failure models between databases and blockchain. To achieve our goal, we first enhance the performance of Byzantine consensus protocols, by doing so we improve individual shards' throughput. Next, we design an efficient shard formation protocol that leverages a trusted random beacon to securely assign nodes into shards. We rely on trusted hardware, namely Intel SGX, to achieve high performance for both consensus and shard formation protocol. Third, we design a general distributed transaction protocol that ensures safety and liveness even when transaction coordinators are malicious. Finally, we conduct an extensive evaluation of our design both on a local cluster and on Google Cloud Platform. The results show that our consensus and shard formation protocols outperform state-of-the-art solutions at scale. More importantly, our sharded blockchain reaches a high throughput that can handle Visa-level workloads, and is the largest ever reported in a realistic environment.
LGFeb 13, 2018
Flipped-Adversarial AutoEncodersJiyi Zhang, Hung Dang, Hwee Kuan Lee et al.
We propose a flipped-Adversarial AutoEncoder (FAAE) that simultaneously trains a generative model G that maps an arbitrary latent code distribution to a data distribution and an encoder E that embodies an "inverse mapping" that encodes a data sample into a latent code vector. Unlike previous hybrid approaches that leverage adversarial training criterion in constructing autoencoders, FAAE minimizes re-encoding errors in the latent space and exploits adversarial criterion in the data space. Experimental evaluations demonstrate that the proposed framework produces sharper reconstructed images while at the same time enabling inference that captures rich semantic representation of data.
CRMay 22, 2017
Evading Classifiers by Morphing in the DarkHung Dang, Yue Huang, Ee-Chien Chang
Learning-based systems have been shown to be vulnerable to evasion through adversarial data manipulation. These attacks have been studied under assumptions that the adversary has certain knowledge of either the target model internals, its training dataset or at least classification scores it assigns to input samples. In this paper, we investigate a much more constrained and realistic attack scenario wherein the target classifier is minimally exposed to the adversary, revealing on its final classification decision (e.g., reject or accept an input sample). Moreover, the adversary can only manipulate malicious samples using a blackbox morpher. That is, the adversary has to evade the target classifier by morphing malicious samples "in the dark". We present a scoring mechanism that can assign a real-value score which reflects evasion progress to each sample based on the limited information available. Leveraging on such scoring mechanism, we propose an evasion method -- EvadeHC -- and evaluate it against two PDF malware detectors, namely PDFRate and Hidost. The experimental evaluation demonstrates that the proposed evasion attacks are effective, attaining $100\%$ evasion rate on the evaluation dataset. Interestingly, EvadeHC outperforms the known classifier evasion technique that operates based on classification scores output by the classifiers. Although our evaluations are conducted on PDF malware classifier, the proposed approaches are domain-agnostic and is of wider application to other learning-based systems.