Bin Qi

h-index: 6
2 papers


AR · Dec 25, 2024 · Code
Tempus Core: Area-Power Efficient Temporal-Unary Convolution Core for Low-Precision Edge DLAs

Prabhu Vellaisamy, Harideep Nair, Thomas Kang et al.

The increasing complexity of deep neural networks (DNNs) poses significant challenges for edge inference deployment due to the resource and power constraints of edge devices. Recent works on unary-based matrix multiplication hardware aim to leverage data sparsity and low-precision values to enhance hardware efficiency. However, the adoption and integration of such unary hardware into commercial deep learning accelerators (DLAs) remain limited due to processing element (PE) array dataflow differences. This work presents Tempus Core, a convolution core with a highly scalable unary-based PE array comprising tub (temporal-unary-binary) multipliers that seamlessly integrates with the NVDLA (NVIDIA's open-source DLA for accelerating CNNs) while maintaining dataflow compliance and boosting hardware efficiency. Analysis across various datapath granularities shows that for INT8 precision in 45nm CMOS, Tempus Core's PE cell unit (PCU) yields 59.3% and 15.3% reductions in area and power consumption, respectively, over NVDLA's CMAC unit. Considering a 16x16 PE array in Tempus Core, area and power improve by 75% and 62%, respectively, while delivering 5x and 4x iso-area throughput improvements for INT8 and INT4 precisions. Post-place-and-route analysis of Tempus Core's PCU shows that the 16x4 PE array for INT4 precision in 45nm CMOS requires only 0.017 mm^2 of die area and consumes only 6.2 mW of total power. We demonstrate that area-power efficient unary-based hardware can be seamlessly integrated into conventional DLAs, paving the path for efficient unary hardware for edge AI inference.

CR · May 3, 2021
Three-Party Integer Comparison and Applications

Jie Ma, Bin Qi, Kewei Lv

Secure integer comparison has been a popular research topic in cryptography, both because it is simple to describe and because of its applications. The aim is to enable two parties to compare their inputs without revealing the exact values of those inputs. In this paper, we highlight three-party integer comparison (TPIC), where a judge, with no private input, wants to know the comparison result, while two competitors hold secret integers and perform a privacy-preserving comparison. The judge actively obtains the result rather than passively waiting for a competitor to send it. We give two TPIC constructions that consider mixed adversaries, who have different capabilities. One is secure against a semi-honest adversary with low computation and communication cost, while the other is secure against a malicious adversary. Based on TPIC, we present multi-party comparisons through concrete applications, including a joint bidding scheme and a practical auction. Brief security proofs and analysis for the applications are presented. By comparison, our auction scheme is more efficient and has lower cost, making it feasible in practice rather than only a theoretical design. All the comparison and application schemes run on top of a blockchain and require a constant number of rounds.
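
To make the roles in the abstract concrete, the sketch below models only the input/output behaviour that TPIC is meant to provide: two competitors hold secret integers, and the judge learns nothing but the comparison result. It is not the paper's construction and contains no cryptography; the names `Competitor`, `tpic_ideal`, and `judge_winner` are hypothetical, and the trusted function stands in for whatever protocol actually computes the result.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Competitor:
    """A party holding a secret integer (hypothetical illustration type)."""
    name: str
    secret: int  # private input; the judge never reads this directly


def tpic_ideal(a: Competitor, b: Competitor) -> int:
    """Assumed ideal behaviour: reveal only the sign of a.secret - b.secret.

    Returns 1 if a > b, -1 if a < b, and 0 if they are equal.
    A real protocol would compute this without a trusted party.
    """
    return (a.secret > b.secret) - (a.secret < b.secret)


def judge_winner(bidders: Sequence[Competitor]) -> str:
    """Judge finds the highest bidder from pairwise comparison results only."""
    best = bidders[0]
    for challenger in bidders[1:]:
        # The judge sees -1, 0, or 1 here, never the bids themselves.
        if tpic_ideal(challenger, best) > 0:
            best = challenger
    return best.name


if __name__ == "__main__":
    bids = [Competitor("alice", 40), Competitor("bob", 55), Competitor("carol", 55)]
    print(judge_winner(bids))
```

In this toy auction, ties keep the earlier bidder, which is an arbitrary choice made for the example. The loop uses one comparison per bidder in sequence, so it does not reflect the constant-round property claimed in the abstract.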