Qiong Li

CV
h-index25
15papers
187citations
Novelty49%
AI Score32

15 Papers

ARFeb 3, 2023Code
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications

Qiong Li, Chao Fang, Zhongfeng Wang

Posit has been a promising alternative to the IEEE-754 floating point format for deep learning applications due to its better trade-off between dynamic range and accuracy. However, hardware implementation of posit arithmetic requires further exploration, especially for the dot-product operations dominated in deep neural networks (DNNs). It has been implemented by either the combination of multipliers and an adder tree or cascaded fused multiply-add units, leading to poor computational efficiency and excessive hardware overhead. To address this issue, we propose an open-source posit dot-product unit, namely PDPU, that facilitates resource-efficient and high-throughput dot-product hardware implementation. PDPU not only features the fused and mixed-precision architecture that eliminates redundant latency and hardware resources, but also has a fine-grained 6-stage pipeline, improving computational efficiency. A configurable PDPU generator is further developed to meet the diverse needs of various DNNs for computational accuracy. Experimental results evaluated under the 28nm CMOS process show that PDPU reduces area, latency, and power by up to 43%, 64%, and 70%, respectively, compared to the existing implementations. Hence, PDPU has great potential as the computing core of posit-based accelerators for deep learning applications.

ARSep 15, 2023
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

Longwei Huang, Chao Fang, Qiong Li et al.

Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle the challenges above, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It facilitates diverse precision levels of fixed-point DNN inference, spanning from 2-bit to 16-bit, and enhances on-device learning through improved support with FP16 operations. Moreover, we employ multiple methods such as FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that our processor significantly improves inference throughput by 1.6$\sim$14.6$\times$ and energy efficiency by 1.1$\sim$14.6$\times$ across various DNNs, compared to the prior art, XpulpNN. Additionally, our processor achieves a 16.5$\times$ higher FP throughput for on-device learning.

CVSep 26, 2023
Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

Ruixing Liang, Xiangyu Zhang, Qiong Li et al.

While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored. We propose an artificial neural network dubbed VISION, an acronym for "Visual Interface System for Imaging Output of Neural activity," to mimic the human brain and show how it can foster neuroscientific inquiries. Using visual and contextual inputs, this multimodal model predicts the brain's functional magnetic resonance imaging (fMRI) scan response to natural images. VISION successfully predicts human hemodynamic responses as fMRI voxel values to visual inputs with an accuracy exceeding state-of-the-art performance by 45%. We further probe the trained networks to reveal representational biases in different visual areas, generate experimentally testable hypotheses, and formulate an interpretable metric to associate these hypotheses with cortical functions. With both a model and evaluation metric, the cost and time burdens associated with designing and implementing functional analysis on the visual cortex could be reduced. Our work suggests that the evolution of computational models may shed light on our fundamental understanding of the visual cortex and provide a viable approach toward reliable brain-machine interfaces.

LGAug 25, 2023
DAG-ACFL: Asynchronous Clustered Federated Learning based on DAG-DLT

Xiaofeng Xue, Haokun Mao, Qiong Li

Federated learning (FL) aims to collaboratively train a global model while ensuring client data privacy. However, FL faces challenges from the non-IID data distribution among clients. Clustered FL (CFL) has emerged as a promising solution, but most existing CFL frameworks adopt synchronous frameworks lacking asynchrony. An asynchronous CFL framework called SDAGFL based on directed acyclic graph distributed ledger techniques (DAG-DLT) was proposed, but its complete decentralization leads to high communication and storage costs. We propose DAG-ACFL, an asynchronous clustered FL framework based on directed acyclic graph distributed ledger techniques (DAG-DLT). We first detail the components of DAG-ACFL. A tip selection algorithm based on the cosine similarity of model parameters is then designed to aggregate models from clients with similar distributions. An adaptive tip selection algorithm leveraging change-point detection dynamically determines the number of selected tips. We evaluate the clustering and training performance of DAG-ACFL on multiple datasets and analyze its communication and storage costs. Experiments show the superiority of DAG-ACFL in asynchronous clustered FL. By combining DAG-DLT with clustered FL, DAG-ACFL realizes robust, decentralized and private model training with efficient performance.

CVOct 10, 2023
Towards More Efficient Depression Risk Recognition via Gait

Min Ren, Muchan Tao, Xuecai Hu et al.

Depression, a highly prevalent mental illness, affects over 280 million individuals worldwide. Early detection and timely intervention are crucial for promoting remission, preventing relapse, and alleviating the emotional and financial burdens associated with depression. However, patients with depression often go undiagnosed in the primary care setting. Unlike many physiological illnesses, depression lacks objective indicators for recognizing depression risk, and existing methods for depression risk recognition are time-consuming and often encounter a shortage of trained medical professionals. The correlation between gait and depression risk has been empirically established. Gait can serve as a promising objective biomarker, offering the advantage of efficient and convenient data collection. However, current methods for recognizing depression risk based on gait have only been validated on small, private datasets, lacking large-scale publicly available datasets for research purposes. Additionally, these methods are primarily limited to hand-crafted approaches. Gait is a complex form of motion, and hand-crafted gait features often only capture a fraction of the intricate associations between gait and depression risk. Therefore, this study first constructs a large-scale gait database, encompassing over 1,200 individuals, 40,000 gait sequences, and covering six perspectives and three types of attire. Two commonly used psychological scales are provided as depression risk annotations. Subsequently, a deep learning-based depression risk recognition model is proposed, overcoming the limitations of hand-crafted approaches. Through experiments conducted on the constructed large-scale database, the effectiveness of the proposed method is validated, and numerous instructive insights are presented in the paper, highlighting the significant potential of gait-based depression risk recognition.

LGSep 26, 2022
An Energy Optimized Specializing DAG Federated Learning based on Event Triggered Communication

Xiaofeng Xue, Haokun Mao, Qiong Li et al.

Specializing Directed Acyclic Graph Federated Learning(SDAGFL) is a new federated learning framework which updates model from the devices with similar data distribution through Directed Acyclic Graph Distributed Ledger Technology (DAG-DLT). SDAGFL has the advantage of personalization, resisting single point of failure and poisoning attack in fully decentralized federated learning. Because of these advantages, the SDAGFL is suitable for the federated learning in IoT scenario where the device is usually battery-powered. To promote the application of SDAGFL in IoT, we propose an energy optimized SDAGFL based event-triggered communication mechanism, called ESDAGFL. In ESDAGFL, the new model is broadcasted only when it is significantly changed. We evaluate the ESDAGFL on a clustered synthetically FEMNIST dataset and a dataset from texts by Shakespeare and Goethe's works. The experiment results show that our approach can reduce energy consumption by 33\% compared with SDAGFL, and realize the same balance between training accuracy and specialization as SDAGFL.

ARMay 25, 2025
Enable Lightweight and Precision-Scalable Posit/IEEE-754 Arithmetic in RISC-V Cores for Transprecision Computing

Qiong Li, Chao Fang, Longwei Huang et al.

While posit format offers superior dynamic range and accuracy for transprecision computing, its adoption in RISC-V processors is hindered by the lack of a unified solution for lightweight, precision-scalable, and IEEE-754 arithmetic compatible hardware implementation. To address these challenges, we enhance RISC-V processors by 1) integrating dedicated posit codecs into the original FPU for lightweight implementation, 2) incorporating multi/mixed-precision support with dynamic exponent size for precision-scalability, and 3) reusing and customizing ISA extensions for IEEE-754 compatible posit operations. Our comprehensive evaluation spans the modified FPU, RISC-V core, and SoC levels. It demonstrates that our implementation achieves 47.9% LUTs and 57.4% FFs reduction compared to state-of-the-art posit-enabled RISC-V processors, while achieving up to 2.54$\times$ throughput improvement in various GEMM kernels.

CVJun 21, 2021
Knowledge Distillation via Instance-level Sequence Learning

Haoran Zhao, Xin Sun, Junyu Dong et al.

Recently, distillation approaches are suggested to extract general knowledge from a teacher network to guide a student network. Most of the existing methods transfer knowledge from the teacher network to the student via feeding the sequence of random mini-batches sampled uniformly from the data. Instead, we argue that the compact student network should be guided gradually using samples ordered in a meaningful sequence. Thus, it can bridge the gap of feature representation between the teacher and student network step by step. In this work, we provide a curriculum learning knowledge distillation framework via instance-level sequence learning. It employs the student network of the early epoch as a snapshot to create a curriculum for the student network's next training phase. We carry out extensive experiments on CIFAR-10, CIFAR-100, SVHN and CINIC-10 datasets. Compared with several state-of-the-art methods, our framework achieves the best performance with fewer iterations.

LGMar 12, 2021
Auction Based Clustered Federated Learning in Mobile Edge Computing System

Renhao Lu, Weizhe Zhang, Qiong Li et al.

In recent years, mobile clients' computing ability and storage capacity have greatly improved, efficiently dealing with some applications locally. Federated learning is a promising distributed machine learning solution that uses local computing and local data to train the Artificial Intelligence (AI) model. Combining local computing and federated learning can train a powerful AI model under the premise of ensuring local data privacy while making full use of mobile clients' resources. However, the heterogeneity of local data, that is, Non-independent and identical distribution (Non-IID) and imbalance of local data size, may bring a bottleneck hindering the application of federated learning in mobile edge computing (MEC) system. Inspired by this, we propose a cluster-based clients selection method that can generate a federated virtual dataset that satisfies the global distribution to offset the impact of data heterogeneity and proved that the proposed scheme could converge to an approximate optimal solution. Based on the clustering method, we propose an auction-based clients selection scheme within each cluster that fully considers the system's energy heterogeneity and gives the Nash equilibrium solution of the proposed scheme for balance the energy consumption and improving the convergence rate. The simulation results show that our proposed selection methods and auction-based federated learning can achieve better performance with the Convolutional Neural Network model (CNN) under different data distributions.

CVDec 13, 2019
Multi-level Similarity Learning for Low-Shot Recognition

Hongwei Xv, Xin Sun, Junyu Dong et al.

Low-shot learning indicates the ability to recognize unseen objects based on very limited labeled training samples, which simulates human visual intelligence. According to this concept, we propose a multi-level similarity model (MLSM) to capture the deep encoded distance metric between the support and query samples. Our approach is achieved based on the fact that the image similarity learning can be decomposed into image-level, global-level, and object-level. Once the similarity function is established, MLSM will be able to classify images for unseen classes by computing the similarity scores between a limited number of labeled samples and the target images. Furthermore, we conduct 5-way experiments with both 1-shot and 5-shot setting on Caltech-UCSD datasets. It is demonstrated that the proposed model can achieve promising results compared with the existing methods in practical applications.

QUANT-PHOct 10, 2019
High-speed Privacy Amplification Scheme using GMP in Quantum Key Distribution

Bingze Yan, Haokun Mao, Xiaofeng Xue et al.

Privacy amplification (PA) is the art of distilling a highly secret key from a partially secure string by public discussion. It is a vital procedure in quantum key distribution (QKD) to produce a theoretically unconditional secure key. The throughput of PA has become a bottleneck of the high-speed discrete variable QKD (DV-QKD) system. In this paper, a high-speed modular arithmetic hash PA scheme with GNU multiple precision (GMP) arithmetic library is presented. This scheme is implemented on two different central processing unit (CPU) platforms. The experimental results demon-strate that the throughput of this scheme achieves 260Mbps on the block size of 10^6 and 140Mbps on the block size of 10^8. This is the highest-speed recorded PA scheme on CPU platform to the author's knowledge.

QUANT-PHSep 27, 2019
Novel Reconciliation Protocol Based on Spinal Code for Continuous-variable Quantum Key Distribution

Xuan Wen, Qiong Li, Haokun Mao et al.

Reconciliation is a crucial procedure in post-processing of continuous variable quantum key distribution (CV-QKD) system, which is used to make two distant legitimate parties share identical corrected keys. The adaptive reconciliation is necessary and important for practical systems to cope with the variable channel. Many researchers adopt the punctured LDPC codes to implement adaptive reconciliation. In this paper, a novel rateless reconciliation protocol based on spinal code is proposed, which can achieve a high-efficiency and adaptive reconciliation in a larger range of SNRs. Due to the short codes length and simple tructure, our protocol is easy to implement without the complex codes designs of fixed rate codes, e.g., LDPC codes. Meanwhile, the structure of our protocol is highly parallel, which is suitable for hardware implementation, thus it also has the potential of high-speed hardware implementation. Besides, the security of proposed protocol is proved in theory. Experiment results show that the reconciliation efficiency maintains around 95% for ranging SNRs in a larger range (0,0.5), even exceeds 96.5% at extremely low SNR (<= 0.03) by using this novel scheme. The proposed protocol makes the long-distance CV-QKD systems much easier and stable to perform a high-performance and adaptive reconciliation.

QUANT-PHSep 5, 2019
Mathematical Model and Topology Evaluation of Quantum Secure Communication Network

Qiong Li, Yaxing Wang, Haokun Mao et al.

Due to the intrinsic point-to-point characteristic of quantum key distribution (QKD) systems, it is necessary to study and develop QKD network technology to provide a secure communication service for a large-scale of nodes over a large area. Considering the quality assurance required for such a network and the cost limitations, building an effective mathematical model of a QKD network becomes a critical task. In this paper, a flow-based mathematical model is proposed to describe a QKD network using mathematical concepts and language. In addition, an investigation on QKD network topology evaluation was conducted using a unique and novel QKD network performance indicator, the Information-Theoretic Secure communication bound, and the corresponding linear programming-based calculation algorithm. A large number of simulation results based on the topologies of SECOQC network and NSFNET network validate the effectiveness of the proposed model and indicator.

CVJul 23, 2019
Few-shot Learning for Domain-specific Fine-grained Image Classification

Xin Sun, Hongwei Xv, Junyu Dong et al.

Learning to recognize novel visual categories from a few examples is a challenging task for machines in real-world industrial applications. In contrast, humans have the ability to discriminate even similar objects with little supervision. This paper attempts to address the few shot fine-grained image classification problem. We propose a feature fusion model to explore discriminative features by focusing on key regions. The model utilizes the focus area location mechanism to discover the perceptually similar regions among objects. High-order integration is employed to capture the interaction information among intra-parts. We also design a Center Neighbor Loss to form robust embedding space distributions. Furthermore, we build a typical fine-grained and few-shot learning dataset miniPPlankton from the real-world application in the area of marine ecological environments. Extensive experiments are carried out to validate the performance of our method. The results demonstrate that our model achieves competitive performance compared with state-of-the-art models. Our work is a valuable complement to the model domain-specific industrial applications.

QUANT-PHApr 24, 2019
Modeling and Simulation of Practical Quantum Secure Communication Network

Yaxing Wang, Qiong Li, Qi Han et al.

As the Quantum Key Distribution (QKD) technology supporting the pointto-point application matures, the need to build the Quantum Secure Communication Network (QSCN) to guarantee the security of a large scale of nodes becomes urgent. Considering the project time and expense control, it is the first choice to build the QSCN based on an existing classical network. Suitable modeling and simulation are very important to construct a QSCN successfully and efficiently. In this paper, a practical QSCN model, which can reflect the network state well, is proposed. The model considers the volatile traffic demand of the classical network and the real key generation capability of the QKD devices, which can enhance the accuracy of simulation to a great extent. In addition, two unique QSCN performance indicators, ITS (information-theoretic secure) communication capability and ITS communication efficiency, are proposed in the model, which are necessary supplements for the evaluation of a QSCN except for those traditional performance indicators of classical networks. Finally, the accuracy of the proposed QSCN model and the necessity of the proposed performance indicators are verified by plentiful simulations results.