Dawen Xu

2papers

2 Papers

ARJul 7, 2021
R2F: A Remote Retraining Framework for AIoT Processors with Computing Errors

Dawen Xu, Meng He, Cheng Liu et al.

AIoT processors fabricated with newer technology nodes suffer rising soft errors due to the shrinking transistor sizes and lower power supply. Soft errors on the AIoT processors particularly the deep learning accelerators (DLAs) with massive computing may cause substantial computing errors. These computing errors are difficult to be captured by the conventional training on general purposed processors like CPUs and GPUs in a server. Applying the offline trained neural network models to the edge accelerators with errors directly may lead to considerable prediction accuracy loss. To address the problem, we propose a remote retraining framework (R2F) for remote AIoT processors with computing errors. It takes the remote AIoT processor with soft errors in the training loop such that the on-site computing errors can be learned with the application data on the server and the retrained models can be resilient to the soft errors. Meanwhile, we propose an optimized partial TMR strategy to enhance the retraining. According to our experiments, R2F enables elastic design trade-offs between the model accuracy and the performance penalty. The top-5 model accuracy can be improved by 1.93%-13.73% with 0%-200% performance penalty at high fault error rate. In addition, we notice that the retraining requires massive data transmission and even dominates the training time, and propose a sparse increment compression approach for the data transmission optimization, which reduces the retraining time by 38%-88% on average with negligible accuracy loss over a straightforward remote retraining.

LGJul 3, 2019
Accelerating Generative Neural Networks on Unmodified Deep Learning Processors -- A Software Approach

Dawen Xu, Ying Wang, Kaijie Tu et al.

Generative neural network is a new category of neural networks and it has been widely utilized in applications such as content generation, unsupervised learning, segmentation and pose estimation. It typically involves massive computing-intensive deconvolution operations that cannot be fitted to conventional neural network processors directly. However, prior works mainly investigated specialized hardware architectures through intensive hardware modifications to the existing deep learning processors to accelerate deconvolution together with the convolution. In contrast, this work proposes a novel deconvolution implementation with a software approach and enables fast and efficient deconvolution execution on the legacy deep learning processors. Our proposed method reorganizes the computation of deconvolution and allows the deep learning processors to treat it as the standard convolution by splitting the original deconvolution filters into multiple small filters. Compared to prior acceleration schemes, the implemented acceleration scheme achieves 2.41x - 4.34x performance speedup and reduces the energy consumption by 27.7% - 54.5% on a set of realistic benchmarks. In addition, we also applied the deconvolution computing approach to the off-the-shelf commodity deep learning processors. The performance of deconvolution also exhibits significant performance speedup over prior deconvolution implementations.