Yawen Chen

h-index29

5papers

37citations

Novelty52%

AI Score24

Ranked #170,089 of 194,257 authors (top 88%)#814 in DC (top 84%)

5 Papers

2.3DCJul 22, 2022

Efficient All-reduce for Distributed DNN Training in Optical Interconnect System

Fei Dai, Yawen Chen, Zhiyi Huang et al.

Communication efficiency plays an important role in accelerating the distributed training of Deep Neural Networks (DNN). All-reduce is the crucial communication primitive to reduce model parameters in distributed DNN training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements for distributed training of large DNNs due to the low data bandwidth of the electrical interconnect systems. One of the promising alternatives for electrical interconnect is optical interconnect, which can provide high bandwidth, low transmission delay, and low power cost. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing all-reduce operation in optical interconnect systems. WRHT can take advantage of WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. We further derive the required number of wavelengths, the minimum number of communication steps, and the communication time for the all-reduce operation on optical interconnect. The constraint of insertion loss is also considered in our analysis. Simulation results show that the communication time of all-reduce by WRHT is reduced by 80.81%, 64.36%, and 82.12%, respectively, compared with three traditional all-reduce algorithms according to our simulation results of an optical interconnect system. Our results also show that WRHT can reduce the communication time of all-reduce operation by 92.42% and 91.31% compared to two existing all-reduce algorithms running in the electrical interconnect system.

1.2DCSep 30, 2021

Accelerating Fully Connected Neural Network on Optical Network-on-Chip (ONoC)

Fei Dai, Yawen Chen, Haibo Zhang et al.

Fully Connected Neural Network (FCNN) is a class of Artificial Neural Networks widely used in computer science and engineering, whereas the training process can take a long time with large datasets in existing many-core systems. Optical Network-on-Chip (ONoC), an emerging chip-scale optical interconnection technology, has great potential to accelerate the training of FCNN with low transmission delay, low power consumption, and high throughput. However, existing methods based on Electrical Network-on-Chip (ENoC) cannot fit in ONoC because of the unique properties of ONoC. In this paper, we propose a fine-grained parallel computing model for accelerating FCNN training on ONoC and derive the optimal number of cores for each execution stage with the objective of minimizing the total amount of time to complete one epoch of FCNN training. To allocate the optimal number of cores for each execution stage, we present three mapping strategies and compare their advantages and disadvantages in terms of hotspot level, memory requirement, and state transitions. Simulation results show that the average prediction error for the optimal number of cores in NN benchmarks is within 2.3%. We further carry out extensive simulations which demonstrate that FCNN training time can be reduced by 22.28% and 4.91% on average using our proposed scheme, compared with traditional parallel computing methods that either allocate a fixed number of cores or allocate as many cores as possible, respectively. Compared with ENoC, simulation results show that under batch sizes of 64 and 128, on average ONoC can achieve 21.02% and 12.95% on reducing training time with 47.85% and 39.27% on saving energy, respectively.

2.2RONov 9, 2020

Upper Extremity Load Reduction for Lower LimbExoskeleton Trajectory Generation Using AnkleTorque Minimization

Yik Ben Wong, Yawen Chen, Kam Fai Elvis Tsang et al.

Recently, the lower limb exoskeletons which providemobility for paraplegic patients to support their daily life havedrawn much attention. However, the pilots are required to applyexcessive force through a pair of crutches to maintain balanceduring walking. This paper proposes a novel gait trajectorygeneration algorithm for exoskeleton locomotion on flat groundand stair which aims to minimize the force applied by the pilotwithout increasing the degree of freedom (DoF) of the system.First, the system is modelled as a five-link mechanism dynam-ically for torque computing. Then, an optimization approachis used to generate the trajectory minimizing the ankle torquewhich is correlated to the supporting force. Finally, experimentis conducted to compare the different gait generation algorithmsthrough measurement of ground reaction force (GRF) appliedon the crutches

2.2ROAug 9, 2020

Variable Stiffness Control with Strict Frequency Domain Constraints for Physical Human-Robot Interaction

Wulin Zou, Pu Duan, Yawen Chen et al.

Variable impedance control is advantageous for physical human-robot interaction to improve safety, adaptability and many other aspects. This paper presents a gain-scheduled variable stiffness control approach under strict frequency-domain constraints. Firstly, to reduce conservativeness, we characterize and constrain the impedance rendering, actuator saturation, disturbance/noise rejection and passivity requirements into their specific frequency bands. This relaxation makes sense because of the restricted frequency properties of the interactive robots. Secondly, a gain-scheduled method is taken to regulate the controller gains with respect to the desired stiffness. Thirdly, the scheduling function is parameterized via a nonsmooth optimization method. Finally, the proposed approach is validated by simulations, experiments and comparisons with a gain-fixed passivity-based PID method.

2.7LGNov 23, 2016

Improving Efficiency of SVM k-fold Cross-validation by Alpha Seeding

Zeyi Wen, Bin Li, Rao Kotagiri et al.

The k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that the SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM for training the (h+1)-th SVM for improving the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM for improving the efficiency of training the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors and to accurately estimate their associated weights (also called alpha values) of the next SVM by using the previous SVM. Our experimental results show that our algorithms are several times faster than the k-fold cross-validation which does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence same accuracy) as the k-fold cross-validation which does not make use of the previously trained SVM.