Sheng Xu

h-index: 26

3 papers

179 citations

Novelty: 52%

AI Score: 32

Ranked #124,855 of 194,257 authors (top 64%); #41,418 in CV (top 70%)

3 Papers

30.1 · CV · Oct 13, 2022 · Code

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

Yanjing Li, Sheng Xu, Baochang Zhang et al.

Large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from high computational and memory costs when deployed on resource-constrained devices. Among the powerful compression approaches, quantization drastically reduces computation and memory consumption through low-bit parameters and bit-wise operations. However, low-bit ViTs remain largely unexplored and usually suffer from a significant performance drop compared with their real-valued counterparts. In this work, through extensive empirical analysis, we first identify that the bottleneck behind the severe performance drop is the information distortion of the low-bit quantized self-attention map. We then develop an information rectification module (IRM) and a distribution guided distillation (DGD) scheme for fully quantized vision transformers (Q-ViT) to effectively eliminate such distortion, yielding fully quantized ViTs. We evaluate our methods on the popular DeiT and Swin backbones. Extensive experimental results show that our method achieves much better performance than prior art. For example, our Q-ViT can theoretically accelerate ViT-S by 6.14x and achieves about 80.9% Top-1 accuracy, even surpassing the full-precision counterpart by 1.0% on the ImageNet dataset. Our code and models are available at https://github.com/YanjingLi0202/Q-ViT
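The abstract attributes the accuracy drop to information distortion in the low-bit quantized self-attention map. The following minimal sketch illustrates that effect under explicit assumptions. It uses a plain symmetric uniform quantizer with a per-tensor max-abs scale and random Gaussian queries and keys. This is not Q-ViT's quantizer, IRM, or DGD. The sketch quantizes q and k at 8, 4 and 2 bits, then reports two quantities: the entropy of the quantized query codes, and the mean deviation of the resulting attention map from the full-precision one. The helper names `quantize_uniform`, `code_entropy`, `attention` and `distortion_report` are hypothetical.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform fake-quantization with a per-tensor max-abs scale (assumed, not Q-ViT's)."""
    qmax = 2 ** (bits - 1) - 1                 # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return np.zeros_like(x)
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


def code_entropy(xq: np.ndarray) -> float:
    """Empirical entropy (bits) of the distinct quantized values."""
    _, counts = np.unique(xq, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def attention(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Row-wise softmax of q k^T / sqrt(d)."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    s = s - s.max(axis=-1, keepdims=True)      # numerical stability
    a = np.exp(s)
    return a / a.sum(axis=-1, keepdims=True)


def distortion_report(bits_list=(8, 4, 2), n_tokens=16, dim=64, seed=0):
    """Map bits -> (query-code entropy, mean |attention error|) for random q, k."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_tokens, dim))
    k = rng.standard_normal((n_tokens, dim))
    ref = attention(q, k)
    report = {}
    for bits in bits_list:
        qq, kq = quantize_uniform(q, bits), quantize_uniform(k, bits)
        err = float(np.abs(attention(qq, kq) - ref).mean())
        report[bits] = (code_entropy(qq), err)
    return report


for bits, (h, err) in distortion_report().items():
    print(f"{bits}-bit: query-code entropy {h:.2f} bits, mean |attention error| {err:.4f}")
```

With this naive max-abs scale, most query entries round to zero at 2 bits. The code entropy collapses and the quantized attention map departs sharply from the full-precision one. This is the kind of distortion that IRM and DGD are described as correcting.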

2.6 · LG · Dec 2, 2024

Adaptive Traffic Element-Based Streetlight Control Using Neighbor Discovery Algorithm Based on IoT Events

Yupeng Tan, Sheng Xu, Chengyue Su

Intelligent streetlight systems divide the streetlight network into multiple sectors. When traffic elements pass by, they activate only the streetlights in the corresponding sectors rather than all streetlights, effectively reducing energy waste. This strategy requires each streetlight to know its neighbor relationships so that only the streetlights in its own sector are illuminated. However, manually configuring neighbor relationships for a large number of streetlights in complex, large-scale road networks is cumbersome and error-prone. Because roads crisscross, it is also difficult to determine neighbor relationships using GPS or communication-based positioning. To address these issues, this article proposes a systematic approach. It models the streetlight network as a social network and constructs a probabilistic graph of neighbor relationships from the IoT event records generated when streetlights detect traffic elements. On this basis, a probabilistic graph clustering method based on a multi-objective genetic algorithm is designed to discover the neighbor relationships of streetlights. Pedestrians and vehicles usually move at a constant speed along a road section, so speed consistency is introduced as an optimization objective. Together with traditional similarity measures, it forms a multi-objective function that improves the accuracy of neighbor relationship discovery. Extensive experiments on simulation datasets compare the proposed algorithm with other probabilistic graph clustering algorithms. The results demonstrate that the proposed algorithm identifies the neighbor relationships of streetlights more accurately than the other algorithms, effectively achieving adaptive streetlight control for traffic elements.
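The abstract describes two ingredients: a neighbor-relationship probabilistic graph built from IoT detection events, and a speed-consistency objective used alongside a similarity measure. The sketch below illustrates both, but it rests on several assumptions, and none of its formulas is taken from the paper:

- Events are hypothetical `(light_id, timestamp)` tuples.
- An edge probability P(a→b) is the fraction of a's detections followed by a detection at b within a fixed window.
- Cluster similarity is the mean over pairs of the stronger link direction.
- Speed consistency uses hypothetical light coordinates to turn observed delays into speeds, and scores 1 / (1 + coefficient of variation).

The multi-objective genetic algorithm that searches over clusterings is not shown. The helper names `build_prob_graph` and `cluster_objectives` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def build_prob_graph(events, window=10.0):
    """events: iterable of (light_id, timestamp_s) detection records (assumed format).
    Returns (probs, delays): probs[(a, b)] estimates P(b detects within `window` s after a);
    delays[(a, b)] lists the observed a->b delays in seconds."""
    events = sorted(events, key=lambda e: e[1])
    fired = defaultdict(int)      # detections per light
    follows = defaultdict(int)    # (a, b) -> times b followed a within the window
    delays = defaultdict(list)
    for i, (a, ta) in enumerate(events):
        fired[a] += 1
        seen = set()
        for b, tb in events[i + 1:]:
            if tb - ta > window:
                break             # events are time-sorted, so nothing later qualifies
            if b != a and b not in seen:
                seen.add(b)       # count only the first follow-up detection per light
                follows[(a, b)] += 1
                delays[(a, b)].append(tb - ta)
    probs = {pair: n / fired[pair[0]] for pair, n in follows.items()}
    return probs, dict(delays)


def cluster_objectives(cluster, probs, delays, positions):
    """Return (similarity, speed_consistency) for a candidate sector (set of light ids)."""
    pairs = list(combinations(sorted(cluster), 2))
    # Similarity: mean link probability, keeping the stronger direction of each pair (assumption).
    similarity = mean(max(probs.get((a, b), 0.0), probs.get((b, a), 0.0))
                      for a, b in pairs) if pairs else 0.0
    # Speed consistency: hypothetical coordinates (metres) turn each delay into a speed.
    speeds = [math.dist(positions[a], positions[b]) / d
              for a, b in pairs for p in ((a, b), (b, a))
              for d in delays.get(p, []) if d > 0]
    if len(speeds) < 2:
        return similarity, 0.0
    cv = pstdev(speeds) / mean(speeds)   # 0 when every observed speed is identical
    return similarity, 1.0 / (1.0 + cv)


# Five cars pass lights A -> B -> C (30 m apart) at 10 m/s; D sits on a crossing road.
events = []
for k in range(5):
    t0 = 100.0 * k
    events += [("A", t0), ("B", t0 + 3.0), ("C", t0 + 6.0), ("D", t0 + 50.0)]
positions = {"A": (0, 0), "B": (30, 0), "C": (60, 0), "D": (30, 40)}

probs, delays = build_prob_graph(events)
print(cluster_objectives({"A", "B", "C"}, probs, delays, positions))  # → (1.0, 1.0)
print(cluster_objectives({"A", "B", "D"}, probs, delays, positions))
```

Here both clusters score 1.0 on speed consistency, because in the second cluster only the A-B link carries any delays. The similarity score is what separates them. A multi-objective search would weigh the two scores against each other rather than rely on either one alone.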

7.3 · CV · Jan 20, 2022

TerViT: An Efficient Ternary Vision Transformer

Sheng Xu, Yanjing Li, Teli Ma et al.

Vision transformers (ViTs) have demonstrated great potential in various visual tasks, but suffer from high computational and memory costs when deployed on resource-constrained devices. In this paper, we introduce a ternary vision transformer (TerViT) that ternarizes the weights in ViTs. This task is challenged by the large loss-surface gap between real-valued and ternary parameters. To address this issue, we introduce a progressive training scheme that first trains 8-bit transformers and then TerViT, achieving better optimization than conventional methods. Furthermore, we introduce channel-wise ternarization, which partitions each matrix into channels, each with a unique distribution and ternarization interval. We apply our methods to the popular DeiT and Swin backbones, and extensive results show that we achieve competitive performance. For example, TerViT can quantize Swin-S to a 13.1MB model size while achieving above 79% Top-1 accuracy on the ImageNet dataset.
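Channel-wise ternarization gives each output channel (row) of a weight matrix its own threshold and scale. This minimal sketch assumes the threshold rule from Ternary Weight Networks: a per-channel threshold Δ = 0.7 · mean|W|, and a scale α = mean |W| over the weights above Δ. TerViT's actual ternarization interval may be chosen differently, and the progressive 8-bit-then-ternary training is not shown. The sketch compares reconstruction error against a single per-tensor threshold on a synthetic matrix whose rows have very different scales. The function names `ternarize_channelwise` and `ternarize_per_tensor` are hypothetical.

```python
import numpy as np

def ternarize_channelwise(w: np.ndarray, t: float = 0.7):
    """Ternarize a (out_channels, in_features) matrix row by row.

    Assumed TWN-style rule: per-channel threshold delta = t * mean|w|,
    per-channel scale alpha = mean |w| over the entries above delta.
    Returns int8 codes in {-1, 0, +1} and alpha with shape (out_channels, 1).
    """
    abs_w = np.abs(w)
    delta = t * abs_w.mean(axis=1, keepdims=True)      # one threshold per channel
    mask = abs_w > delta                               # entries that stay non-zero
    codes = (np.sign(w) * mask).astype(np.int8)
    kept = mask.sum(axis=1, keepdims=True)
    alpha = np.where(kept > 0,
                     (abs_w * mask).sum(axis=1, keepdims=True) / np.maximum(kept, 1),
                     0.0)
    return codes, alpha


def ternarize_per_tensor(w: np.ndarray, t: float = 0.7):
    """Baseline: one threshold and one scale shared by the whole matrix."""
    codes, alpha = ternarize_channelwise(w.reshape(1, -1), t)
    return codes.reshape(w.shape), alpha.item()


# Synthetic weights: four channels whose scales differ by up to 300x.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)) * np.array([[0.01], [0.1], [1.0], [3.0]])

c_ch, a_ch = ternarize_channelwise(w)
c_pt, a_pt = ternarize_per_tensor(w)
err_ch = float(np.mean((w - a_ch * c_ch) ** 2))   # reconstruction w_hat = alpha * codes
err_pt = float(np.mean((w - a_pt * c_pt) ** 2))
print(f"channel-wise MSE {err_ch:.4f}, per-tensor MSE {err_pt:.4f}")
```

With one shared threshold, the two small-scale rows fall entirely below Δ and are zeroed. Meanwhile the shared α overshoots the unit-scale row, so the per-tensor error is noticeably larger. Per-channel intervals avoid both problems.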