Hui Li

h-index8

6papers

199citations

Novelty56%

AI Score36

Ranked #96,861 of 194,257 authors (top 50%)#32,556 in CV (top 55%)

6 Papers

23.8CVApr 25, 2023Code

Class Attention Transfer Based Knowledge Distillation

Ziyao Guo, Haonan Yan, Hui Li et al.

Previous knowledge distillation methods have shown their impressive performance on model compression tasks, however, it is hard to explain how the knowledge they transferred helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that possessing the capacity of identifying class discriminative regions of input is critical for CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Different from previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNN. While having high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.

4.8IVMar 5, 2022Code

FCNet: A Convolutional Neural Network for Arbitrary-Length Exposure Estimation

Jin Liang, Yuchen Yang, Anran Zhang et al.

The photographs captured by digital cameras usually suffer from over or under exposure problems. For image exposure enhancement, the tasks of Single-Exposure Correction (SEC) and Multi-Exposure Fusion (MEF) are widely studied in the image processing community. However, current SEC or MEF methods are developed under different motivations and thus ignore the internal correlation between SEC and MEF, making it difficult to process arbitrary-length sequences with improper exposures. Besides, the MEF methods usually fail at estimating the exposure of a sequence containing only under-exposed or over-exposed images. To alleviate these problems, in this paper, we develop a novel Fusion-Correction Network (FCNet) to tackle an arbitrary-length (including one) image sequence with improper exposures. This is achieved by fusing and correcting an image sequence by Laplacian Pyramid (LP) image decomposition. In each LP level, the low-frequency base component of the input image sequence is fed into a Fusion block and a Correction block sequentially for consecutive exposure estimation, implemented by alternative exposure fusion and correction. The exposure-corrected image in current LP level is upsampled and fused with the high-frequency detail components of the input image sequence in the next LP level, to output the base component for the Fusion and Correction blocks in next LP level. Experiments on the benchmark dataset demonstrate that our FCNet is effective on arbitrary-length exposure estimation, including both SEC and MEF. The code is publicly released at https://github.com/NKUJinLiang/FCNet.

2.8CVJan 24, 2023Code

Image Super-Resolution using Efficient Striped Window Transformer

Jinpeng Shi, Hui Li, Tianle Liu et al.

Transformers have achieved remarkable results in single-image super-resolution (SR). However, the challenge of balancing model performance and complexity has hindered their application in lightweight SR (LSR). To tackle this challenge, we propose an efficient striped window transformer (ESWT). We revisit the normalization layer in the transformer and design a concise and efficient transformer structure to build the ESWT. Furthermore, we introduce a striped window mechanism to model long-term dependencies more efficiently. To fully exploit the potential of the ESWT, we propose a novel flexible window training strategy that can improve the performance of the ESWT without additional cost. Extensive experiments show that ESWT outperforms state-of-the-art LSR transformers, and achieves a better trade-off between model performance and complexity. The ESWT requires fewer parameters, incurs faster inference, smaller FLOPs, and less memory consumption, making it a promising solution for LSR.

2.6CVDec 18, 2022

Automated Optical Inspection of FAST's Reflector Surface using Drones and Computer Vision

Jianan Li, Shenwang Jiang, Liqiang Song et al.

The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the world's largest single-dish radio telescope. Its large reflecting surface achieves unprecedented sensitivity but is prone to damage, such as dents and holes, caused by naturally-occurring falling objects. Hence, the timely and accurate detection of surface defects is crucial for FAST's stable operation. Conventional manual inspection involves human inspectors climbing up and examining the large surface visually, a time-consuming and potentially unreliable process. To accelerate the inspection process and increase its accuracy, this work makes the first step towards automating the inspection of FAST by integrating deep-learning techniques with drone technology. First, a drone flies over the surface along a predetermined route. Since surface defects significantly vary in scale and show high inter-class similarity, directly applying existing deep detectors to detect defects on the drone imagery is highly prone to missing and misidentifying defects. As a remedy, we introduce cross-fusion, a dedicated plug-in operation for deep detectors that enables the adaptive fusion of multi-level features in a point-wise selective fashion, depending on local defect patterns. Consequently, strong semantics and fine-grained details are dynamically fused at different positions to support the accurate detection of defects of various scales and types. Our AI-powered drone-based automated inspection is time-efficient, reliable, and has good accessibility, which guarantees the long-term and stable operation of FAST.

5.0CVJun 12, 2023

Scale-Rotation-Equivariant Lie Group Convolution Neural Networks (Lie Group-CNNs)

Wei-Dong Qiao, Yang Xu, Hui Li

The weight-sharing mechanism of convolutional kernels ensures translation-equivariance of convolution neural networks (CNNs). Recently, rotation-equivariance has been investigated. However, research on scale-equivariance or simultaneous scale-rotation-equivariance is insufficient. This study proposes a Lie group-CNN, which can keep scale-rotation-equivariance for image classification tasks. The Lie group-CNN includes a lifting module, a series of group convolution modules, a global pooling layer, and a classification layer. The lifting module transfers the input image from Euclidean space to Lie group space, and the group convolution is parameterized through a fully connected network using Lie-algebra of Lie-group elements as inputs to achieve scale-rotation-equivariance. The Lie group SIM(2) is utilized to establish the Lie group-CNN with scale-rotation-equivariance. Scale-rotation-equivariance of Lie group-CNN is verified and achieves the best recognition accuracy on the blood cell dataset (97.50%) and the HAM10000 dataset (77.90%) superior to Lie algebra convolution network, dilation convolution, spatial transformer network, and scale-equivariant steerable network. In addition, the generalization ability of the Lie group-CNN on SIM(2) on rotation-equivariance is verified on rotated-MNIST and rotated-CIFAR10, and the robustness of the network is verified on SO(2) and SE(2). Therefore, the Lie group-CNN can successfully extract geometric features and performs equivariant recognition on images with rotation and scale transformations.

21.5CVJun 3, 2024Code

DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors

Tianyu Huang, Haoze Zhang, Yihan Zeng et al.

Dynamic 3D interaction has been attracting a lot of attention recently. However, creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, which requires manually assigning precise physical properties to the object or the simulated results would become unnatural. Another solution is to learn the deformation of 3D objects with the distillation of video generative models, which, however, tends to produce 3D videos with small and discontinuous motions due to the inappropriate extraction and application of physics priors. In this work, to combine the strengths and complementing shortcomings of the above two solutions, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions. In particular, we propose motion distillation sampling to emphasize video motion information during distillation. In addition, to facilitate the optimization, we further propose a KAN-based material field with frame boosting. Experimental results demonstrate that our method enjoys more realistic motions than state-of-the-arts do.