Fei Hu

h-index44

8papers

242citations

Novelty48%

AI Score38

Ranked #87,856 of 194,257 authors (top 45%)#29,582 in CV (top 50%)

8 Papers

12.2CVJun 9, 2022Code

AGConv: Adaptive Graph Convolution on 3D Point Clouds

Mingqiang Wei, Zeyong Wei, Haoran Zhou et al.

Convolution on 3D point clouds is widely researched yet far from perfect in geometric deep learning. The traditional wisdom of convolution characterises feature correspondences indistinguishably among 3D points, arising an intrinsic limitation of poor distinctive feature learning. In this paper, we propose Adaptive Graph Convolution (AGConv) for wide applications of point cloud analysis. AGConv generates adaptive kernels for points according to their dynamically learned features. Compared with the solution of using fixed/isotropic kernels, AGConv improves the flexibility of point cloud convolutions, effectively and precisely capturing the diverse relations between points from different semantic parts. Unlike the popular attentional weight schemes, AGConv implements the adaptiveness inside the convolution operation instead of simply assigning different weights to the neighboring points. Extensive evaluations clearly show that our method outperforms state-of-the-arts of point cloud classification and segmentation on various benchmark datasets.Meanwhile, AGConv can flexibly serve more point cloud analysis approaches to boost their performance. To validate its flexibility and effectiveness, we explore AGConv-based paradigms of completion, denoising, upsampling, registration and circle extraction, which are comparable or even superior to their competitors. Our code is available at https://github.com/hrzhou2/AdaptConv-master.

16.1CVAug 26, 2023

Vision-Based Human Pose Estimation via Deep Learning: A Survey

Gongjin Lan, Yu Wu, Fei Hu et al.

Human pose estimation (HPE) has attracted a significant amount of attention from the computer vision community in the past decades. Moreover, HPE has been applied to various domains, such as human-computer interaction, sports analysis, and human tracking via images and videos. Recently, deep learning-based approaches have shown state-of-the-art performance in HPE-based applications. Although deep learning-based approaches have achieved remarkable performance in HPE, a comprehensive review of deep learning-based HPE methods remains lacking in the literature. In this article, we provide an up-to-date and in-depth overview of the deep learning approaches in vision-based HPE. We summarize these methods of 2-D and 3-D HPE, and their applications, discuss the challenges and the research trends through bibliometrics, and provide insightful recommendations for future research. This article provides a meaningful overview as introductory material for beginners to deep learning-based HPE, as well as supplementary material for advanced researchers.

2.6CVSep 5, 2022

SPCNet: Stepwise Point Cloud Completion Network

Fei Hu, Honghua Chen, Xuequan Lu et al.

How will you repair a physical object with large missings? You may first recover its global yet coarse shape and stepwise increase its local details. We are motivated to imitate the above physical repair procedure to address the point cloud completion task. We propose a novel stepwise point cloud completion network (SPCNet) for various 3D models with large missings. SPCNet has a hierarchical bottom-to-up network architecture. It fulfills shape completion in an iterative manner, which 1) first infers the global feature of the coarse result; 2) then infers the local feature with the aid of global feature; and 3) finally infers the detailed result with the help of local feature and coarse result. Beyond the wisdom of simulating the physical repair, we newly design a cycle loss %based training strategy to enhance the generalization and robustness of SPCNet. Extensive experiments clearly show the superiority of our SPCNet over the state-of-the-art methods on 3D point clouds with large missings.

1.4CVDec 1, 2022

Concealed Object Detection for Passive Millimeter-Wave Security Imaging Based on Task-Aligned Detection Transformer

Cheng Guo, Fei Hu, Yan Hu

Passive millimeter-wave (PMMW) is a significant potential technique for human security screening. Several popular object detection networks have been used for PMMW images. However, restricted by the low resolution and high noise of PMMW images, PMMW hidden object detection based on deep learning usually suffers from low accuracy and low classification confidence. To tackle the above problems, this paper proposes a Task-Aligned Detection Transformer network, named PMMW-DETR. In the first stage, a Denoising Coarse-to-Fine Transformer (DCFT) backbone is designed to extract long- and short-range features in the different scales. In the second stage, we propose the Query Selection module to introduce learned spatial features into the network as prior knowledge, which enhances the semantic perception capability of the network. In the third stage, aiming to improve the classification performance, we perform a Task-Aligned Dual-Head block to decouple the classification and regression tasks. Based on our self-developed PMMW security screening dataset, experimental results including comparison with State-Of-The-Art (SOTA) methods and ablation study demonstrate that the PMMW-DETR obtains higher accuracy and classification confidence than previous works, and exhibits robustness to the PMMW images of low quality.

1.6LGNov 18, 2021Code

A Novel Optimized Asynchronous Federated Learning Framework

Zhicheng Zhou, Hailong Chen, Kunhua Li et al.

Federated Learning (FL) since proposed has been applied in many fields, such as credit assessment, medical, etc. Because of the difference in the network or computing resource, the clients may not update their gradients at the same time that may take a lot of time to wait or idle. That's why Asynchronous Federated Learning (AFL) method is needed. The main bottleneck in AFL is communication. How to find a balance between the model performance and the communication cost is a challenge in AFL. This paper proposed a novel AFL framework VAFL. And we verified the performance of the algorithm through sufficient experiments. The experiments show that VAFL can reduce the communication times about 51.02\% with 48.23\% average communication compression rate and allow the model to be converged faster. The code is available at \url{https://github.com/RobAI-Lab/VAFL}

10.2CVJul 20, 2025

LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering

Xinxin Dong, Baoyun Peng, Haokai Ma et al.

Video Question Answering (VideoQA) requires identifying sparse critical moments in long videos and reasoning about their causal relationships to answer semantically complex questions. While recent advances in multimodal learning have improved alignment and fusion, current approaches remain limited by two prevalent but fundamentally flawed strategies: (1) task-agnostic sampling indiscriminately processes all frames, overwhelming key events with irrelevant content; and (2) heuristic retrieval captures superficial patterns but misses causal-temporal structures needed for complex reasoning. To address these challenges, we introduce LeAdQA, an innovative approach that bridges these gaps through synergizing causal-aware query refinement with fine-grained visual grounding. Our method first leverages LLMs to reformulate question-option pairs, resolving causal ambiguities and sharpening temporal focus. These refined queries subsequently direct a temporal grounding model to precisely retrieve the most salient segments, complemented by an adaptive fusion mechanism dynamically integrating the evidence to maximize relevance. The integrated visual-textual cues are then processed by an MLLM to generate accurate, contextually-grounded answers. Experiments on NExT-QA, IntentQA, and NExT-GQA demonstrate that our method's precise visual grounding substantially enhances the understanding of video-question relationships, achieving state-of-the-art (SOTA) performance on complex reasoning tasks while maintaining computational efficiency.

5.1MMJun 25, 2020

QoE-Driven UAV-Enabled Pseudo-Analog Wireless Video Broadcast: A Joint Optimization of Power and Trajectory

Xiao-Wei Tang, Xin-Lin Huang, Fei Hu

The explosive demands for high quality mobile video services have caused heavy overload to the existing cellular networks. Although the small cell has been proposed to alleviate such a problem, the network operators may not be interested in deploying numerous base stations (BSs) due to expensive infrastructure construction and maintenance. The unmanned aerial vehicles (UAVs) can provide the low-cost and quick deployment, which can support high-quality line-of-sight communications and have become promising mobile BSs. In this paper, we propose a quality-of-experience (QoE)-driven UAV-enabled pseudo-analog wireless video broadcast scheme, which provides mobile video broadcast services for ground users (GUs). Due to limited energy available in UAV, the aim of the proposed scheme is to maximize the minimum peak signal-to-noise ratio (PSNR) of GUs' video reconstruction quality by jointly optimizing the transmission power allocation strategy and the UAV trajectory. Firstly, the reconstructed video quality at GUs is defined under the constraints of the UAV's total energy and motion mechanism, and the proposed scheme is formulated as a complex non-convex optimization problem. Then, the optimization problem is simplified to obtain a tractable suboptimal solution with the help of the block coordinate descent model and the successive convex approximation model. Finally, the experimental results are presented to show the effectiveness of the proposed scheme. Specifically, the proposed scheme can achieve over 1.6dB PSNR gains in terms of GUs' minimum PSNR, compared with the state-of-the-art schemes, e.g., DVB, SoftCast, and SharpCast.

3.3MMMay 19, 2020

Human-Perception-Oriented Pseudo Analog Video Transmissions with Deep Learning

Xiao-Wei Tang, Xin-Lin Huang, Fei Hu et al.

Recently, pseudo analog transmission has gained increasing attentions due to its ability to alleviate the cliff effect in video multicast scenarios. The existing pseudo analog systems are sorely optimized under the minimum mean squared error criterion without taking the perceptual video quality into consideration. In this paper, we propose a human-perception-based pseudo analog video transmission system named ROIC-Cast, which aims to intelligently enhance the transmission quality of the region-of-interest (ROI) parts. Firstly, the classic deep learning based saliency detection algorithm is adopted to decompose the continuous video sequences into ROI and non-ROI blocks. Secondly, an effective compression method is used to reduce the data amount of side information generated by the ROI extraction module. Then, the power allocation scheme is formulated as a convex problem, and the optimal transmission power for both ROI and non-ROI blocks is derived in a closed form. Finally, the simulations are conducted to validate the proposed system by comparing with a few of existing systems, e.g., KMV-Cast, SoftCast, and DAC-RAN. The proposed ROIC-Cast can achieve over 4.1dB peak signal- to-noise ratio gains of ROI compared with other systems, given the channel signal-to-noise ratio as -5dB, 0dB, 5dB, and 10dB, respectively. This significant performance improvement is due to the automatic ROI extraction, high-efficiency data compression as well as adaptive power allocation.