Zhang Hao

CVJan 23, 2024

Training-Free Action Recognition and Goal Inference with Dynamic Frame Selection

Ee Yeo Keat, Zhang Hao, Alexander Matyasko et al.

We introduce VidTFS, a Training-free, open-vocabulary video goal and action inference framework that combines the frozen vision foundational model (VFM) and large language model (LLM) with a novel dynamic Frame Selection module. Our experiments demonstrate that the proposed frame selection module improves the performance of the framework significantly. We validate the performance of the proposed VidTFS on four widely used video datasets, including CrossTask, COIN, UCF101, and ActivityNet, covering goal inference and action recognition tasks under open-vocabulary settings without requiring any training or fine-tuning. The results show that VidTFS outperforms pretrained and instruction-tuned multimodal language models that directly stack LLM and VFM for downstream video inference tasks. Our VidTFS with its adaptability shows the future potential for generalizing to new training-free video inference tasks.

SPNov 5, 2019

Deep Learning for MIMO Channel Estimation: Interpretation, Performance, and Comparison

Hu Qiang, Gao Feifei, Zhang Hao et al.

Deep learning (DL) has emerged as an effective tool for channel estimation in wireless communication systems, especially under some imperfect environments. However, even with such unprecedented success, DL methods are often regarded as black boxes and are lack of explanations on their internal mechanisms, which severely limits further improvement and extension. In this paper, we present a preliminary theoretical analysis on DL based channel estimation for multiple-antenna systems to understand and interpret its internal mechanism. Deep neural network (DNN) with rectified linear unit (ReLU) activation function is mathematically equivalent to a piecewise linear function. Hence, the corresponding DL estimator can achieve universal approximation to a large family of functions by making efficient use of piecewise linearity. We demonstrate that DL based channel estimation does not restrict to any specific signal model and approaches to the minimum mean-squared error (MMSE) estimation in various scenarios without requiring any prior knowledge of channel statistics. Therefore, DL based channel estimation outperforms or is at least comparable with traditional channel estimation, depending on the types of channels. Simulation results confirm the accuracy of the proposed interpretation and demonstrate the effectiveness of DL based channel estimation under both linear and nonlinear signal models.

Zhang Hao

2 Papers