Yanan Luo

h-index6

3papers

175citations

Novelty50%

AI Score38

Ranked #87,130 of 194,257 authors (top 45%)#29,335 in CV (top 50%)

3 Papers

7.6CVDec 24, 2024Code

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.

We present an efficient encoder-free approach for video-language understanding that achieves competitive performance while significantly reducing computational overhead. Current video-language models typically rely on heavyweight image encoders (300M-1.1B parameters) or video encoders (1B-1.4B parameters), creating a substantial computational burden when processing multi-frame videos. Our method introduces a novel Spatio-Temporal Alignment Block (STAB) that directly processes video inputs without requiring pre-trained encoders while using only 45M parameters for visual processing - at least a 6.5$\times$ reduction compared to traditional approaches. The STAB architecture combines Local Spatio-Temporal Encoding for fine-grained feature extraction, efficient spatial downsampling through learned attention and separate mechanisms for modeling frame-level and video-level relationships. Our model achieves comparable or superior performance to encoder-based approaches for open-ended video question answering on standard benchmarks. The fine-grained video question-answering evaluation demonstrates our model's effectiveness, outperforming the encoder-based approaches Video-ChatGPT and Video-LLaVA in key aspects like correctness and temporal understanding. Extensive ablation studies validate our architectural choices and demonstrate the effectiveness of our spatio-temporal modeling approach while achieving 3-4$\times$ faster processing speeds than previous methods. Code is available at https://jh-yi.github.io/Video-Panda.

0.9CVMar 1, 2018

A Class-Incremental Learning Method Based on One Class Support Vector Machine

Chengfei Yao, Jie Zou, Yanan Luo et al.

A method based on one class support vector machine (OCSVM) is proposed for class incremental learning. Several OCSVM models divide the input space into several parts. Then, the 1VS1 classifiers are constructed for the confuse part by using the support vectors. During the class incremental learning process, the OCSVM of the new class is trained at first. Then the support vectors of the old classes and the support vectors of the new class are reused to train 1VS1 classifiers for the confuse part. In order to bring more information to the certain support vectors, the support vectors are at the boundary of the distribution of samples as much as possible when the OCSVM is built. Compared with the traditional methods, the proposed method retains the original model and thus reduces memory consumption and training time cost. Various experiments on different datasets also verify the efficiency of the proposed method.

9.9CVFeb 28, 2018

HSI-CNN: A Novel Convolution Neural Network for Hyperspectral Image

Yanan Luo, Jie Zou, Chengfei Yao et al.

With the development of deep learning, the performance of hyperspectral image (HSI) classification has been greatly improved in recent years. The shortage of training samples has become a bottleneck for further improvement of performance. In this paper, we propose a novel convolutional neural network framework for the characteristics of hyperspectral image data, called HSI-CNN. Firstly, the spectral-spatial feature is extracted from a target pixel and its neighbors. Then, a number of one-dimensional feature maps, obtained by convolution operation on spectral-spatial features, are stacked into a two-dimensional matrix. Finally, the two-dimensional matrix considered as an image is fed into standard CNN. This is why we call it HSI-CNN. In addition, we also implements two depth network classification models, called HSI-CNN+XGBoost and HSI-CapsNet, in order to compare the performance of our framework. Experiments show that the performance of hyperspectral image classification is improved efficiently with HSI-CNN framework. We evaluate the model's performance using four popular HSI datasets, which are the Kennedy Space Center (KSC), Indian Pines (IP), Pavia University scene (PU) and Salinas scene (SA). As far as we concerned, HSI-CNN has got the state-of-art accuracy among all methods we have known on these datasets of 99.28%, 99.09%, 99.42%, 98.95% separately.