Feng Ye

h-index8

3papers

11citations

Novelty38%

AI Score23

Ranked #174,604 of 194,257 authors (top 90%)#29,001 in CL (top 94%)

3 Papers

1.1CLOct 26, 2022

Multilevel Transformer For Multimodal Emotion Recognition

Junyi He, Meimei Wu, Meng Li et al.

Multimodal emotion recognition has attracted much attention recently. Fusing multiple modalities effectively with limited labeled data is a challenging task. Considering the success of pre-trained model and fine-grained nature of emotion expression, it is reasonable to take these two aspects into consideration. Unlike previous methods that mainly focus on one aspect, we introduce a novel multi-granularity framework, which combines fine-grained representation with pre-trained utterance-level representation. Inspired by Transformer TTS, we propose a multilevel transformer model to perform fine-grained multimodal emotion recognition. Specifically, we explore different methods to incorporate phoneme-level embedding with word-level embedding. To perform multi-granularity learning, we simply combine multilevel transformer model with Albert. Extensive experimental results show that both our multilevel transformer model and multi-granularity model outperform previous state-of-the-art approaches on IEMOCAP dataset with text transcripts and speech signal.

1.5CVJan 15, 2023

Learning to Compress Unmanned Aerial Vehicle (UAV) Captured Video: Benchmark and Analysis

Chuanmin Jia, Feng Ye, Huifang Sun et al.

During the past decade, the Unmanned-Aerial-Vehicles (UAVs) have attracted increasing attention due to their flexible, extensive, and dynamic space-sensing capabilities. The volume of video captured by UAVs is exponentially growing along with the increased bitrate generated by the advancement of the sensors mounted on UAVs, bringing new challenges for on-device UAV storage and air-ground data transmission. Most existing video compression schemes were designed for natural scenes without consideration of specific texture and view characteristics of UAV videos. In this work, we first contribute a detailed analysis of the current state of the field of UAV video coding. Then we propose to establish a novel task for learned UAV video coding and construct a comprehensive and systematic benchmark for such a task, present a thorough review of high quality UAV video datasets and benchmarks, and contribute extensive rate-distortion efficiency comparison of learned and conventional codecs after. Finally, we discuss the challenges of encoding UAV videos. It is expected that the benchmark will accelerate the research and development in video coding on drone platforms.

1.6LGAug 19, 2021Code

Fast Newton method solving KLR based on Multilevel Circulant Matrix with log-linear complexity

Junna Zhang, Shuisheng Zhou, Cui Fu et al.

Kernel logistic regression (KLR) is a conventional nonlinear classifier in machine learning. With the explosive growth of data size, the storage and computation of large dense kernel matrices is a major challenge in scaling KLR. Even the nyström approximation is applied to solve KLR, it also faces the time complexity of $O(nc^2)$ and the space complexity of $O(nc)$, where $n$ is the number of training instances and $c$ is the sampling size. In this paper, we propose a fast Newton method efficiently solving large-scale KLR problems by exploiting the storage and computing advantages of multilevel circulant matrix (MCM). Specifically, by approximating the kernel matrix with an MCM, the storage space is reduced to $O(n)$, and further approximating the coefficient matrix of the Newton equation as MCM, the computational complexity of Newton iteration is reduced to $O(n \log n)$. The proposed method can run in log-linear time complexity per iteration, because the multiplication of MCM (or its inverse) and vector can be implemented the multidimensional fast Fourier transform (mFFT). Experimental results on some large-scale binary-classification and multi-classification problems show that the proposed method enables KLR to scale to large scale problems with less memory consumption and less training time without sacrificing test accuracy.