Wanzeng Kong

h-index38

8papers

216citations

Novelty50%

AI Score40

Ranked #75,663 of 194,257 authors (top 39%)#16,887 in LG (top 42%)

8 Papers

4.5AIDec 20, 2022

InterMulti:Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis

Feng Qiu, Wanzeng Kong, Yu Ding

Humans are sophisticated at reading interlocutors' emotions from multimodal signals, such as speech contents, voice tones and facial expressions. However, machines might struggle to understand various emotions due to the difficulty of effectively decoding emotions from the complex interactions between multimodal signals. In this paper, we propose a multimodal emotion analysis framework, InterMulti, to capture complex multimodal interactions from different views and identify emotions from multimodal signals. Our proposed framework decomposes signals of different modalities into three kinds of multimodal interaction representations, including a modality-full interaction representation, a modality-shared interaction representation, and three modality-specific interaction representations. Additionally, to balance the contribution of different modalities and learn a more informative latent interaction representation, we developed a novel Text-dominated Hierarchical High-order Fusion(THHF) module. THHF module reasonably integrates the above three kinds of representations into a comprehensive multimodal interaction representation. Extensive experimental results on widely used datasets, (i.e.) MOSEI, MOSI and IEMOCAP, demonstrate that our method outperforms the state-of-the-art.

1.8LGDec 16, 2022

EffMulti: Efficiently Modeling Complex Multimodal Interactions for Emotion Analysis

Feng Qiu, Chengyang Xie, Yu Ding et al.

Humans are skilled in reading the interlocutor's emotion from multimodal signals, including spoken words, simultaneous speech, and facial expressions. It is still a challenge to effectively decode emotions from the complex interactions of multimodal signals. In this paper, we design three kinds of multimodal latent representations to refine the emotion analysis process and capture complex multimodal interactions from different views, including a intact three-modal integrating representation, a modality-shared representation, and three modality-individual representations. Then, a modality-semantic hierarchical fusion is proposed to reasonably incorporate these representations into a comprehensive interaction representation. The experimental results demonstrate that our EffMulti outperforms the state-of-the-art methods. The compelling performance benefits from its well-designed framework with ease of implementation, lower computing complexity, and less trainable parameters.

6.4LGAug 12, 2024Code

Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction

Wenchao Weng, Mei Wu, Hanyu Jiang et al.

In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the impact of the target information on the prediction. To address these issues, we propose a Pattern-Matching Dynamic Memory Network (PM-DMNet). PM-DMNet employs a novel dynamic memory network to capture traffic pattern features with only O(N) complexity, significantly reducing computational overhead while achieving excellent performance. The PM-DMNet also introduces two prediction methods: Recursive Multi-step Prediction (RMP) and Parallel Multi-step Prediction (PMP), which leverage the time features of the prediction targets to assist in the forecasting process. Furthermore, a transfer attention mechanism is integrated into PMP, transforming historical data features to better align with the predicted target states, thereby capturing trend changes more accurately and reducing errors. Extensive experiments demonstrate the superiority of the proposed model over existing benchmarks. The source codes are available at: https://github.com/wengwenchao123/PM-DMNet.

24.2CLMar 15, 2024Code

Enhanced Coherence-Aware Network with Hierarchical Disentanglement for Aspect-Category Sentiment Analysis

Jin Cui, Fumiyo Fukumoto, Xinfeng Wang et al.

Aspect-category-based sentiment analysis (ACSA), which aims to identify aspect categories and predict their sentiments has been intensively studied due to its wide range of NLP applications. Most approaches mainly utilize intrasentential features. However, a review often includes multiple different aspect categories, and some of them do not explicitly appear in the review. Even in a sentence, there is more than one aspect category with its sentiments, and they are entangled intra-sentence, which makes the model fail to discriminately preserve all sentiment characteristics. In this paper, we propose an enhanced coherence-aware network with hierarchical disentanglement (ECAN) for ACSA tasks. Specifically, we explore coherence modeling to capture the contexts across the whole review and to help the implicit aspect and sentiment identification. To address the issue of multiple aspect categories and sentiment entanglement, we propose a hierarchical disentanglement module to extract distinct categories and sentiment features. Extensive experimental and visualization results show that our ECAN effectively decouples multiple categories and sentiments entangled in the coherence representations and achieves state-of-the-art (SOTA) performance. Our codes and data are available online: \url{https://github.com/cuijin-23/ECAN}.

5.0CVDec 15, 2023Code

AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition

Yuhang Ming, Jian Ma, Xingrui Yang et al.

We present AEGIS-Net, a novel indoor place recognition model that takes in RGB point clouds and generates global place descriptors by aggregating lower-level color, geometry features and higher-level implicit semantic features. However, rather than simple feature concatenation, self-attention modules are employed to select the most important local features that best describe an indoor place. Our AEGIS-Net is made of a semantic encoder, a semantic decoder and an attention-guided feature embedding. The model is trained in a 2-stage process with the first stage focusing on an auxiliary semantic segmentation task and the second one on the place recognition task. We evaluate our AEGIS-Net on the ScanNetPR dataset and compare its performance with a pre-deep-learning feature-based method and five state-of-the-art deep-learning-based methods. Our AEGIS-Net achieves exceptional performance and outperforms all six methods.

3.6CVJun 26, 2025

3D Scene-Camera Representation with Joint Camera Photometric Optimization

Weichen Dai, Kangcheng Ma, Jiaxin Wang et al.

Representing scenes from multi-view images is a crucial task in computer vision with extensive applications. However, inherent photometric distortions in the camera imaging can significantly degrade image quality. Without accounting for these distortions, the 3D scene representation may inadvertently incorporate erroneous information unrelated to the scene, diminishing the quality of the representation. In this paper, we propose a novel 3D scene-camera representation with joint camera photometric optimization. By introducing internal and external photometric model, we propose a full photometric model and corresponding camera representation. Based on simultaneously optimizing the parameters of the camera representation, the proposed method effectively separates scene-unrelated information from the 3D scene representation. Additionally, during the optimization of the photometric parameters, we introduce a depth regularization to prevent the 3D scene representation from fitting scene-unrelated information. By incorporating the camera model as part of the mapping process, the proposed method constructs a complete map that includes both the scene radiance field and the camera photometric model. Experimental results demonstrate that the proposed method can achieve high-quality 3D scene representations, even under conditions of imaging degradation, such as vignetting and dirt.

2.3LGNov 3, 2020

BCGGAN: Ballistocardiogram artifact removal in simultaneous EEG-fMRI using generative adversarial network

Guang Lin, Jianhai Zhang, Yuxi Liu et al.

Due to its advantages of high temporal and spatial resolution, the technology of simultaneous electroencephalogram-functional magnetic resonance imaging (EEG-fMRI) acquisition and analysis has attracted much attention, and has been widely used in various research fields of brain science. However, during the fMRI of the brain, ballistocardiogram (BCG) artifacts can seriously contaminate the EEG. As an unpaired problem, BCG artifact removal now remains a considerable challenge. Aiming to provide a solution, this paper proposed a novel modular generative adversarial network (GAN) and corresponding training strategy to improve the network performance by optimizing the parameters of each module. In this manner, we hope to improve the local representation ability of the network model, thereby improving its overall performance and obtaining a reliable generator for BCG artifact removal. Moreover, the proposed method does not rely on additional reference signal or complex hardware equipment. Experimental results show that, compared with multiple methods, the technique presented in this paper can remove the BCG artifact more effectively while retaining essential EEG information.

11.7SPJul 3, 2020

Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline

Dongrui Wu, Xue Jiang, Ruimin Peng et al.

Transfer learning (TL) has been widely used in motor imagery (MI) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject, and demonstrated promising performance. While a closed-loop MI-based BCI system, after electroencephalogram (EEG) signal acquisition and temporal filtering, includes spatial filtering, feature engineering, and classification blocks before sending out the control signal to an external device, previous approaches only considered TL in one or two such components. This paper proposes that TL could be considered in all three components (spatial filtering, feature engineering, and classification) of MI-based BCIs. Furthermore, it is also very important to specifically add a data alignment component before spatial filtering to make the data from different subjects more consistent, and hence to facilitate subsequential TL. Offline calibration experiments on two MI datasets verified our proposal. Especially, integrating data alignment and sophisticated TL approaches can significantly improve the classification performance, and hence greatly reduces the calibration effort.