Zihe Liu

LG
h-index70
12papers
124citations
Novelty47%
AI Score54

12 Papers

SPJul 12, 2022
Self-supervised Group Meiosis Contrastive Learning for EEG-Based Emotion Recognition

Haoning Kan, Jiale Yu, Jiajin Huang et al.

The progress of EEG-based emotion recognition has received widespread attention from the fields of human-machine interactions and cognitive science in recent years. However, how to recognize emotions with limited labels has become a new research and application bottleneck. To address the issue, this paper proposes a Self-supervised Group Meiosis Contrastive learning framework (SGMC) based on the stimuli consistent EEG signals in human being. In the SGMC, a novel genetics-inspired data augmentation method, named Meiosis, is developed. It takes advantage of the alignment of stimuli among the EEG samples in a group for generating augmented groups by pairing, cross exchanging, and separating. And the model adopts a group projector to extract group-level feature representations from group EEG samples triggered by the same emotion video stimuli. Then contrastive learning is employed to maximize the similarity of group-level representations of augmented groups with the same stimuli. The SGMC achieves the state-of-the-art emotion recognition results on the publicly available DEAP dataset with an accuracy of 94.72% and 95.68% in valence and arousal dimensions, and also reaches competitive performance on the public SEED dataset with an accuracy of 94.04%. It is worthy of noting that the SGMC shows significant performance even when using limited labels. Moreover, the results of feature visualization suggest that the model might have learned video-level emotion-related feature representations to improve emotion recognition. And the effects of group size are further evaluated in the hyper parametric analysis. Finally, a control experiment and ablation study are carried out to examine the rationality of architecture. The code is provided publicly online.

SYApr 19Code
CAR-EnKF: A Covariance-Adaptive and Recalibrated Ensemble Kalman Filter Framework

Shida Jiang, Shengyu Tao, Zihe Liu et al.

The ensemble Kalman filter (EnKF) is widely used for nonlinear and high-dimensional state estimation because it replaces complex covariance propagation with simple ensemble statistics. However, conventional EnKF implementations can become overconfident in the presence of measurement nonlinearity. The commonly used covariance inflation technique only partially alleviates this issue. This paper proposes a covariance-adaptive and recalibrated ensemble Kalman filter (CAR-EnKF) framework for nonlinear state estimation. The framework introduces two improvements that are only active for nonlinear measurements and reduce to the conventional EnKF framework without covariance inflation in the linear case: (i) a recalibration mechanism that reassesses the effect of the chosen Kalman gain after updating the ensemble mean, and (ii) a positive semidefinite covariance compensation term that accounts for measurement nonlinearity. An adaptive update law based on the normalized innovation squared further tunes the compensation magnitude online. The framework is algorithmically general and is specialized here to the stochastic EnKF and the ensemble transform Kalman filter (ETKF). Experiments on feature-based SLAM and the Lorenz--96 system show that CAR-EnKF consistently reduces RMSE relative to conventional EnKF baselines, with especially large improvements at low measurement-noise levels. The related codes are available at \href{https://github.com/Shida-Jiang/CAR-EnKF-A-Covariance-Adaptive-and-Recalibrated-Ensemble-Kalman-Filter-Framework}

CLSep 19, 2024
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation

Chen Liang, Zhifan Feng, Zihe Liu et al.

Chain-of-thought prompting significantly boosts the reasoning ability of large language models but still faces three issues: hallucination problem, restricted interpretability, and uncontrollable generation. To address these challenges, we present AgentCOT, a llm-based autonomous agent framework, which can solve complex problems in an agent-style manner by multiple round LLM generation. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. In addition, we integrate the step's index into the reasoning process to form a graph structure for complex inference logic. We introduce two new strategies to enhance the performance of AgentCOT.We conduct extensive experiments to verify the effectiveness of our method on six common benchmarks. Results exhibit that our method brings in substantial improvements over current competitive approaches.

LGAug 11, 2025Code
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Zihe Liu, Jiashun Liu, Yancheng He et al.

Reinforcement learning for LLM reasoning has rapidly emerged as a prominent research area, marked by a significant surge in related studies on both algorithmic innovations and practical applications. Despite this progress, several critical challenges remain, including the absence of standardized guidelines for employing RL techniques and a fragmented understanding of their underlying mechanisms. Additionally, inconsistent experimental settings, variations in training data, and differences in model initialization have led to conflicting conclusions, obscuring the key characteristics of these techniques and creating confusion among practitioners when selecting appropriate techniques. This paper systematically reviews widely adopted RL techniques through rigorous reproductions and isolated evaluations within a unified open-source framework. We analyze the internal mechanisms, applicable scenarios, and core principles of each technique through fine-grained experiments, including datasets of varying difficulty, model sizes, and architectures. Based on these insights, we present clear guidelines for selecting RL techniques tailored to specific setups, and provide a reliable roadmap for practitioners navigating the RL for the LLM domain. Finally, we reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies using vanilla PPO loss. The results demonstrate that our simple combination consistently improves performance, surpassing strategies like GRPO and DAPO.

LGSep 7, 2024
NGD converges to less degenerate solutions than SGD

Moosa Saghir, N. R. Raghavendra, Zihe Liu et al.

The number of free parameters, or dimension, of a model is a straightforward way to measure its complexity: a model with more parameters can encode more information. However, this is not an accurate measure of complexity: models capable of memorizing their training data often generalize well despite their high dimension. Effective dimension aims to more directly capture the complexity of a model by counting only the number of parameters required to represent the functionality of the model. Singular learning theory (SLT) proposes the learning coefficient $ λ$ as a more accurate measure of effective dimension. By describing the rate of increase of the volume of the region of parameter space around a local minimum with respect to loss, $ λ$ incorporates information from higher-order terms. We compare $ λ$ of models trained using natural gradient descent (NGD) and stochastic gradient descent (SGD), and find that those trained with NGD consistently have a higher effective dimension for both of our methods: the Hessian trace $ \text{Tr}(\mathbf{H}) $, and the estimate of the local learning coefficient (LLC) $ \hatλ(w^*) $.

LGMay 23, 2024
A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

Zihe Liu, Jie Lu, Guangquan Zhang et al.

Deep reinforcement learning is used in various domains, but usually under the assumption that the environment has stationary conditions like transitions and state distributions. When this assumption is not met, performance suffers. For this reason, tracking continuous environmental changes and adapting to unpredictable conditions is challenging yet crucial because it ensures that systems remain reliable and flexible in practical scenarios. Our research introduces Behavior-Aware Detection and Adaptation (BADA), an innovative framework that merges environmental change detection with behavior adaptation. The key inspiration behind our method is that policies exhibit different global behaviors in changing environments. Specifically, environmental changes are identified by analyzing variations between behaviors using Wasserstein distances without manually set thresholds. The model adapts to the new environment through behavior regularization based on the extent of changes. The results of a series of experiments demonstrate better performance relative to several current algorithms. This research also indicates significant potential for tackling this long-standing challenge.

CLApr 3
Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization

Zihe Liu, Yulong Mao, Jinan Xu et al.

Knowledge distillation is an effective technique for pre-trained language model compression. However, existing methods only focus on the knowledge distribution among layers, which may cause the loss of fine-grained information in the alignment process. To address this issue, we introduce the Multi-aspect Knowledge Distillation (MaKD) method, which mimics the self-attention and feed-forward modules in greater depth to capture rich language knowledge information at different aspects. Experimental results demonstrate that MaKD can achieve competitive performance compared with various strong baselines with the same storage parameter budget. In addition, our method also performs well in distilling auto-regressive architecture models.

LGOct 13, 2025
Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony

Han Lu, Zichen Liu, Shaopan Xiong et al.

Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low resource utilization and limited scalability. We present ROLL Flash, a system that extends ROLL with native support for asynchronous RL post-training. ROLL Flash is built upon two core design principles: fine-grained parallelism and rollout-train decoupling. Guided by these principles, ROLL Flash provides flexible programming interfaces that enable a fully asynchronous training architecture and support efficient rollout mechanisms, including queue scheduling and environment-level asynchronous execution. Through comprehensive theoretical analysis and extensive experiments, we demonstrate that ROLL Flash significantly improves resource utilization and scalability over synchronous RL post-training. ROLL Flash achieves up to 2.24x speedup on RLVR tasks and 2.72x on agentic tasks, using the same GPU budget as synchronous baselines. Furthermore, we implement several popular off-policy algorithms and verify that asynchronous training can achieve performance on par with synchronous training.

CVJul 15, 2025
A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition

Xinkui Zhao, Jinsong Shu, Yangyang Wu et al.

Multimodal Emotion Recognition (MER) often encounters incomplete multimodality in practical applications due to sensor failures or privacy protection requirements. While existing methods attempt to address various incomplete multimodal scenarios by balancing the training of each modality combination through additional gradients, these approaches face a critical limitation: training gradients from different modality combinations conflict with each other, ultimately degrading the performance of the final prediction model. In this paper, we propose a unimodal decoupled dynamic low-rank adaptation method based on modality combinations, named MCULoRA, which is a novel framework for the parameter-efficient training of incomplete multimodal learning models. MCULoRA consists of two key modules, modality combination aware low-rank adaptation (MCLA) and dynamic parameter fine-tuning (DPFT). The MCLA module effectively decouples the shared information from the distinct characteristics of individual modality combinations. The DPFT module adjusts the training ratio of modality combinations based on the separability of each modality's representation space, optimizing the learning efficiency across different modality combinations. Our extensive experimental evaluation in multiple benchmark datasets demonstrates that MCULoRA substantially outperforms previous incomplete multimodal learning approaches in downstream task accuracy.

LGNov 1, 2024
Efficient Model Compression for Bayesian Neural Networks

Diptarka Saha, Zihe Liu, Feng Liang

Model Compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages including lower computation cost, deployability to devices of limited storage and memories, and resistance to adversarial attacks. This may be achieved via weight pruning or fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node that typically gets lost. We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark data and find evidence of better generalizability of the pruned model in all our experiments.

CVMay 13, 2021
Relation-aware Hierarchical Attention Framework for Video Question Answering

Fangtao Li, Ting Bai, Chenyu Cao et al.

Video Question Answering (VideoQA) is a challenging video understanding task since it requires a deep understanding of both question and video. Previous studies mainly focus on extracting sophisticated visual and language embeddings, fusing them by delicate hand-crafted networks. However, the relevance of different frames, objects, and modalities to the question are varied along with the time, which is ignored in most of existing methods. Lacking understanding of the the dynamic relationships and interactions among objects brings a great challenge to VideoQA task. To address this problem, we propose a novel Relation-aware Hierarchical Attention (RHA) framework to learn both the static and dynamic relations of the objects in videos. In particular, videos and questions are embedded by pre-trained models firstly to obtain the visual and textual features. Then a graph-based relation encoder is utilized to extract the static relationship between visual objects. To capture the dynamic changes of multimodal objects in different video frames, we consider the temporal, spatial, and semantic relations, and fuse the multimodal features by hierarchical attention mechanism to predict the answer. We conduct extensive experiments on a large scale VideoQA dataset, and the experimental results demonstrate that our RHA outperforms the state-of-the-art methods.

CVOct 19, 2020
Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition

Fangtao Li, Wenzhe Wang, Zihe Liu et al.

Video-based person recognition is challenging due to persons being blocked and blurred, and the variation of shooting angle. Previous research always focused on person recognition on still images, ignoring similarity and continuity between video frames. To tackle the challenges above, we propose a novel Frame Aggregation and Multi-Modal Fusion (FAMF) framework for video-based person recognition, which aggregates face features and incorporates them with multi-modal information to identify persons in videos. For frame aggregation, we propose a novel trainable layer based on NetVLAD (named AttentionVLAD), which takes arbitrary number of features as input and computes a fixed-length aggregation feature based on feature quality. We show that introducing an attention mechanism to NetVLAD can effectively decrease the impact of low-quality frames. For the multi-model information of videos, we propose a Multi-Layer Multi-Modal Attention (MLMA) module to learn the correlation of multi-modality by adaptively updating Gram matrix. Experimental results on iQIYI-VID-2019 dataset show that our framework outperforms other state-of-the-art methods.