h-index30
35papers
920citations
Novelty48%
AI Score59

35 Papers

CVJul 15, 2024Code
Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

Ziheng Chen, Yue Song, Xiao-Jun Wu et al.

Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. GCP typically performs classification of the covariance matrices by applying matrix function normalization, such as matrix logarithm or power, followed by a Euclidean classifier. However, covariance matrices inherently lie in a Riemannian manifold, known as the Symmetric Positive Definite (SPD) manifold. The current literature does not provide a satisfactory explanation of why Euclidean classifiers can be applied directly to Riemannian features after the normalization of the matrix power. To mitigate this gap, this paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective. The underlying mechanism of matrix functions in GCP is interpreted from two perspectives: one based on tangent classifiers (Euclidean classifiers on the tangent space) and the other based on Riemannian classifiers. Via theoretical analysis and empirical validation through extensive experiments on fine-grained and large-scale visual classification datasets, we conclude that the working mechanism of the matrix functions should be attributed to the Riemannian classifiers they implicitly respect. The code is available at https://github.com/GitZH-Chen/RiemGCP.git.

DGJul 2, 2024Code
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry

Ziheng Chen, Yue Song, Xiao-Jun Wu et al.

Recent advances in Symmetric Positive Definite (SPD) matrix learning show that Riemannian metrics are fundamental to effective SPD neural networks. Motivated by this, we revisit the geometry of the Cholesky factors and uncover a simple product structure that enables convenient metric design. Building on this insight, we propose two fast and stable SPD metrics, Power--Cholesky Metric (PCM) and Bures--Wasserstein--Cholesky Metric (BWCM), derived via Cholesky decomposition. Compared with existing SPD metrics, the proposed metrics provide closed-form operators, computational efficiency, and improved numerical stability. We further apply our metrics to construct Riemannian Multinomial Logistic Regression (MLR) classifiers and residual blocks for SPD neural networks. Experiments on SPD deep learning, numerical stability analyses, and tensor interpolation demonstrate the effectiveness, efficiency, and robustness of our metrics. The code is available at https://github.com/GitZH-Chen/PCM_BWCM.

CLMay 26
Disentangling Language Roles in Multilingual LLM Task Execution

Qishi Zhan, Minxuan Hu, Seoyeon Jang et al.

Multilingual LLMs are increasingly used when instruction, source content, and required response languages do not coincide. Existing benchmarks have expanded multilingual instruction-following evaluation, but they rarely isolate these three roles within a fully crossed design. We introduce MTM-Bench, a controlled benchmark for language-conditioned task execution in which each instance is defined by a triplet \((L_{\text{instr}}, L_{\text{content}}, L_{\text{resp}})\). Across English, Spanish, and Chinese, MTM-Bench enumerates all 27 triplets and contains 2{,}430 instances per model across semantic reversal, final-state extraction, and language purity with update realization. We evaluate 20 frontier and open-weight LLMs using decomposed metrics for semantic correctness, target-language adherence, constraint satisfaction, contamination ratio, and joint success, with scoring validated by a targeted human audit. The fully crossed design reveals that degradation is organized by the role a language occupies in the task structure, not merely by mismatch count. The response-language role is the dominant axis of variation, and a single response-slot mismatch accounts for most degradation. The response-only and full-mismatch comparison suggests that mismatch count is not a monotonic predictor of difficulty, with model-level ordering varying across systems. Task families fail through distinct channels, showing that semantic correctness alone does not capture reliable multilingual task execution.

IRAug 4, 2022
GREASE: Generate Factual and Counterfactual Explanations for GNN-based Recommendations

Ziheng Chen, Fabrizio Silvestri, Jia Wang et al.

Recently, graph neural networks (GNNs) have been widely used to develop successful recommender systems. Although powerful, it is very difficult for a GNN-based recommender system to attach tangible explanations of why a specific item ends up in the list of suggestions for a given user. Indeed, explaining GNN-based recommendations is unique, and existing GNN explanation methods are inappropriate for two reasons. First, traditional GNN explanation methods are designed for node, edge, or graph classification tasks rather than ranking, as in recommender systems. Second, standard machine learning explanations are usually intended to support skilled decision-makers. Instead, recommendations are designed for any end-user, and thus their explanations should be provided in user-understandable ways. In this work, we propose GREASE, a novel method for explaining the suggestions provided by any black-box GNN-based recommender system. Specifically, GREASE first trains a surrogate model on a target user-item pair and its $l$-hop neighborhood. Then, it generates both factual and counterfactual explanations by finding optimal adjacency matrix perturbations to capture the sufficient and necessary conditions for an item to be recommended, respectively. Experimental results conducted on real-world datasets demonstrate that GREASE can generate concise and effective explanations for popular GNN-based recommender models.

CVJun 16, 2022
DreamNet: A Deep Riemannian Network based on SPD Manifold Learning for Visual Classification

Rui Wang, Xiao-Jun Wu, Ziheng Chen et al.

Image set-based visual classification methods have achieved remarkable performance, via characterising the image set in terms of a non-singular covariance matrix on a symmetric positive definite (SPD) manifold. To adapt to complicated visual scenarios better, several Riemannian networks (RiemNets) for SPD matrix nonlinear processing have recently been studied. However, it is pertinent to ask, whether greater accuracy gains can be achieved by simply increasing the depth of RiemNets. The answer appears to be negative, as deeper RiemNets tend to lose generalization ability. To explore a possible solution to this issue, we propose a new architecture for SPD matrix learning. Specifically, to enrich the deep representations, we adopt SPDNet [1] as the backbone, with a stacked Riemannian autoencoder (SRAE) built on the tail. The associated reconstruction error term can make the embedding functions of both SRAE and of each RAE an approximate identity mapping, which helps to prevent the degradation of statistical information. We then insert several residual-like blocks with shortcut connections to augment the representational capacity of SRAE, and to simplify the training of a deeper network. The experimental evidence demonstrates that our DreamNet can achieve improved accuracy with increased depth of the network.

CLMar 16, 2022
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension

Xiao Li, Gong Cheng, Ziheng Chen et al.

Recent machine reading comprehension datasets such as ReClor and LogiQA require performing logical reasoning over text. Conventional neural models are insufficient for logical reasoning, while symbolic reasoners cannot directly apply to text. To meet the challenge, we present a neural-symbolic approach which, to predict an answer, passes messages over a graph representing logical relations between text units. It incorporates an adaptive logic graph network (AdaLoGN) which adaptively infers logical relations to extend the graph and, essentially, realizes mutual and iterative reinforcement between neural and symbolic reasoning. We also implement a novel subgraph-to-node message passing mechanism to enhance context-option interaction for answering multiple-choice questions. Our approach shows promising results on ReClor and LogiQA.

CVMar 13
MIBench: Evaluating LMMs on Multimodal Interaction

Yu Miao, Zequn Yang, Yake Wei et al. · pku

In different multimodal scenarios, it needs to integrate and utilize information across modalities in a specific way based on the demands of the task. Different integration ways between modalities are referred to as "multimodal interaction". How well a model handles various multimodal interactions largely characterizes its multimodal ability. In this paper, we introduce MIBench, a comprehensive benchmark designed to evaluate the multimodal interaction capabilities of Large Multimodal Models (LMMs), which formulates each instance as a (con_v , con_t, task) triplet with contexts from vision and text, necessitating that LMMs employ correct forms of multimodal interaction to effectively complete the task. MIBench assesses models from three key aspects: the ability to source information from vision-centric or text-centric cues, and the ability to generate new information from their joint synergy. Each interaction capability is evaluated hierarchically across three cognitive levels: Recognition, Understanding, and Reasoning. MIBench comprises over 10,000 vision-text context pairs spanning 32 distinct tasks. Evaluation of state-of-the-art LMMs show that: (1) LMMs' ability on multimodal interaction remains constrained, despite the scaling of model parameters and training data; (2) they are easily distracted by textual modalities when processing vision information; (3) they mostly possess a basic capacity for multimodal synergy; and (4) natively trained multimodal models show noticeable deficits in fundamental interaction ability. We expect that these observations can serve as a reference for developing LMMs with more enhanced multimodal ability in the future.

LGMar 26, 2023
Adaptive Log-Euclidean Metrics for SPD Matrix Learning

Ziheng Chen, Yue Song, Tianyang Xu et al.

Symmetric Positive Definite (SPD) matrices have received wide attention in machine learning due to their intrinsic capacity to encode underlying structural correlation in data. Many successful Riemannian metrics have been proposed to reflect the non-Euclidean geometry of SPD manifolds. However, most existing metric tensors are fixed, which might lead to sub-optimal performance for SPD matrix learning, especially for deep SPD neural networks. To remedy this limitation, we leverage the commonly encountered pullback techniques and propose Adaptive Log-Euclidean Metrics (ALEMs), which extend the widely used Log-Euclidean Metric (LEM). Compared with the previous Riemannian metrics, our metrics contain learnable parameters, which can better adapt to the complex dynamics of Riemannian neural networks with minor extra computations. We also present a complete theoretical analysis to support our ALEMs, including algebraic and Riemannian properties. The experimental and theoretical results demonstrate the merit of the proposed metrics in improving the performance of SPD neural networks. The efficacy of our metrics is further showcased on a set of recently developed Riemannian building blocks, including Riemannian batch normalization, Riemannian Residual blocks, and Riemannian classifiers.

LGMay 25
Relative Repairability: A Calibration-Based Diagnostic for High-Sparsity Post-Pruning Allocation

Qishi Zhan, Liang He, Minxuan Hu et al.

At very high sparsity, neural network pruning does more than decide which weights remain. It also determines where pruning induced damage is placed across the network, and whether that damage can be recovered by a fixed lightweight repair procedure. We study this problem through the lens of repair conditioned sparsity allocation. We introduce Relative Repairability (RR), a calibration based diagnostic that compares the raw activation distortion caused by layerwise pruning with the residual distortion left after channelwise variance matching repair. RR estimates the fraction of local damage that remains after repair, using only unlabeled calibration data. Across ResNet18, ResNet34, and VGG16 BN on CIFAR10 and CIFAR100, we find that RR is not a universally dominant allocation rule. Instead, it is most useful near an architecture dependent recoverability transition, where standard structural or magnitude based allocation priors begin to lose reliability but post repair recovery has not yet fully collapsed. On CIFAR100 ResNet18, a fine grained sweep shows that RR improves over ERK across the central transition band and surpasses LAMP near the upper part of this band. A projection forced ablation further shows that capped ERK can over protect projection layers, shifting excessive sparsity onto regular convolutions and reducing post repair recovery. These results suggest that high sparsity pruning should allocate not only retained weights, but also repairable damage.

LGSep 28, 2024
RMLR: Extending Multinomial Logistic Regression into General Geometries

Ziheng Chen, Yue Song, Rui Wang et al.

Riemannian neural networks, which extend deep learning techniques to Riemannian spaces, have gained significant attention in machine learning. To better classify the manifold-valued features, researchers have started extending Euclidean multinomial logistic regression (MLR) into Riemannian manifolds. However, existing approaches suffer from limited applicability due to their strong reliance on specific geometric properties. This paper proposes a framework for designing Riemannian MLR over general geometries, referred to as RMLR. Our framework only requires minimal geometric properties, thus exhibiting broad applicability and enabling its use with a wide range of geometries. Specifically, we showcase our framework on the Symmetric Positive Definite (SPD) manifold and special orthogonal group, i.e., the set of rotation matrices. On the SPD manifold, we develop five families of SPD MLRs under five types of power-deformed metrics. On rotation matrices we propose Lie MLR based on the popular bi-invariant metric. Extensive experiments on different Riemannian backbone networks validate the effectiveness of our framework.

PRMay 11, 2022Code
RLOP: RL Methods in Option Pricing from a Mathematical Perspective

Ziheng Chen

Abstract In this work, we build two environments, namely the modified QLBS and RLOP models, from a mathematics perspective which enables RL methods in option pricing through replicating by portfolio. We implement the environment specifications (the source code can be found at https://github.com/owen8877/RLOP), the learning algorithm, and agent parametrization by a neural network. The learned optimal hedging strategy is compared against the BS prediction. The effect of various factors is considered and studied based on how they affect the optimal price and position.

LGMay 20
Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks

Qishi Zhan, Ziheng Chen, Minxuan Hu

One-shot magnitude pruning can cause severe accuracy collapse in the high-sparsity regime, even when the pruning mask preserves the largest weights. We argue that this failure reflects a granularity mismatch in post-pruning repair. Under global magnitude pruning, nearly collapsed channels can coexist with channels that retain informative activation variance within the same layer. Existing layer-wise activation repair methods apply a single correction to the whole layer, and can therefore over-amplify damaged channels while trying to restore the layer-level signal. We propose Adaptive Signal Resuscitation (ASR), a training-free channel-wise repair method that matches the granularity of repair to the granularity of damage. ASR estimates a variance-matching correction for each output channel and stabilizes it with a data-driven shrinkage rule, suppressing unreliable corrections for channels with weak post-pruning signal while preserving corrections for healthier channels. Applied before BatchNorm recalibration, ASR requires only forward passes on a small calibration set and no retraining. Across three datasets, four convolutional architectures, and both unstructured and structured sparsity settings, ASR generally improves over layer-wise repair, with the clearest gains in high-sparsity regimes. On ResNet-50 at 90% sparsity, ASR recovers 55.6% top-1 accuracy on CIFAR-10, compared with 41.0% for layer-wise repair and 28.0% for BatchNorm-only recalibration. Ablations show that naive channel-wise variance matching is insufficient, and that shrinkage stabilizes post-pruning repair.

CVDec 10, 2025
Wasserstein-Aligned Hyperbolic Multi-View Clustering

Rui Wang, Yuting Jiang, Xiaoqing Luo et al.

Multi-view clustering (MVC) aims to uncover the latent structure of multi-view data by learning view-common and view-specific information. Although recent studies have explored hyperbolic representations for better tackling the representation gap between different views, they focus primarily on instance-level alignment and neglect global semantic consistency, rendering them vulnerable to view-specific information (\textit{e.g.}, noise and cross-view discrepancies). To this end, this paper proposes a novel Wasserstein-Aligned Hyperbolic (WAH) framework for multi-view clustering. Specifically, our method exploits a view-specific hyperbolic encoder for each view to embed features into the Lorentz manifold for hierarchical semantic modeling. Whereafter, a global semantic loss based on the hyperbolic sliced-Wasserstein distance is introduced to align manifold distributions across views. This is followed by soft cluster assignments to encourage cross-view semantic consistency. Extensive experiments on multiple benchmarking datasets show that our method can achieve SOTA clustering performance.

LGMay 18
Riemannian Networks over Full-Rank Correlation Matrices

Ziheng Chen, Xiaojun Wu, Bernhard Schölkopf et al.

Representations on the Symmetric Positive Definite (SPD) manifold have garnered significant attention across different applications. In contrast, the manifold of full-rank correlation matrices, a normalized alternative to SPD matrices, remains largely underexplored. This paper introduces Riemannian networks over the correlation manifold, leveraging five recently developed correlation geometries. We systematically extend basic layers, including Multinomial Logistic Regression (MLR), Fully Connected (FC), and convolutional layers, to these geometries. Besides, we present methods for accurate backpropagation for two correlation geometries. Experiments comparing our approach against existing SPD and Grassmannian networks demonstrate its effectiveness.

IRApr 6
CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation

Zezhong Fan, Ziheng Chen, Luyi Ma et al.

Generative recommendation (GeneRec) has introduced a new paradigm that represents items as discrete semantic tokens and predicts items in a generative manner. Despite its strong performance across multiple recommendation tasks, existing GeneRec approaches still suffer from severe popularity bias and may even exacerbate it. In this work, we conduct a comprehensive empirical analysis to uncover the root causes of this phenomenon, yielding two core insights: 1) imbalanced tokenization inherits and can further amplify popularity bias from historical item interactions; 2) current training procedures disproportionately favor popular tokens while neglecting semantic relationships among tokens, thereby intensifying popularity bias. Building on these insights, we propose CRAB, a post-hoc debiasing strategy for GeneRec that alleviates popularity bias by mitigating frequency imbalance among semantic tokens. Specifically, given a well-trained model, we first rebalance the codebook by splitting over-popular tokens while preserving their hierarchical semantic structure. Based on the adjusted codebook, we further introduce a tree-structured regularizer to enhance semantic consistency, encouraging more informative representations for unpopular tokens during training. Experiments on real-world datasets demonstrate that CRAB significantly improves recommendation performance by effectively alleviating popularity bias.

LGMar 17, 2024Code
A Lie Group Approach to Riemannian Batch Normalization

Ziheng Chen, Yue Song, Yunmei Liu et al.

Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad hoc manner and only apply to specific manifolds. This paper establishes a unified framework for Riemannian Batch Normalization (RBN) techniques on Lie groups. Our framework offers the theoretical guarantee of controlling both the Riemannian mean and variance. Empirically, we focus on Symmetric Positive Definite (SPD) manifolds, which possess three distinct types of Lie group structures. Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups. Specific normalization layers induced by these Lie groups are then proposed for SPD neural networks. We demonstrate the effectiveness of our approach through three sets of experiments: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at https://github.com/GitZH-Chen/LieBN.git.

LGJan 14
Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric

Jiali Cheng, Ziheng Chen, Chirag Agarwal et al.

Machine unlearning is becoming essential for building trustworthy and compliant language models. Yet unlearning success varies considerably across individual samples: some are reliably erased, while others persist despite the same procedure. We argue that this disparity is not only a data-side phenomenon, but also reflects model-internal mechanisms that encode and protect memorized information. We study this problem from a mechanistic perspective based on model circuits--structured interaction pathways that govern how predictions are formed. We propose Circuit-guided Unlearning Difficulty (CUD), a {\em pre-unlearning} metric that assigns each sample a continuous difficulty score using circuit-level signals. Extensive experiments demonstrate that CUD reliably separates intrinsically easy and hard samples, and remains stable across unlearning methods. We identify key circuit-level patterns that reveal a mechanistic signature of difficulty: easy-to-unlearn samples are associated with shorter, shallower interactions concentrated in earlier-to-intermediate parts of the original model, whereas hard samples rely on longer and deeper pathways closer to late-stage computation. Compared to existing qualitative studies, CUD takes a first step toward a principled, fine-grained, and interpretable analysis of unlearning difficulty; and motivates the development of unlearning methods grounded in model mechanisms.

CLApr 15
YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference

You Wu, Ziheng Chen, Yizhen Zhang et al.

Cross-layer key-value (KV) compression has been found to be effective in efficient inference of large language models (LLMs). Although they reduce the memory consumption of the KV cache, such methods usually introduce non-negligible performance degradation. In this work, we aim to enhance the performance of YOCO, a cross-layer KV compression method that shares the KVs of the middle layer with the top-half layers. We propose YOCO++, an enhanced YOCO that incorporates a weighted residual connection between the KVs of each bottom-half layer and the bottom layer. Compared to YOCO, YOCO++ increases model capacity while maintaining the same training and inference efficiency. Our experiments show that YOCO++ achieves state-of-the-art performance among the cross-layer KV compression methods at a 50% KV cache compression rate, outperforming the standard Transformer.

IRApr 4
CURE:Circuit-Aware Unlearning for LLM-based Recommendation

Ziheng Chen, Jiali Cheng, Zezhong Fan et al.

Recent advances in large language models (LLMs) have opened new opportunities for recommender systems by enabling rich semantic understanding and reasoning about user interests and item attributes. However, as privacy regulations tighten, incorporating user data into LLM-based recommendation (LLMRec) introduces significant privacy risks, making unlearning algorithms increasingly crucial for practical deployment. Despite growing interest in LLMRec unlearning, most existing approaches formulate unlearning as a weighted combination of forgetting and retaining objectives while updating model parameters in a uniform manner. Such formulations inevitably induce gradient conflicts between the two objectives, leading to unstable optimization and resulting in either ineffective unlearning or severe degradation of model utility. Moreover, the unlearning procedure remains largely black-box, undermining its transparency and trustworthiness. To tackle these challenges, we propose CURE, a circuit-aware unlearning framework that disentangles model components into functionally distinct subsets and selectively updates them. Here, a circuit refers to a computational subgraph that is causally responsible for task-specific behaviors. Specifically, we extract the core circuits underlying item recommendation and analyze how individual modules within these circuits contribute to the forget and retain objectives. Based on this analysis, these modules are categorized into forget-specific, retain-specific, and task-shared groups, each subject to function-specific update rules to mitigate gradient conflicts during unlearning. Experiments on real-world datasets show that our approach achieves more effective unlearning than existing baselines.

LGFeb 21Code
Hyperbolic Busemann Neural Networks

Ziheng Chen, Bernhard Schölkopf, Nicu Sebe

Hyperbolic spaces provide a natural geometry for representing hierarchical and tree-structured data due to their exponential volume growth. To leverage these benefits, neural networks require intrinsic and efficient components that operate directly in hyperbolic space. In this work, we lift two core components of neural networks, Multinomial Logistic Regression (MLR) and Fully Connected (FC) layers, into hyperbolic space via Busemann functions, resulting in Busemann MLR (BMLR) and Busemann FC (BFC) layers with a unified mathematical interpretation. BMLR provides compact parameters, a point-to-horosphere distance interpretation, batch-efficient computation, and a Euclidean limit, while BFC generalizes FC and activation layers with comparable complexity. Experiments on image classification, genome sequence learning, node classification, and link prediction demonstrate improvements in effectiveness and efficiency over prior hyperbolic layers. The code is available at https://github.com/GitZH-Chen/HBNN.

LGSep 8, 2025Code
Riemannian Batch Normalization: A Gyro Approach

Ziheng Chen, Xiao-Jun Wu, Bernhard Schölkopf et al.

Normalization layers are crucial for deep learning, but their Euclidean formulations are inadequate for data on manifolds. On the other hand, many Riemannian manifolds in machine learning admit gyro-structures, enabling principled extensions of Euclidean neural networks to non-Euclidean domains. Inspired by this, we introduce GyroBN, a principled Riemannian batch normalization framework for gyrogroups. We establish two necessary conditions, namely \emph{pseudo-reduction} and \emph{gyroisometric gyrations}, that guarantee GyroBN with theoretical control over sample statistics, and show that these conditions hold for all known gyrogroups in machine learning. Our framework also incorporates several existing Riemannian normalization methods as special cases. We further instantiate GyroBN on seven representative geometries, including the Grassmannian, five constant curvature spaces, and the correlation manifold, and derive novel gyro and Riemannian structures to enable these instantiations. Experiments across these geometries demonstrate the effectiveness of GyroBN. The code is available at https://github.com/GitZH-Chen/GyroBN.git.

LGMay 18, 2023Code
Riemannian Multinomial Logistics Regression for SPD Neural Networks

Ziheng Chen, Yue Song, Gaowen Liu et al.

Deep neural networks for learning Symmetric Positive Definite (SPD) matrices are gaining increasing attention in machine learning. Despite the significant progress, most existing SPD networks use traditional Euclidean classifiers on an approximated space rather than intrinsic classifiers that accurately capture the geometry of SPD manifolds. Inspired by Hyperbolic Neural Networks (HNNs), we propose Riemannian Multinomial Logistics Regression (RMLR) for the classification layers in SPD networks. We introduce a unified framework for building Riemannian classifiers under the metrics pulled back from the Euclidean space, and showcase our framework under the parameterized Log-Euclidean Metric (LEM) and Log-Cholesky Metric (LCM). Besides, our framework offers a novel intrinsic explanation for the most popular LogEig classifier in existing SPD networks. The effectiveness of our method is demonstrated in three applications: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at https://github.com/GitZH-Chen/SPDMLR.git.

CVJan 25, 2022Code
Riemannian Local Mechanism for SPD Neural Networks

Ziheng Chen, Tianyang Xu, Xiao-Jun Wu et al.

The Symmetric Positive Definite (SPD) matrices have received wide attention for data representation in many scientific areas. Although there are many different attempts to develop effective deep architectures for data processing on the Riemannian manifold of SPD matrices, very few solutions explicitly mine the local geometrical information in deep SPD feature representations. Given the great success of local mechanisms in Euclidean methods, we argue that it is of utmost importance to ensure the preservation of local geometric information in the SPD networks. We first analyse the convolution operator commonly used for capturing local information in Euclidean deep networks from the perspective of a higher level of abstraction afforded by category theory. Based on this analysis, we define the local information in the SPD manifold and design a multi-scale submanifold block for mining local geometry. Experiments involving multiple visual tasks validate the effectiveness of our approach. The supplement and source code can be found in https://github.com/GitZH-Chen/MSNet.git.

NAMar 28
Weak convergence order of stochastic theta method for SDEs driven by time-changed Lévy noise

Ziheng Chen, Jiao Liu, Meng Cai

This paper studies the weak convergence order of the stochastic theta method for stochastic differential equations (SDEs) driven by time-changed Lévy noise under global Lipschitz and linear growth conditions. In contrast to classical Lévy-driven SDEs, the presence of a random time change makes the weak error analysis involve both the discretization error of the underlying equation and the approximation error of the random clock. Moreover, compared with explicit Euler--Maruyama method, the implicit drift correction in the stochastic theta method makes the associated weak error analysis substantially more delicate. To address these difficulties, we first establish a global weak convergence estimate of order one for the stochastic theta method applied to the corresponding non-time-changed Lévy SDEs on the infinite time interval by means of the Kolmogorov backward partial integro differential equations. Incorporating the approximation of the inverse subordinator together with the duality principle, we derive the weak convergence order of the stochastic theta method with $θ\in [0,1]$ in the time-changed Lévy setting. The result advances the currently available weak convergence analysis beyond the Euler--Maruyama method to the more general class of stochastic theta method, and establishes a workable route from the weak analysis of the underlying non-time-changed Lévy equation to the corresponding time-changed problem. Finally, numerical experiments are presented to further support the theoretical findings.

LGFeb 12
SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion

Chengting Yu, Xiaobo Shu, Yadao Wang et al.

Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement and latent reasoning. However, early looped Transformers often underperform non-recursive baselines of equal compute. While recent literature has introduced more effective recursion mechanisms to mitigate this gap, existing architectures still operate at a fixed, full-token resolution, neglecting the potential efficiency of computing over compressed latent representations. In this paper, we propose SpiralFormer, a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We provide probing evidence that multi-resolution recursion enables the model to learn hierarchical dependencies by inducing iteration-wise functional specialization across different scales. Empirically, SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B, establishing sequence resolution as a potential axis for scaling recursive architectures.

DCMar 14, 2025
Characterizing GPU Resilience and Impact on AI/HPC Systems

Shengkun Cui, Archit Patke, Hung Nguyen et al.

This study characterizes GPU resilience in Delta HPC, a large-scale AI system that consists of 1,056 A100 and H100 GPUs, with over 1,300 petaflops of peak throughput. Delta HPC is operated by the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign. We used 2.5 years of operational data (11.7 million GPU hours) on GPU errors. Our major findings include: (i) H100 GPU memory resilience is worse than A100 GPU memory, with 3.2x lower per-GPU MTBE for memory errors, (ii) The GPU memory error-recovery mechanisms on H100 GPUs are insufficient to handle the increased memory capacity, (iii) H100 GPUs demonstrate significantly improved GPU hardware resilience over A100 GPUs with respect to critical hardware components, (iv) GPU errors on both A100 and H100 GPUs frequently result in job failures due to the lack of robust recovery mechanisms at the application level, and (v) We project the impact of GPU node availability on larger-scales and find that significant overprovisioning of 5% is necessary to handle GPU failures.

LGApr 24, 2024
Debiasing Machine Unlearning with Counterfactual Examples

Ziheng Chen, Jia Wang, Jun Zhuang et al.

The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. Typically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. Besides, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.

LGApr 1, 2025
Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry

Rui Wang, Shaocheng Jin, Ziheng Chen et al.

Covariance matrices have proven highly effective across many scientific fields. Since these matrices lie within the Symmetric Positive Definite (SPD) manifold - a Riemannian space with intrinsic non-Euclidean geometry, the primary challenge in representation learning is to respect this underlying geometric structure. Drawing inspiration from the success of Euclidean deep learning, researchers have developed neural networks on the SPD manifolds for more faithful covariance embedding learning. A notable advancement in this area is the implementation of Riemannian batch normalization (RBN), which has been shown to improve the performance of SPD network models. Nonetheless, the Riemannian metric beneath the existing RBN might fail to effectively deal with the ill-conditioned SPD matrices (ICSM), undermining the effectiveness of RBN. In contrast, the Bures-Wasserstein metric (BWM) demonstrates superior performance for ill-conditioning. In addition, the recently introduced Generalized BWM (GBWM) parameterizes the vanilla BWM via an SPD matrix, allowing for a more nuanced representation of vibrant geometries of the SPD manifold. Therefore, we propose a novel RBN algorithm based on the GBW geometry, incorporating a learnable metric parameter. Moreover, the deformation of GBWM by matrix power is also introduced to further enhance the representational capacity of GBWM-based RBN. Experimental results on different datasets validate the effectiveness of our proposed method.

LGMar 23, 2025
FROG: Fair Removal on Graphs

Ziheng Chen, Jiali Cheng, Hadi Amiri et al.

With growing emphasis on privacy regulations, machine unlearning has become increasingly critical in real-world applications such as social networks and recommender systems, many of which are naturally represented as graphs. However, existing graph unlearning methods often modify nodes or edges indiscriminately, overlooking their impact on fairness. For instance, forgetting links between users of different genders may inadvertently exacerbate group disparities. To address this issue, we propose a novel framework that jointly optimizes both the graph structure and the model to achieve fair unlearning. Our method rewires the graph by removing redundant edges that hinder forgetting while preserving fairness through targeted edge augmentation. We further introduce a worst-case evaluation mechanism to assess robustness under challenging scenarios. Experiments on real-world datasets show that our approach achieves more effective and fair unlearning than existing baselines.

LGAug 28, 2025
FUTURE: Flexible Unlearning for Tree Ensemble

Ziheng Chen, Jin Huang, Jiali Cheng et al.

Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains, including bioinformatics, finance, and medical diagnosis. With increasing emphasis on data privacy and the \textit{right to be forgotten}, several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information. However, existing methods are often tailored to a particular model or rely on the discrete tree structure, making them difficult to generalize to complex ensembles and inefficient for large-scale datasets. To address these limitations, we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically, we formulate the problem of forgetting samples as a gradient-based optimization task. In order to accommodate non-differentiability of tree ensembles, we adopt the probabilistic model approximations within the optimization framework. This enables end-to-end unlearning in an effective and efficient manner. Extensive experiments on real-world datasets show that FUTURE yields significant and successful unlearning performance.

PRJan 5
Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance

Ziheng Chen, Minxuan Hu, Jiayu Yi et al.

We extend the Q-learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, and propose a novel Replication Learning of Option Pricing (RLOP) approach. Both methods are fully compatible with standard reinforcement learning algorithms and operate under market frictions. Using SPY and XOP option data, we evaluate performance along static and dynamic dimensions. Adaptive-QLBS achieves higher static pricing accuracy in implied volatility space, while RLOP delivers superior dynamic hedging performance by reducing shortfall probability. These results highlight the importance of evaluating option pricing models beyond static fit, emphasizing realized hedging outcomes.

LGOct 9, 2025
MeSH: Memory-as-State-Highways for Recursive Transformers

Chengting Yu, Xiaobo Shu, Yadao Wang et al.

Recursive transformers reuse parameters and iterate over hidden states multiple times, decoupling compute depth from parameter depth. However, under matched compute, recursive models with fewer parameters often lag behind non-recursive counterparts. By probing hidden states, we trace this performance gap to two primary bottlenecks: undifferentiated computation, where the core is forced to adopt a similar computational pattern at every iteration, and information overload, where long-lived and transient information must coexist in a single hidden state. To address the issues, we introduce a Memory-as-State-Highways (MeSH) scheme, which externalizes state management into an explicit memory buffer and employs lightweight routers to dynamically diversify computation across iterations. Probing visualizations confirm that MeSH successfully resolves the pathologies by inducing functional specialization across iterations. On the Pythia suite (160M-1.4B), MeSH-enhanced recursive transformers consistently improve over recursive baselines and outperforms its larger non-recursive counterpart at the 1.4B scale, improving average downstream accuracy by +1.06% with 33% fewer non-embedding parameters. Our analysis establishes MeSH as a scalable and principled architecture for building stronger recursive models.

LGOct 22, 2021
ReLAX: Reinforcement Learning Agent eXplainer for Arbitrary Predictive Models

Ziheng Chen, Fabrizio Silvestri, Jia Wang et al.

Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood, thus they are hard to generalize for complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task and then find the optimal CFs via deep reinforcement learning (DRL) with discrete-continuous hybrid action space. Extensive experiments conducted on several tabular datasets have shown that ReLAX outperforms existing CF generation baselines, as it produces sparser counterfactuals, is more scalable to complex target models to explain, and generalizes to both classification and regression tasks. Finally, to demonstrate the usefulness of our method in a real-world use case, we leverage CFs generated by ReLAX to suggest actions that a country should take to reduce the risk of mortality due to COVID-19. Interestingly enough, the actions recommended by our method correspond to the strategies that many countries have actually implemented to counter the COVID-19 pandemic.

LGFeb 14, 2021
Costly Features Classification using Monte Carlo Tree Search

Ziheng Chen, Jin Huang, Hongshik Ahn et al.

We consider the problem of costly feature classification, where we sequentially select the subset of features to make a balance between the classification error and the feature cost. In this paper, we first cast the task into a MDP problem and use Advantage Actor Critic algorithm to solve it. In order to further improve the agent's performance and make the policy explainable, we employ the Monte Carlo Tree Search to update the policy iteratively. During the procedure, we also consider its performance on the unbalanced dataset and its sensitivity to the missing value. We evaluate our model on multiple datasets and find it outperforms other methods.

MLNov 11, 2019
Item Response Theory based Ensemble in Machine Learning

Ziheng Chen, Hongshik Ahn

In this article, we propose a novel probabilistic framework to improve the accuracy of a weighted majority voting algorithm. In order to assign higher weights to the classifiers which can correctly classify hard-to-classify instances, we introduce the Item Response Theory (IRT) framework to evaluate the samples' difficulty and classifiers' ability simultaneously. Three models are created with different assumptions suitable for different cases. When making an inference, we keep a balance between the accuracy and complexity. In our experiment, all the base models are constructed by single trees via bootstrap. To explain the models, we illustrate how the IRT ensemble model constructs the classifying boundary. We also compare their performance with other widely used methods and show that our model performs well on 19 datasets.