Ruiyi Zhan

CV
h-index4
4papers
10citations
Novelty51%
AI Score51

4 Papers

8.6DCMay 26
Reducing Internal State in Eigenvalue-Only Divide-and-Conquer Tridiagonal Eigensolvers

Ruiyi Zhan, Shaoshuai Zhang

Divide and Conquer (D&C) is a widely used algorithmic strategy for symmetric eigenvalue decomposition. Its natural parallelism makes D&C attractive on modern multicore CPUs and GPUs, but existing eigenvalue-only routines often default to QR-based methods because conventional D&C still materializes or replays large transformation matrices during the conquer phase. This paper proposes a boundary-row D&C algorithm for eigenvalue-only computation. The key observation is that the conquer phase only needs selected boundary rows/columns rather than the full accumulated eigenvector matrix. By propagating these boundary rows directly through the recursion, the proposed algorithm reduces the memory requirement from quadratic to linear space while also eliminating unnecessary matrix-vector work in the conventional lazy-replay formulation. We provide the algorithm, its time and space complexity analysis, correctness and stability arguments, optimized CPU and GPU implementations, and an evaluation against QR and D&C routines in standard numerical libraries.

CVMar 6Code
Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models

Canyu Chen, Yuguang Yang, Zhewen Tan et al.

We identify a fundamental Narrow Policy limitation undermining the performance of autonomous VLA models, where driving Imitation Learning (IL) tends to collapse exploration and limit the potential of subsequent Reinforcement Learning (RL) stages, which often saturate prematurely due to insufficient feedback diversity. Thereby, we propose Curious-VLA, a framework that alleviates the exploit-explore dilemma through a two-stage design. During IL, we introduce a Feasible Trajectory Expansion (FTE) strategy to generate multiple physically valid trajectories and a step-wise normalized trajectory representation to adapt this diverse data. In the RL stage, we present Adaptive Diversity-Aware Sampling (ADAS) that prioritizes high-diversity samples and introduce Spanning Driving Reward (SDR) with a focal style weighting to amplify reward's value span for improving sensitivity to driving quality. On the Navsim benchmark, Curious-VLA achieves SoTA results (PDMS 90.3, EPDMS 85.4) and a Best-of-N PDMS of 94.8, demonstrating its effectiveness in unlocking the exploratory potential of VLA models. Code: https://github.com/Mashiroln/curious_vla.git.

41.3CVMar 25
SilLang: Improving Gait Recognition with Silhouette Language Encoding

Ruiyi Zhan, Guozhen Peng, Canyu Chen et al.

Gait silhouettes, which can be encoded into binary gait codes, are widely adopted to representing motion patterns of pedestrian. Recent approaches commonly leverage visual backbones to encode gait silhouettes, achieving successful performance. However, they primarily focus on continuous visual features, overlooking the discrete nature of binary silhouettes that inherently share a discrete encoding space with natural language. Large Language Models (LLMs) have demonstrated exceptional capability in extracting discriminative features from discrete sequences and modeling long-range dependencies, highlighting their potential to capture temporal motion patterns by identifying subtle variations. Motivated by these observations, we explore bridging binary gait silhouettes and natural language within a binary encoding space. However, the encoding spaces of text tokens and binary gait silhouettes remain misaligned, primarily due to differences in token frequency and density. To address this issue, we propose the Contour-Velocity Tokenizer, which encodes binary gait silhouettes while reshaping their distribution to better align with the text token space. We then establish a dual-branch framework termed Silhouette Language Model, which enhances visual silhouettes by integrating discrete linguistic embeddings derived from LLMs. Implemented on mainstream gait backbones, SilLang consistently improves state-of-the-art methods across SUSTech1K, GREW, and Gait3D.

CVDec 21, 2024
TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models

Haocheng Huang, Jiaxin Chen, Jinyang Guo et al.

Diffusion models have achieved remarkable success in the image and video generation tasks. Nevertheless, they often require a large amount of memory and time overhead during inference, due to the complex network architecture and considerable number of timesteps for iterative diffusion. Recently, the post-training quantization (PTQ) technique has proved a promising way to reduce the inference cost by quantizing the float-point operations to low-bit ones. However, most of them fail to tackle with the large variations in the distribution of activations across distinct channels and timesteps, as well as the inconsistent of input between quantization and inference on diffusion models, thus leaving much room for improvement. To address the above issues, we propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). Specifically, we develop a timestep-channel joint reparameterization (TCR) module to balance the activation range along both the timesteps and channels, facilitating the successive reconstruction procedure. Subsequently, we employ a dynamically adaptive quantization (DAQ) module that mitigate the quantization error by selecting an optimal quantizer for each post-Softmax layers according to their specific types of distributions. Moreover, we present a progressively aligned reconstruction (PAR) strategy to mitigate the bias caused by the input mismatch. Extensive experiments on various benchmarks and distinct diffusion models demonstrate that the proposed method substantially outperforms the state-of-the-art approaches in most cases, especially yielding comparable FID metrics to the full precision model on CIFAR-10 in the W6A6 setting, while enabling generating available images in the W4A4 settings.