Xilin Li

SD
h-index: 2
5 papers
46 citations
Novelty: 43%
AI Score: 43

5 Papers

ML · Nov 8, 2022
Black Box Lie Group Preconditioners for SGD

Xilin Li

A matrix-free preconditioner and a low-rank approximation preconditioner are proposed to accelerate the convergence of stochastic gradient descent (SGD) by exploiting curvature information sampled from Hessian-vector products or from finite differences of parameters and gradients, similar to the BFGS algorithm. Both preconditioners are fitted online by minimizing a criterion that requires no line search and is robust to stochastic gradient noise, and both are further constrained to lie on certain connected Lie groups to preserve their corresponding symmetry or invariance, e.g., the orientation of coordinates in the case of the connected general linear group with positive determinant. The Lie group's equivariance property facilitates preconditioner fitting, and its invariance property removes the need for damping, which is common in second-order optimizers but difficult to tune. The learning rate for parameter updating and the step size for preconditioner fitting are naturally normalized, and their default values work well in most situations.
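The abstract mentions sampling curvature through finite differences of parameters and gradients. The sketch below illustrates only that sampling step on a toy quadratic loss. It is not the paper's preconditioner fitting procedure, and the names `grad`, `hvp_finite_difference`, and `eps` are hypothetical choices made for this illustration.

```python
import numpy as np

def grad(theta, A, b):
    """Gradient of the toy quadratic loss 0.5 * theta^T A theta - b^T theta."""
    return A @ theta - b

def hvp_finite_difference(grad_fn, theta, v, eps=1e-4):
    """Approximate the Hessian-vector product H v by a forward difference
    of gradients: (g(theta + eps * v) - g(theta)) / eps."""
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

# Toy problem (assumed for illustration): a random symmetric positive
# definite Hessian A, so the exact Hessian-vector product is A @ v.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)
b = rng.standard_normal(4)
theta = rng.standard_normal(4)
v = rng.standard_normal(4)

hv_approx = hvp_finite_difference(lambda t: grad(t, A, b), theta, v)
hv_exact = A @ v
max_abs_error = float(np.max(np.abs(hv_approx - hv_exact)))
print("max abs error:", max_abs_error)
```

For a quadratic loss the gradient is linear in the parameters, so the forward difference matches the exact product up to floating-point rounding. On a general loss the approximation error also depends on `eps`.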

SY · Mar 15
DRCC-LPVMPC: Robust Data-Driven Control for Autonomous Driving and Obstacle Avoidance

Shiming Fang, Xilin Li, Changzhi Wu et al.

Safety in obstacle avoidance is critical for autonomous driving. While model predictive control (MPC) is widely used, simplified prediction models such as linearized or single-track vehicle models introduce discrepancies between predicted and actual behavior that can compromise safety. This paper proposes a distributionally robust chance-constrained linear parameter-varying MPC (DRCC-LPVMPC) framework that explicitly accounts for such discrepancies. The single-track vehicle dynamics are represented in a quasi-linear parameter-varying (quasi-LPV) form, with model mismatches treated as additive uncertainties of unknown distribution. By constructing chance constraints from finite sampled data and employing a Wasserstein ambiguity set, the proposed method avoids restrictive assumptions on boundedness or Gaussian distributions. The resulting DRCC problem is reformulated as tractable convex constraints and solved in real time using a quadratic programming solver. Recursive feasibility of the approach is formally established. Simulation and real-world experiments demonstrate that DRCC-LPVMPC maintains safer obstacle clearance and more reliable tracking than conventional nonlinear MPC and LPVMPC controllers under significant uncertainties.

SD · Oct 4, 2025
Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation

Kuang Yuan, Yang Gao, Xilin Li et al.

Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC demonstrates improved few-shot adaptation to unseen categories while maintaining strong closed-set performance.

SD · Sep 23, 2025
ArtiFree: Detecting and Reducing Generative Artifacts in Diffusion-based Speech Enhancement

Bhawana Chhaglani, Yang Gao, Julius Richter et al.

Diffusion-based speech enhancement (SE) achieves natural-sounding speech and strong generalization, yet suffers from key limitations like generative artifacts and high inference latency. In this work, we systematically study artifact prediction and reduction in diffusion-based SE. We show that variance in speech embeddings can be used to predict phonetic errors during inference. Building on these findings, we propose an ensemble inference method guided by semantic consistency across multiple diffusion runs. This technique reduces WER by 15% in low-SNR conditions, effectively improving phonetic accuracy and semantic plausibility. Finally, we analyze the effect of the number of diffusion steps, showing that adaptive diffusion steps balance artifact suppression and latency. Our findings highlight semantic priors as a powerful tool to guide generative SE toward artifact-free outputs.
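As a rough illustration of the idea that disagreement between diffusion runs can flag unreliable regions, the sketch below computes per-frame embedding variance across several runs and picks the run closest to the ensemble mean. Everything here is an assumption made for illustration: the embeddings are synthetic, and the function names and the selection rule are hypothetical rather than the paper's actual method.

```python
import numpy as np

def embedding_variance(embeddings):
    """Per-frame variance across runs, averaged over embedding dimensions.

    embeddings: array of shape (runs, frames, dim).
    Returns an array of shape (frames,).
    """
    e = np.asarray(embeddings, dtype=float)
    return e.var(axis=0).mean(axis=-1)

def most_consistent_run(embeddings):
    """Index of the run whose embeddings lie closest to the mean over runs
    (a hypothetical selection rule, not taken from the paper)."""
    e = np.asarray(embeddings, dtype=float)
    mean = e.mean(axis=0)
    # Euclidean distance to the mean per frame, then averaged over frames.
    dists = np.linalg.norm(e - mean, axis=-1).mean(axis=-1)
    return int(np.argmin(dists))

# Synthetic stand-in for speech embeddings from 4 diffusion runs:
# 6 frames, 8 dimensions, with run 2 deviating strongly at frame 3.
rng = np.random.default_rng(0)
base = rng.standard_normal((6, 8))
runs = base[None, :, :] + 0.01 * rng.standard_normal((4, 6, 8))
runs[2, 3] += 5.0

frame_variance = embedding_variance(runs)
suspect_frame = int(np.argmax(frame_variance))
chosen_run = most_consistent_run(runs)
print("suspect frame:", suspect_frame, "chosen run:", chosen_run)
```

In a real pipeline the embeddings would come from a pretrained speech model applied to each enhanced output; that step is omitted here.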

CV · Sep 10, 2018
Hand-tremor frequency estimation in videos

Silvia L. Pintea, Jian Zheng, Xilin Li et al.

We focus on the problem of estimating human hand-tremor frequency from input RGB video data. Estimating tremors from video is important for non-invasive monitoring, analysis, and diagnosis of patients suffering from motor disorders such as Parkinson's disease. We consider two approaches for hand-tremor frequency estimation: (a) a Lagrangian approach, where we detect the hand at every frame in the video and estimate the tremor frequency along the trajectory; and (b) an Eulerian approach, where we first localize the hand, then remove the large motion along the movement trajectory of the hand, and finally use the video information over time, encoded as intensity values or phase information, to estimate the tremor frequency. We estimate hand tremors on a new human tremor dataset, TIM-Tremor, containing static tasks as well as a multitude of more dynamic tasks involving larger motion of the hands. The dataset has 55 tremor patient recordings, each with associated ground-truth accelerometer data from the most affected hand, RGB video data, and aligned depth data.
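The Lagrangian approach estimates the tremor frequency along the tracked hand trajectory. A minimal sketch of that last step, assuming the per-frame hand position is already available, is to take the strongest non-DC peak of the trajectory's spectrum. The function name `dominant_frequency`, the frame rate, and the synthetic trajectory are assumptions for illustration; the abstract does not specify the paper's estimator.

```python
import numpy as np

def dominant_frequency(signal, fps):
    """Return the frequency (Hz) of the strongest non-DC spectral peak
    of a 1-D signal sampled at fps frames per second."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the constant offset
    spectrum = np.abs(np.fft.rfft(x))      # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    peak = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    return float(freqs[peak])

# Synthetic stand-in for one coordinate of a tracked hand position:
# a 5 Hz oscillation plus small noise, sampled at 30 fps for 10 seconds.
fps = 30.0
t = np.arange(300) / fps
rng = np.random.default_rng(0)
trajectory = 100.0 + 2.0 * np.sin(2 * np.pi * 5.0 * t) + 0.1 * rng.standard_normal(t.size)

estimated_hz = dominant_frequency(trajectory, fps)
print("estimated tremor frequency (Hz):", estimated_hz)
```

The frequency resolution of this estimate is the frame rate divided by the number of frames, so longer recordings give finer resolution. Real trajectories with large voluntary motion would need detrending or band-limiting before the peak search.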