AI, Jul 7, 2024
SBoRA: Low-Rank Adaptation with Regional Weight Updates
Lai-Man Po, Yuyang Liu, Haoxuan Wu et al.
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. Compared with LoRA, SBoRA either halves the number of trainable parameters or doubles the rank at a similar number of trainable parameters, while improving learning performance. By utilizing orthogonal standard basis vectors to initialize one of the low-rank matrices (either $\mathbf{A}$ or $\mathbf{B}$), SBoRA facilitates regional weight updates and memory-efficient fine-tuning. This results in two variants, SBoRA-FA and SBoRA-FB, in which only one of the two matrices is updated, leading to a sparse update matrix $\Delta \mathbf{W}$ with predominantly zero rows or columns. Consequently, most of the fine-tuned model's weights $(\mathbf{W}_0+\Delta \mathbf{W})$ remain unchanged from the pre-trained weights, akin to the modular organization of the human brain, which efficiently adapts to new tasks. Our empirical results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning. Furthermore, we evaluate the effectiveness of QSBoRA on quantized LLaMA models of varying scales, highlighting its potential for efficient adaptation to new tasks. Code is available at https://github.com/cityuhkai/SBoRA
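The regional update described in this abstract can be illustrated with a small NumPy sketch. It is not taken from the SBoRA repository. It assumes the usual LoRA convention $\Delta \mathbf{W} = \mathbf{B}\mathbf{A}$, assumes that the frozen matrix is $\mathbf{A}$ with one-hot (standard basis) rows, and uses hypothetical names (`standard_basis_rows`, `idx`) and random values in place of trained ones.

```python
import numpy as np

def standard_basis_rows(indices, dim):
    """Return a (len(indices), dim) matrix whose i-th row is the one-hot vector e_{indices[i]}."""
    A = np.zeros((len(indices), dim))
    A[np.arange(len(indices)), indices] = 1.0
    return A

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2          # toy sizes, chosen only for illustration

W0 = rng.normal(size=(d_out, d_in))  # stands in for the pre-trained weights

# Assumption: A is frozen and built from `rank` distinct standard basis vectors.
idx = rng.choice(d_in, size=rank, replace=False)
A = standard_basis_rows(idx, d_in)

# B is the only trainable matrix; random values stand in for trained ones.
B = rng.normal(size=(d_out, rank))

delta_W = B @ A                      # low-rank update
W = W0 + delta_W                     # fine-tuned weights

# Only the columns selected by idx differ from the pre-trained weights.
changed_cols = np.flatnonzero(np.any(delta_W != 0, axis=0))

# Trainable parameter counts under these assumptions.
lora_params = rank * (d_in + d_out)  # LoRA trains both A and B
frozen_a_params = rank * d_out       # only B is trained here
```

Under these assumptions `delta_W` is zero outside `rank` columns, so the rest of `W` equals `W0`, and for a square layer the trainable parameter count is half that of LoRA at the same rank. Freezing a standard basis $\mathbf{B}$ instead would give the mirror case with nonzero rows.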
ML, Mar 3, 2022
Interpretable Latent Variables in Deep State Space Models
Haoxuan Wu, David S. Matteson, Martin T. Wells
We introduce a new version of deep state-space models (DSSMs) that combines a recurrent neural network with a state-space framework to forecast time series data. The model estimates the observed series as functions of latent variables that evolve non-linearly through time. Due to the complexity and non-linearity inherent in DSSMs, previous works on DSSMs typically produced latent variables that are very difficult to interpret. Our paper focuses on producing interpretable latent parameters through two key modifications. First, we simplify the predictive decoder by restricting the response variables to be a linear transformation of the latent variables plus some noise. Second, we utilize shrinkage priors on the latent variables to reduce redundancy and improve robustness. These changes make the latent variables much easier to understand and allow us to interpret them as random effects in a linear mixed model. We show on two public benchmark datasets that the resulting model improves forecasting performance.
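The first modification, a linear decoder on top of non-linearly evolving latent variables, can be pictured with a toy forward simulation. This is only a sketch and not the authors' model: the `tanh` transition stands in for the recurrent network, the shrinkage priors are omitted, and all names, sizes, and noise scales are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, latent_dim, obs_dim = 50, 3, 5    # toy sizes, chosen only for illustration

# Hypothetical transition weights; tanh is a stand-in for the RNN transition.
W = 0.5 * rng.normal(size=(latent_dim, latent_dim))
# Linear decoder loadings: each observed series is a weighted sum of latents.
C = rng.normal(size=(obs_dim, latent_dim))

z = np.zeros((T, latent_dim))        # latent variables
y = np.zeros((T, obs_dim))           # observed series
z_prev = rng.normal(size=latent_dim)

for t in range(T):
    # Latent state evolves non-linearly through time, with small state noise.
    z[t] = np.tanh(W @ z_prev) + 0.1 * rng.normal(size=latent_dim)
    # Response is a linear transformation of the latent state plus noise.
    y[t] = C @ z[t] + 0.05 * rng.normal(size=obs_dim)
    z_prev = z[t]

# With a linear decoder, the noise-free part of y is exactly z @ C.T,
# so column j of C describes how latent j loads on every observed series.
signal = z @ C.T
```

Because the decoder is linear, each column of `C` can be read as a loading vector, which is what makes the latent variables interpretable in the way the abstract describes.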
SY, May 19
MagCeptor: Encoding Broadcast-Addressable Logic into Magnetic Receptors
Sishen Yuan, Baijia Liang, Tangyou Liu et al.
Multicellular coordination relies on broadcast-addressable receptors, yet engineered magnetic systems face an addressability bottleneck because global fields intrinsically conflate power and control. Here, we introduce MagCeptors to resolve this by encoding selectivity directly into magnetic topology. Establishing an energetic isomorphism with biological receptors, these arrays utilize local couplings to shape potential landscapes where global field vectors act as spatial keys, triggering deterministic snap-through instabilities. This architecture decouples force from source distance, achieving a density of 385 mN/mm³ (>50-fold increase over prior art). We validate this primitive through signal demultiplexing, embodied sequential logic, and untethered distributed networking. This framework enables distributed systems to orchestrate complex tasks without tethers or electronics, relying solely on the intrinsic logic of matter.
RO, May 8
Anatomical Landmark-Guided Deep Reinforcement Learning for Autonomous Gastric Navigation
Haoxuan Wu, Sishen Yuan, Haitao Gao et al.
Wireless capsule endoscopy (WCE) enables painless visualization of the gastrointestinal tract, but its diagnostic potential is limited by incomplete mucosal coverage and poor transferability of existing navigation methods across patient anatomies. We propose a transferable, anatomical landmark-guided deep reinforcement learning (AL-DRL) framework for autonomous gastric navigation. Leveraging a lightweight edge-contour-depth fusion module, our policy operates on stable, low-dimensional landmark coordinates rather than high-dimensional video streams, effectively bridging the sim-to-real gap. In simulations across eight patient-derived models, the method achieves over 97% coverage within 50 seconds, significantly outperforming vanilla PPO, SAC, and DQN agents. A two-stage sim-to-real pipeline with an adaptive dynamic programming controller actively mitigates physical disturbances. Ex-vivo experiments demonstrate a mean coverage of 87% and a 53% reduction in procedure time compared with expert manual control.
ST, Jun 13, 2021
A News-based Machine Learning Model for Adaptive Asset Pricing
Liao Zhu, Haoxuan Wu, Martin T. Wells
The paper proposes a new asset pricing model, the News Embedding UMAP Selection (NEUS) model, to explain and predict stock returns based on financial news. Using a combination of machine learning algorithms, we first derive a company embedding vector for each basis asset from the financial news. We then obtain a collection of basis assets based on their company embeddings. Finally, for each stock, we select the basis assets that explain and predict its return using high-dimensional statistical methods. The new model is shown to have significantly better fit and predictive power than the Fama-French 5-factor model.