h-index40
11papers
733citations
Novelty55%
AI Score47

11 Papers

CVMar 18, 2024Code
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

Yang Yang, Wen Wang, Liang Peng et al.

Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we identify this straightforward method faces two major challenges: 1) concept confusion, where the model struggles to preserve distinct individual characteristics, and 2) concept vanishing, where the model fails to generate the intended subjects. To address these issues, we introduce LoRA-Composer, a training-free framework designed for seamlessly integrating multiple LoRAs, thereby enhancing the harmony among different concepts within generated images. LoRA-Composer addresses concept vanishing through concept injection constraints, enhancing concept visibility via an expanded cross-attention mechanism. To combat concept confusion, concept isolation constraints are introduced, refining the self-attention computation. Furthermore, latent re-initialization is proposed to effectively stimulate concept-specific latent within designated regions. Our extensive testing showcases a notable enhancement in LoRA-Composer's performance compared to standard baselines, especially when eliminating the image-based conditions like canny edge or pose estimations. Code is released at \url{https://github.com/Young98CN/LoRA_Composer}

LGFeb 11
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization

Linxuan Xia, Xiaolong Yang, Yongyuan Chen et al.

Aligning large language models (LLMs) on domain-specific data remains a fundamental challenge. Supervised fine-tuning (SFT) offers a straightforward way to inject domain knowledge but often degrades the model's generality. In contrast, on-policy reinforcement learning (RL) preserves generality but fails to effectively assimilate hard samples that exceed the model's current reasoning level. Recent off-policy RL attempts improve hard sample utilization, yet they suffer from severe training instability due to the forced distribution shift toward off-policy knowledge. To reconcile effective off-policy knowledge absorption with the stability of on-policy RL, we propose Rephrasing Policy Optimization (RePO). In RePO, the policy model is prompted to first comprehend off-policy knowledge and then rephrase it into trajectories that conform to its own stylistic and parametric distribution. RePO dynamically replaces low-reward rollouts with these rephrased, high-quality trajectories. This strategy guides the model toward correct reasoning paths while strictly preserving on-policy training dynamics. Experiments on several benchmarks demonstrate that RePO improves hard-sample utilization and outperforms existing baselines, achieving state-of-the-art performance.

CLMay 12, 2025Code
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

Junjie Ye, Caishuang Huang, Zhuohan Chen et al.

Instruction following evaluates large language models (LLMs) on their ability to generate outputs that adhere to user-defined constraints. However, existing benchmarks often rely on templated constraint prompts, which lack the diversity of real-world usage and limit fine-grained performance assessment. To fill this gap, we propose a multi-dimensional constraint framework encompassing three constraint patterns, four constraint categories, and four difficulty levels. Building on this framework, we develop an automated instruction generation pipeline that performs constraint expansion, conflict detection, and instruction rewriting, yielding 1,200 code-verifiable instruction-following test samples. We evaluate 19 LLMs across seven model families and uncover substantial variation in performance across constraint forms. For instance, average performance drops from 77.67% at Level I to 32.96% at Level IV. Furthermore, we demonstrate the utility of our approach by using it to generate data for reinforcement learning, achieving substantial gains in instruction following without degrading general performance. In-depth analysis indicates that these gains stem primarily from modifications in the model's attention modules parameters, which enhance constraint recognition and adherence. Code and data are available in https://github.com/Junjie-Ye/MulDimIF.

CLOct 14, 2025
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts

Yushu Zhao, Yubin Qin, Yang Wang et al.

Mixture-of-Experts (MoE) models have recently demonstrated exceptional performance across a diverse range of applications. The principle of sparse activation in MoE models facilitates an offloading strategy, wherein active experts are maintained in GPU HBM, while inactive experts are stored in CPU DRAM. The efficacy of this approach, however, is fundamentally constrained by the limited bandwidth of the CPU-GPU interconnect. To mitigate this bottleneck, existing approaches have employed prefetching to accelerate MoE inference. These methods attempt to predict and prefetch the required experts using specially trained modules. Nevertheless, such techniques are often encumbered by significant training overhead and have shown diminished effectiveness on recent MoE models with fine-grained expert segmentation. In this paper, we propose MoBiLE, a plug-and-play offloading-based MoE inference framework with \textit{mixture of big-little experts}. It reduces the number of experts for unimportant tokens to half for acceleration while maintaining full experts for important tokens to guarantee model quality. Further, a dedicated fallback and prefetching mechanism is designed for switching between little and big experts to improve memory efficiency. We evaluate MoBiLE on four typical modern MoE architectures and challenging generative tasks. Our results show that MoBiLE achieves a speedup of 1.60x to 1.72x compared to the baseline on a consumer GPU system, with negligible degradation in accuracy.

LGSep 16, 2021
Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

Muhammad Jehangir Amjad, Christophe Diot, Dimitris Konomis et al.

Cloud networks are difficult to monitor because they grow rapidly and the budgets for monitoring them are limited. We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget. Our proposed algorithms produce a distribution of probes across network paths, which we then monitor; and are based on A- and E-optimal experimental designs in statistics. Unfortunately, these designs are too computationally costly to use at production scale. We propose their scalable and near-optimal approximations based on the Frank-Wolfe algorithm. We validate our approaches in simulation on real network topologies, and also using a production probing system in a real cloud network. We show major gains in reducing the probing budget compared to both production and academic baselines, while maintaining low estimation errors, even with very low probing budgets.

CVMar 15, 2021
LARNet: Lie Algebra Residual Network for Face Recognition

Xiaolong Yang, Xiaohong Jia, Dihong Gong et al.

Face recognition is an important yet challenging problem in computer vision. A major challenge in practical face recognition applications lies in significant variations between profile and frontal faces. Traditional techniques address this challenge either by synthesizing frontal faces or by pose invariant learning. In this paper, we propose a novel method with Lie algebra theory to explore how face rotation in the 3D space affects the deep feature generation process of convolutional neural networks (CNNs). We prove that face rotation in the image space is equivalent to an additive residual component in the feature space of CNNs, which is determined solely by the rotation. Based on this theoretical finding, we further design a Lie Algebraic Residual Network (LARNet) for tackling pose robust face recognition. Our LARNet consists of a residual subnet for decoding rotation information from input face images, and a gating subnet to learn rotation magnitude for controlling the strength of the residual component contributing to the feature learning process. Comprehensive experimental evaluations on both frontal-profile face datasets and general face recognition datasets convincingly demonstrate that our method consistently outperforms the state-of-the-art ones.

ROApr 1, 2020
Quasi-Direct Drive Actuation for a Lightweight Hip Exoskeleton with High Backdrivability and High Bandwidth

Shuangyue Yu, Tzu-Hao Huang, Xiaolong Yang et al.

High-performance actuators are crucial to enable mechanical versatility of lower-limb wearable robots, which are required to be lightweight, highly backdrivable, and with high bandwidth. State-of-the-art actuators, e.g., series elastic actuators (SEAs), have to compromise bandwidth to improve compliance (i.e., backdrivability). In this paper, we describe the design and human-robot interaction modeling of a portable hip exoskeleton based on our custom quasi-direct drive (QDD) actuation (i.e., a high torque density motor with low ratio gear). We also present a model-based performance benchmark comparison of representative actuators in terms of torque capability, control bandwidth, backdrivability, and force tracking accuracy. This paper aims to corroborate the underlying philosophy of "design for control", namely meticulous robot design can simplify control algorithms while ensuring high performance. Following this idea, we create a lightweight bilateral hip exoskeleton (overall mass is 3.4 kg) to reduce joint loadings during normal activities, including walking and squatting. Experimental results indicate that the exoskeleton is able to produce high nominal torque (17.5 Nm), high backdrivability (0.4 Nm backdrive torque), high bandwidth (62.4 Hz), and high control accuracy (1.09 Nm root mean square tracking error, i.e., 5.4% of the desired peak torque). Its controller is versatile to assist walking at different speeds (0.8-1.4 m/s) and squatting at 2 s cadence. This work demonstrates significant improvement in backdrivability and control bandwidth compared with state-of-the-art exoskeletons powered by the conventional actuation or SEA.

ROMar 10, 2020
A Fractional-Order Normalized Bouc-Wen Model for Piezoelectric Hysteresis Nonlinearity

Shengzheng Kang, Hongtao Wu, Yao Li et al.

This paper presents a new fractional-order normalized Bouc-Wen (BW) (FONBW) model to describe the asymmetric and rate-dependent hysteresis nonlinearity of piezoelectric actuators (PEAs). In view of the fact that the classical BW (CBW) model is only efficient for the symmetric and rate-independent hysteresis description, the FONBW model is devoted to characterizing the asymmetric and rate-dependent behaviors of the hysteresis in PEAs by adopting an Nth-order polynomial input function and two fractional operators, respectively. Different from the traditional modified BW models, the proposed FONBW model also eliminates the redundancy of parameters in the CBW model via the normalization processing. By this way, the developed FONBW model has a relatively simple mathematic expression with fewer parameters to simultaneously characterize the asymmetric and rate-dependent hysteresis behaviors of PEAs. Model parameters are identified by the self-adaptive differential evolution algorithm. To validate the effectiveness of the proposed model, a series of model verification and inverse-multiplicative-structure-based feedforward control experiments are carried out on a PEA system. Results show that the proposed model is superior to the CBW model and traditional modified BW model in modeling accuracy and hysteresis compensation.

ROJul 4, 2019
Spine-Inspired Continuum Soft Exoskeleton for Stoop Lifting Assistance

Xiaolong Yang, Tzu-Hao Huang, Hang Hu et al.

Back injuries are the most prevalent work-related musculoskeletal disorders and represent a major cause of disability. Although innovations in wearable robots aim to alleviate this hazard, the majority of existing exoskeletons are obtrusive because the rigid linkage design limits natural movement, thus causing ergonomic risk. Moreover, these existing systems are typically only suitable for one type of movement assistance, not ubiquitous for a wide variety of activities. To fill in this gap, this paper presents a new wearable robot design approach continuum soft exoskeleton. This spine-inspired wearable robot is unobtrusive and assists both squat and stoops while not impeding walking motion. To tackle the challenge of the unique anatomy of spine that is inappropriate to be simplified as a single degree of freedom joint, our robot is conformal to human anatomy and it can reduce multiple types of forces along the human spine such as the spinae muscle force, shear, and compression force of the lumbar vertebrae. We derived kinematics and kinetics models of this mechanism and established an analytical biomechanics model of human-robot interaction. Quantitative analysis of disc compression force, disc shear force and muscle force was performed in simulation. We further developed a virtual impedance control strategy to deliver force control and compensate hysteresis of Bowden cable transmission. The feasibility of the prototype was experimentally tested on three healthy subjects. The root mean square error of force tracking is 6.63 N (3.3 % of the 200N peak force) and it demonstrated that it can actively control the stiffness to the desired value. This continuum soft exoskeleton represents a feasible solution with the potential to reduce back pain for multiple activities and multiple forces along the human spine.

ROFeb 19, 2019
A Soft High Force Hand Exoskeleton for Rehabilitation and Assistance of Spinal Cord Injury and Stroke Individuals

Shuangyue Yu, Hadia Perez, James Barkas et al.

Individuals with spinal cord injury (SCI) and stroke who is lack of manipulation capability have a particular need for robotic hand exoskeletons. Among assistive and rehabilitative medical exoskeletons, there exists a sharp trade-off between device power on the one hand and ergonomics and portability on other, devices that provide stronger grasping assistance do so at the cost of patient comfort. This paper proposes using fin-ray inspired, cable-driven finger orthoses to generate high fingertip forces without the painful compressive and shear stresses commonly associated with conventional cable-drive exoskeletons. With combination cable-driven transmission and segmented-finger orthoses, the exoskeleton transmitted larger forces and applied torques discretely to the fingers, leading to strong fingertip forces. A prototype of the finger orthoses and associated cable transmission was fabricated, and force transmission tests of the prototype in the finger flexion mode demonstrated a 2:1 input-output ratio between cable tension and fingertip force, with a maximum fingertip force of 22 N. Moreover, the proposed design provides a comfortable experience for wearers thanks to its lightweight and conformal properties to the hands.

CVNov 15, 2017
Robust and Precise Vehicle Localization based on Multi-sensor Fusion in Diverse City Scenes

Guowei Wan, Xiaolong Yang, Renlan Cai et al.

We present a robust and precise localization system that achieves centimeter-level localization accuracy in disparate city scenes. Our system adaptively uses information from complementary sensors such as GNSS, LiDAR, and IMU to achieve high localization accuracy and resilience in challenging scenes, such as urban downtown, highways, and tunnels. Rather than relying only on LiDAR intensity or 3D geometry, we make innovative use of LiDAR intensity and altitude cues to significantly improve localization system accuracy and robustness. Our GNSS RTK module utilizes the help of the multi-sensor fusion framework and achieves a better ambiguity resolution success rate. An error-state Kalman filter is applied to fuse the localization measurements from different sources with novel uncertainty estimation. We validate, in detail, the effectiveness of our approaches, achieving 5-10cm RMS accuracy and outperforming previous state-of-the-art systems. Importantly, our system, while deployed in a large autonomous driving fleet, made our vehicles fully autonomous in crowded city streets despite road construction that occurred from time to time. A dataset including more than 60 km real traffic driving in various urban roads is used to comprehensively test our system.