Fang Deng

LG
h-index18
16papers
233citations
Novelty59%
AI Score57

16 Papers

32.9ROJun 3
MAD: Mapping-Aware World Models for Agile Quadrotor Flight

Xinhong Zhang, Runqing Wang, Yunfan Ren et al.

Agile quadrotor flight in cluttered scenes requires more than a reactive mapping from a depth image to a control command: the vehicle must remember which regions have been observed, infer nearby occupied space, and act under partial visibility and tight latency. In this paper, we present Mapping-Aware Dreamer (MAD), a geometry-aware world model for vision-based quadrotor flight. Instead of using raw-image reconstruction as the main self-supervised objective, MAD learns recurrent latent dynamics that reconstruct robocentric occupancy and visibility grid maps together with proprioceptive states. This design forces the latent state to encode local geometry, visibility history, and ego-motion in a form that is directly relevant to collision avoidance. MAD is trained in DiffAero using a GPU-parallel map-construction module that provides high-throughput supervision for occupancy and visibility. The learned representation is used in three policy-learning modes: imagination-based MAD-Dreamer and feature-extractor variants based on PPO and SHAC. Across visual navigation and racing tasks, MAD-based agents achieve higher success rates, faster flight, and better cross-task transfer than corresponding vision-only baselines. The model also produces interpretable map predictions and accurate ego-motion estimates from depth observations. We further deploy the learned policy on a physical quadrotor with an Intel RealSense D435i and demonstrate safe indoor and outdoor flight under limited sensing, reaching 9.66 m/s in simulation and 5.05 m/s in real-world forest experiments. These results show that mapping-aware world models provide a practical middle ground between modular aerial navigation and end-to-end learning.

15.6LGMay 26
Adversarial Training for Robust Coverage Network under Worst-case Facility Losses

Changhao Miao, Yuntian Zhang, Tongyu Wu et al.

The Maximal Covering Location-Interdiction Problem (MCLIP) is a classic bi-level optimization problem, which is fundamental to resilient infrastructure planning yet remains computationally intractable. Specifically, the upper level determines facility locations to maximize coverage, while the lower level executes worst-case interdiction to minimize the coverage. The strong coupling between the upper and lower levels, combined with their respective high combinatorial complexity, renders traditional methods ineffective. To bridge this gap, we propose a Dual-Agent Deep Reinforcement Learning (DADRL) framework based on adversarial learning, comprising a location agent corresponding to the upper level and an interdiction agent corresponding to the lower level. Our contributions are threefold: (1) The location agent is trained simultaneously against an evolving interdiction agent, making it effectively capture the dynamic competitive interplay between the upper and lower levels; (2) To fully exploit the learned capabilities of the interdiction agent, we propose a Surrogate-based Ensemble Inference Strategy that utilizes the trained interdiction agent as a high-fidelity surrogate to guide the decisions of location agent; (3) Extensive experiments on synthetic and real-world datasets demonstrate that our approach achieves superior computational efficiency while maintaining highly competitive solution quality compared to other baselines. Furthermore, our DADRL framework is model-agnostic to network structures, while its underlying adversarial learning paradigm demonstrates strong potential for solving other bi-level optimization problems.

7.0LGMay 26
Towards Generalization-Oriented Models for Vehicle Routing Problems with Mixture-of-Experts

Changhao Miao, Yuntian Zhang, Tongyu Wu et al.

In recent years, Deep Reinforcement Learning (DRL) has achieved substantial progress on Vehicle Routing Problems (VRPs). However, existing DRL-based methods are typically trained on instances generated from a uniform distribution, which limits their performance under real-world distribution shifts. In this paper, we aim to develop a generalization-oriented model that partitions the policy network into multiple modules and adaptively recombines modules to form specific policies during inference. Specifically, we propose Residual Refined Experts with Instance-level Gating (R2E-IG) to improve cross-distribution generalization. Our contributions are threefold: (1) We introduce a Residual Refined Expert (R2E) architecture that enhance expert expressiveness via residual refinement; (2) We design an instance-level gating mechanism that learns distribution-aware instance representations and routes inputs to suitable modules; (3) We propose a mixed-distribution training mechanism equipped with Dynamic Weight Adaption (DWA), which dynamically reweights training data from different distributions to emphasize more informative ones. Extensive experiments show that R2E-IG achieves competitive performance against state-of-the-art baselines on both in-distribution and out-of-distribution instances across synthetic and benchmark datasets. Moreover, R2E-IG is generic and can be easily integrated into existing DRL-based methods to further improve performance.

12.0LGApr 21
Catching Every Ripple: Enhanced Anomaly Awareness via Dynamic Concept Adaptation

Jiaqi Zhu, Shaofeng Cai, Jie Chen et al.

Online anomaly detection (OAD) plays a pivotal role in real-time analytics and decision-making for evolving data streams. However, existing methods often rely on costly retraining and rigid decision boundaries, limiting their ability to adapt both effectively and efficiently to concept drift in dynamic environments. To address these challenges, we propose DyMETER, a dynamic concept adaptation framework for OAD that unifies on-the-fly parameter shifting and dynamic thresholding within a single online paradigm. DyMETER first learns a static detector on historical data to capture recurring central concepts, and then transitions to a dynamic mode to adapt to new concepts as drift occurs. Specifically, DyMETER employs a novel dynamic concept adaptation mechanism that leverages a hypernetwork to generate instance-aware parameter shifts for the static detector, thereby enabling efficient and effective adaptation without retraining or fine-tuning. To achieve robust and interpretable adaptation, DyMETER introduces a lightweight evolution controller to estimate instance-level concept uncertainty for adaptive updates. Further, DyMETER employs a dynamic threshold optimization module to adaptively recalibrates the decision boundary by maintaining a candidate window of uncertain samples, which ensures continuous alignment with evolving concepts. Extensive experiments demonstrate that DyMETER significantly outperforms existing OAD approaches across a wide spectrum of application scenarios.

11.5CVApr 2
VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing

Junyi Zong, Qingxuan Jia, Meixian Shi et al.

Quality inspection in smart manufacturing requires identifying intrinsic material and surface properties beyond visible geometry, yet vision-only methods remain vulnerable to occlusion and reflection. We propose VitaTouch, a property-aware vision-tactile-language model for material-property inference and natural-language attribute description. VitaTouch uses modality-specific encoders and a dual Q-Former to extract language-relevant visual and tactile features, which are compressed into prefix tokens for a large language model. We align each modality with text and explicitly couple vision and touch through contrastive learning. We also construct VitaSet, a multimodal dataset with 186 objects, 52k images, and 5.1k human-verified instruction-answer pairs. VitaTouch achieves the best performance on HCT and the overall TVL benchmark, while remaining competitive on SSVTP. On VitaSet, it reaches 88.89% hardness accuracy, 75.13% roughness accuracy, and 54.81% descriptor recall; the material-description task further achieves a peak semantic similarity of 0.9009. With LoRA-based fine-tuning, VitaTouch attains 100.0%, 96.0%, and 92.0% accuracy for 2-, 3-, and 5-category defect recognition, respectively, and delivers 94.0% closed-loop recognition accuracy and 94.0% end-to-end sorting success in 100 laboratory robotic trials. More details are available at the project page: https://vitatouch.github.io/

CVApr 15, 2024
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection

Jiaqi Zhu, Shaofeng Cai, Fang Deng et al.

Large vision-language models (LVLMs) are markedly proficient in deriving visual representations guided by natural language. Recent explorations have utilized LVLMs to tackle zero-shot visual anomaly detection (VAD) challenges by pairing images with textual descriptions indicative of normal and abnormal conditions, referred to as anomaly prompts. However, existing approaches depend on static anomaly prompts that are prone to cross-semantic ambiguity, and prioritize global image-level representations over crucial local pixel-level image-to-text alignment that is necessary for accurate anomaly localization. In this paper, we present ALFA, a training-free approach designed to address these challenges via a unified model. We propose a run-time prompt adaptation strategy, which first generates informative anomaly prompts to leverage the capabilities of a large language model (LLM). This strategy is enhanced by a contextual scoring mechanism for per-image anomaly prompt adaptation and cross-semantic ambiguity mitigation. We further introduce a novel fine-grained aligner to fuse local pixel-level semantics for precise anomaly localization, by projecting the image-text alignment from global to local semantic spaces. Extensive evaluations on MVTec and VisA datasets confirm ALFA's effectiveness in harnessing the language potential for zero-shot VAD, achieving significant PRO improvements of 12.1% on MVTec and 8.9% on VisA compared to state-of-the-art approaches.

LGNov 4, 2025
An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

Changhao Miao, Yuntian Zhang, Tongyu Wu et al.

The capacitated location-routing problems (CLRPs) are classical problems in combinatorial optimization, which require simultaneously making location and routing decisions. In CLRPs, the complex constraints and the intricate relationships between various decisions make the problem challenging to solve. With the emergence of deep reinforcement learning (DRL), it has been extensively applied to address the vehicle routing problem and its variants, while the research related to CLRPs still needs to be explored. In this paper, we propose the DRL with heterogeneous query (DRLHQ) to solve CLRP and open CLRP (OCLRP), respectively. We are the first to propose an end-to-end learning approach for CLRPs, following the encoder-decoder structure. In particular, we reformulate the CLRPs as a markov decision process tailored to various decisions, a general modeling framework that can be adapted to other DRL-based methods. To better handle the interdependency across location and routing decisions, we also introduce a novel heterogeneous querying attention mechanism designed to adapt dynamically to various decision-making stages. Experimental results on both synthetic and benchmark datasets demonstrate superior solution quality and better generalization performance of our proposed approach over representative traditional and DRL-based baselines in solving both CLRP and OCLRP.

LGDec 28, 2023
METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

Jiaqi Zhu, Shaofeng Cai, Fang Deng et al.

Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams. In this paper, we introduce METER, a novel dynamic concept adaptation framework that introduces a new paradigm for OAD. METER addresses concept drift by first training a base detection model on historical data to capture recurring central concepts, and then learning to dynamically adapt to new concepts in data streams upon detecting concept drift. Particularly, METER employs a novel dynamic concept adaptation technique that leverages a hypernetwork to dynamically generate the parameter shift of the base detection model, providing a more effective and efficient solution than conventional retraining or fine-tuning approaches. Further, METER incorporates a lightweight drift detection controller, underpinned by evidential deep learning, to support robust and interpretable concept drift detection. We conduct an extensive experimental evaluation, and the results show that METER significantly outperforms existing OAD approaches in various application scenarios.

51.2OCMar 9
Outlier-robust Autocovariance Least Square Estimation via Iteratively Reweighted Least Square

Jiahong Li, Fang Deng

The autocovariance least squares (ALS) method is a computationally efficient approach for estimating noise covariances in Kalman filters without requiring specific noise models. However, conventional ALS and its variants rely on the classic least mean squares (LMS) criterion, making them highly sensitive to measurement outliers and prone to severe performance degradation. To overcome this limitation, this paper proposes a novel outlier-robust ALS algorithm, termed ALS-IRLS, based on the iteratively reweighted least squares (IRLS) framework. Specifically, the proposed approach introduces a two-tier robustification strategy. First, an innovation-level adaptive thresholding mechanism is employed to filter out heavily contaminated data. Second, the outlier-contaminated autocovariance is formulated using an $ε$-contamination model, where the standard LMS criterion is replaced by the Huber cost function. The IRLS method is then utilized to iteratively adjust data weights based on estimation deviations, effectively mitigating the influence of residual outliers. Comparative simulations demonstrate that ALS-IRLS reduces the root-mean-square error (RMSE) of noise covariance estimates by over two orders of magnitude compared to standard ALS. Furthermore, it significantly enhances downstream state estimation accuracy, outperforming existing outlier-robust Kalman filters and achieving performance nearly equivalent to the ideal Oracle lower bound in the presence of noisy and anomalous data.

CVAug 6, 2025
Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis

Angang Zhang, Fang Deng, Hao Chen et al.

While recent advances in virtual try-on (VTON) have achieved realistic garment transfer to human subjects, its inverse task, virtual try-off (VTOFF), which aims to reconstruct canonical garment templates from dressed humans, remains critically underexplored and lacks systematic investigation. Existing works predominantly treat them as isolated tasks: VTON focuses on garment dressing while VTOFF addresses garment extraction, thereby neglecting their complementary symmetry. To bridge this fundamental gap, we propose the Two-Way Garment Transfer Model (TWGTM), to the best of our knowledge, the first unified framework for joint clothing-centric image synthesis that simultaneously resolves both mask-guided VTON and mask-free VTOFF through bidirectional feature disentanglement. Specifically, our framework employs dual-conditioned guidance from both latent and pixel spaces of reference images to seamlessly bridge the dual tasks. On the other hand, to resolve the inherent mask dependency asymmetry between mask-guided VTON and mask-free VTOFF, we devise a phased training paradigm that progressively bridges this modality gap. Extensive qualitative and quantitative experiments conducted across the DressCode and VITON-HD datasets validate the efficacy and competitive edge of our proposed approach.

DBMay 7, 2025
In-Context Adaptation to Concept Drift for Learned Database Operations

Jiaqi Zhu, Shaofeng Cai, Yanyan Shen et al.

Machine learning has demonstrated transformative potential for database operations, such as query optimization and in-database data analytics. However, dynamic database environments, characterized by frequent updates and evolving data distributions, introduce concept drift, which leads to performance degradation for learned models and limits their practical applicability. Addressing this challenge requires efficient frameworks capable of adapting to shifting concepts while minimizing the overhead of retraining or fine-tuning. In this paper, we propose FLAIR, an online adaptation framework that introduces a new paradigm called \textit{in-context adaptation} for learned database operations. FLAIR leverages the inherent property of data systems, i.e., immediate availability of execution results for predictions, to enable dynamic context construction. By formalizing adaptation as $f:(\mathbf{x} \,| \,C_t) \to \mathbf{y}$, with $C_t$ representing a dynamic context memory, FLAIR delivers predictions aligned with the current concept, eliminating the need for runtime parameter optimization. To achieve this, FLAIR integrates two key modules: a Task Featurization Module for encoding task-specific features into standardized representations, and a Dynamic Decision Engine, pre-trained via Bayesian meta-training, to adapt seamlessly using contextual information at runtime. Extensive experiments across key database tasks demonstrate that FLAIR outperforms state-of-the-art baselines, achieving up to 5.2x faster adaptation and reducing error by 22.5% for cardinality estimation.

SYMar 8, 2025
Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation

Kedi Xie, Martin Guay, Shimin Wang et al.

This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to the state feedback, the optimality of the dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of observer error, it is difficult to analyze the convergence and stability of most existing output feedback learning-based control methods. To tackle these issues, we propose a generalized dynamic output feedback learning control approach with guaranteed convergence, stability, and optimality performance for solving the LQR problem of unknown discrete-time linear systems. In particular, a dynamic output feedback controller is designed to be equivalent to a state feedback controller. This equivalence relationship is an inherent property without requiring convergence of the estimated state by the state observer, which plays a key role in establishing the off-policy learning control approaches. By value iteration and policy iteration schemes, the adaptive dynamic programming based learning control approaches are developed to estimate the optimal feedback control gain. In addition, a model-free stability criterion is provided by finding a nonsingular parameterization matrix, which contributes to establishing a switched iteration scheme. Furthermore, the convergence, stability, and optimality analyses of the proposed output feedback learning control approaches are given. Finally, the theoretical results are validated by two numerical examples.

RODec 27, 2024
Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

Xuan Zhou, Xiang Shi, Lele Zhang et al.

To improve the efficiency of warehousing system and meet huge customer orders, we aim to solve the challenges of dimension disaster and dynamic properties in hyper scale multi-robot task planning (MRTP) for robotic mobile fulfillment system (RMFS). Existing research indicates that hierarchical reinforcement learning (HRL) is an effective method to reduce these challenges. Based on that, we construct an efficient multi-stage HRL-based multi-robot task planner for hyper scale MRTP in RMFS, and the planning process is represented with a special temporal graph topology. To ensure optimality, the planner is designed with a centralized architecture, but it also brings the challenges of scaling up and generalization that require policies to maintain performance for various unlearned scales and maps. To tackle these difficulties, we first construct a hierarchical temporal attention network (HTAN) to ensure basic ability of handling inputs with unfixed lengths, and then design multi-stage curricula for hierarchical policy learning to further improve the scaling up and generalization ability while avoiding catastrophic forgetting. Additionally, we notice that policies with hierarchical structure suffer from unfair credit assignment that is similar to that in multi-agent reinforcement learning, inspired of which, we propose a hierarchical reinforcement learning algorithm with counterfactual rollout baseline to improve learning performance. Experimental results demonstrate that our planner outperform other state-of-the-art methods on various MRTP instances in both simulated and real-world RMFS. Also, our planner can successfully scale up to hyper scale MRTP instances in RMFS with up to 200 robots and 1000 retrieval racks on unlearned maps while keeping superior performance over other methods.

LGMay 9, 2023
Flexible Job Shop Scheduling via Dual Attention Network Based Reinforcement Learning

Runqing Wang, Gang Wang, Jian Sun et al.

Flexible manufacturing has given rise to complex scheduling problems such as the flexible job shop scheduling problem (FJSP). In FJSP, operations can be processed on multiple machines, leading to intricate relationships between operations and machines. Recent works have employed deep reinforcement learning (DRL) to learn priority dispatching rules (PDRs) for solving FJSP. However, the quality of solutions still has room for improvement relative to that by the exact methods such as OR-Tools. To address this issue, this paper presents a novel end-to-end learning framework that weds the merits of self-attention models for deep feature extraction and DRL for scalable decision-making. The complex relationships between operations and machines are represented precisely and concisely, for which a dual-attention network (DAN) comprising several interconnected operation message attention blocks and machine message attention blocks is proposed. The DAN exploits the complicated relationships to construct production-adaptive operation and machine features to support high-quality decisionmaking. Experimental results using synthetic data as well as public benchmarks corroborate that the proposed approach outperforms both traditional PDRs and the state-of-the-art DRL method. Moreover, it achieves results comparable to exact methods in certain cases and demonstrates favorable generalization ability to large-scale and real-world unseen FJSP tasks.

CLSep 27, 2019
End-to-End Code-Switching ASR for Low-Resourced Language Pairs

Xianghu Yue, Grandee Lee, Emre Yılmaz et al.

Despite the significant progress in end-to-end (E2E) automatic speech recognition (ASR), E2E ASR for low resourced code-switching (CS) speech has not been well studied. In this work, we describe an E2E ASR pipeline for the recognition of CS speech in which a low-resourced language is mixed with a high resourced language. Low-resourcedness in acoustic data hinders the performance of E2E ASR systems more severely than the conventional ASR systems.~To mitigate this problem in the transcription of archives with code-switching Frisian-Dutch speech, we integrate a designated decoding scheme and perform rescoring with neural network-based language models to enable better utilization of the available textual resources. We first incorporate a multi-graph decoding approach which creates parallel search spaces for each monolingual and mixed recognition tasks to maximize the utilization of the textual resources from each language. Further, language model rescoring is performed using a recurrent neural network pre-trained with cross-lingual embedding and further adapted with the limited amount of in-domain CS text. The ASR experiments demonstrate the effectiveness of the described techniques in improving the recognition performance of an E2E CS ASR system in a low-resourced scenario.

NAMar 22, 2015
Dimensions of Biquadratic and Bicubic Spline Spaces over Hierarchical T-meshes

Chao Zeng, Fang Deng, Xin Li et al.

This paper discusses the dimensions of biquadratic C1 spline spaces and bicubic C2 spline spaces over hierarchical T-meshes using the smoothing cofactor-conformality method. We obtain the dimension formula of biquadratic C1 spline spaces over hierarchical T-meshes in a concise way. In addition, we provide a dimension formula for bicubic C2 spline spaces over hierarchical T-mesh with fewer restrictions than that in the previous literature. A dimension formula for bicubic C2 spline spaces over a new type hierarchical T-mesh is also provided.