Xuan Wu

LG
h-index61
29papers
387citations
Novelty54%
AI Score59

29 Papers

CVMay 30
Cohort-Scale Neural Atlases of Ultrasound Video

Zhuorui Zhang, Roger Pallarès-López, Xuan Wu et al.

Ultrasound is the most widely used real-time imaging modality in clinical practice, yet per-frame video annotation remains a major bottleneck: expert labels are scarce and costly, and image appearance varies with speckle, shadowing, attenuation, and operator-dependent probe pose. This is especially limiting because clinically relevant information is often dynamic, from left-ventricular motion in echocardiography to muscle and bone kinematics in musculoskeletal imaging. Population atlases can amortize annotation cost by registering observations to a shared canonical coordinate system, but existing neural atlas methods mainly target single videos, small test-time image sets, or object-centric image collections. We introduce a cohort-scale neural atlas for ultrasound video: a single canonical chart with per-video Generative Latent Optimization embeddings, trained jointly over thousands of frames in DINOv3 feature space. Across five cardiac and musculoskeletal datasets with point landmarks and segmentation masks, our method learns coherent canonical templates and enables accurate atlas-space annotation transfer. On EchoNet-Dynamic and MSK-Bone, it supports single- and few-shot transfer with accuracy competitive with strong dense-correspondence baselines, while training in minutes on a single consumer GPU. The learned embeddings are interpretable: linear projections reveal structured cohort variation, image-decoder interpolation produces anatomically plausible intermediate frames, and test-time latent inversion reconstructs held-out frames through the atlas. These results suggest that cohort-scale neural atlases offer a practical, interpretable representation for reducing expert annotation burden in ultrasound video analysis.

CLFeb 3
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques

David P. Woodruff, Vincent Cohen-Addad, Lalit Jain et al.

Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to contribute to novel, expert-level mathematical discovery is less understood. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models, specifically Google's Gemini-based models (in particular Gemini Deep Think and its advanced variants), to solve open problems, refute conjectures, and generate new proofs across diverse areas in theoretical computer science, as well as other areas such as economics, optimization, and physics. Based on these experiences, we extract common techniques for effective human-AI collaboration in theoretical research, such as iterative refinement, problem decomposition, and cross-disciplinary knowledge transfer. While the majority of our results stem from this interactive, conversational methodology, we also highlight specific instances that push beyond standard chat interfaces. These include deploying the model as a rigorous adversarial reviewer to detect subtle flaws in existing proofs, and embedding it within a "neuro-symbolic" loop that autonomously writes and executes code to verify complex derivations. Together, these examples highlight the potential of AI not just as a tool for automation, but as a versatile, genuine partner in the creative process of scientific discovery.

AIMay 28
NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

Yunjin Qi, Zhaojun Jiang, Xuan Wu et al.

As large language models (LLMs) are increasingly applied in social contexts such as emotional companionship and customer service, measuring their social intelligence has become critical to the quality and safety of human-AI interaction. However, existing social intelligence benchmarks lack a unified framework that organizes social abilities into a unified structure, and therefore cannot enable fine-grained diagnosis. To build the first holistic diagnostic evaluation grounded in social theory, we first construct a social intelligence framework through a literature review and multi-stage expert validation guided by psychometric principles. The resulting framework includes 4 categories and 11 dimensions, each further specified by fine-grained capability facets. Building on this framework, we introduce NICE (Norm, Interaction, Cognition, Experience), a diagnostic benchmark of 137 items operationalized through representative Chinese contexts. Across 5 frontier LLMs and a human reference group, models score higher in aggregate accuracy yet show a consistent weakness in Communication, which the framework localizes to 3 specific capability facets: multi-turn communication, nonverbal communication, and synchrony. NICE thus reframes social intelligence evaluation toward theory-grounded diagnosis of socially consequential weaknesses in LLMs.

CVDec 28, 2022Code
Shape-Aware Fine-Grained Classification of Erythroid Cells

Ye Wang, Rui Ma, Xiaoqing Ma et al.

Fine-grained classification and counting of bone marrow erythroid cells are vital for evaluating the health status and formulating therapeutic schedules for leukemia or hematopathy. Due to the subtle visual differences between different types of erythroid cells, it is challenging to apply existing image-based deep learning models for fine-grained erythroid cell classification. Moreover, there is no large open-source datasets on erythroid cells to support the model training. In this paper, we introduce BMEC (Bone Morrow Erythroid Cells), the first large fine-grained image dataset of erythroid cells, to facilitate more deep learning research on erythroid cells. BMEC contains 5,666 images of individual erythroid cells, each of which is extracted from the bone marrow erythroid cell smears and professionally annotated to one of the four types of erythroid cells. To distinguish the erythroid cells, one key indicator is the cell shape which is closely related to the cell growth and maturation. Therefore, we design a novel shape-aware image classification network for fine-grained erythroid cell classification. The shape feature is extracted from the shape mask image and aggregated to the raw image feature with a shape attention module. With the shape-attended image feature, our network achieved superior classification performance (81.12\% top-1 accuracy) on the BMEC dataset comparing to the baseline methods. Ablation studies also demonstrate the effectiveness of incorporating the shape information for the fine-grained cell classification. To further verify the generalizability of our method, we tested our network on two additional public white blood cells (WBC) datasets and the results show our shape-aware method can generally outperform recent state-of-the-art works on classifying the WBC. The code and BMEC dataset can be found on https://github.com/wangye8899/BMEC.

LGMar 8, 2022
New Coresets for Projective Clustering and Applications

Murad Tukan, Xuan Wu, Samson Zhou et al.

$(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty$ coreset of size polynomial in $d$. Moreover, we give the first strong coreset construction for general $M$-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach.

CVJun 7, 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Haiyang Xu, Qinghao Ye, Xuan Wu et al.

To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Model (LLM) in the Chinese community, we firstly release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chinese video-text pairs filtered from 400 million raw videos across a wide range of 45 diverse categories for large-scale pre-training. In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks of cross-modal retrieval, video captioning, and video category classification. Youku-mPLUG can enable researchers to conduct more in-depth multimodal research and develop better applications in the future. Furthermore, we release popular video-language pre-training models, ALPRO and mPLUG-2, and our proposed modularized decoder-only model mPLUG-video pre-trained on Youku-mPLUG. Experiments show that models pre-trained on Youku-mPLUG gain up to 23.1% improvement in video category classification. Besides, mPLUG-video achieves a new state-of-the-art result on these benchmarks with 80.5% top-1 accuracy in video category classification and 68.9 CIDEr score in video captioning, respectively. Finally, we scale up mPLUG-video based on the frozen Bloomz with only 1.7% trainable parameters as Chinese multimodal LLM, and demonstrate impressive instruction and video understanding ability. The zero-shot instruction understanding experiment indicates that pretraining with Youku-mPLUG can enhance the ability to comprehend overall and detailed visual semantics, recognize scene text, and leverage open-domain knowledge.

AIAug 1, 2023
Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

Yubin Xiao, Di Wang, Boyang Li et al.

The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications. Recently, neural networks have gained popularity in this research area because as shown in the literature, they provide strong heuristic solutions to TSPs. Compared to autoregressive neural approaches, non-autoregressive (NAR) networks exploit the inference parallelism to elevate inference speed but suffer from comparatively low solution quality. In this paper, we propose a novel NAR model named NAR4TSP, which incorporates a specially designed architecture and an enhanced reinforcement learning strategy. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR networks. The key lies in the incorporation of NAR network output decoding into the training process. NAR4TSP efficiently represents TSP encoded information as rewards and seamlessly integrates it into reinforcement learning strategies, while maintaining consistent TSP sequence constraints during both training and testing phases. Experimental results on both synthetic and real-world TSPs demonstrate that NAR4TSP outperforms five state-of-the-art models in terms of solution quality, inference speed, and generalization to unseen scenarios.

CVApr 1Code
Camouflage-aware Image-Text Retrieval via Expert Collaboration

Yao Jiang, Zhongkuan Mao, Xuan Wu et al.

Camouflaged scene understanding (CSU) has attracted significant attention due to its broad practical implications. However, in this field, robust image-text cross-modal alignment remains under-explored, hindering deeper understanding of camouflaged scenarios and their related applications. To this end, we focus on the typical image-text retrieval task, and formulate a new task dubbed ``camouflage-aware image-text retrieval'' (CA-ITR). We first construct a dedicated camouflage image-text retrieval dataset (CamoIT), comprising $\sim$10.5K samples with multi-granularity textual annotations. Benchmark results conducted on CamoIT reveal the underlying challenges of CA-ITR for existing cutting-edge retrieval techniques, which are mainly caused by objects' camouflage properties as well as those complex image contents. As a solution, we propose a camouflage-expert collaborative network (CECNet), which features a dual-branch visual encoder: one branch captures holistic image representations, while the other incorporates a dedicated model to inject representations of camouflaged objects. A novel confidence-conditioned graph attention (C\textsuperscript{2}GA) mechanism is incorporated to exploit the complementarity across branches. Comparative experiments show that CECNet achieves $\sim$29% overall CA-ITR accuracy boost, surpassing seven representative retrieval models. The dataset and code will be available at https://github.com/jiangyao-scu/CA-ITR.

DBMar 26
Enabling Homomorphic Analytical Operations on Compressed Scientific Data with Multi-stage Decompression

Xuan Wu, Sheng Di, Tripti Agarwal et al.

Error-controlled lossy compressors have been widely used in scientific applications to reduce the unprecedented size of scientific data while keeping data distortion within a user-specified threshold. While they significantly mitigate the pressure for data storage and transmission, they prolong the time to access the data because decompression is required to transform the binary compressed data into meaningful floating-point numbers. This incurs noticeable overhead for common analytical operations on scientific data that extract or derive useful information, because the time cost of the operations could be much lower than that of decompression. In this work, we design an error-controlled lossy compression and analytical framework that features multi-stage decompression and homomorphic analytical operation algorithms on intermediate decompressed data for reduced data access latency. Our contributions are threefold. (1) We abstract a generic compression pipeline with partial decompression to multiple intermediate data representations and implement four instances based on state-of-the-art high-throughput scientific data compressors. (2) We carefully design homomorphic algorithms to enable direct operations on intermediate decompressed data for three types of analytical operations on scientific data. (3) We evaluate our approach using five real-world scientific datasets. Experimental evaluations demonstrate that our method achieves significant speedups when performing analytical operations on compressed scientific data across all three targeted analytical operation types.

LGMay 20
WeCon: An Efficient Weight-Conditioned Neural Solver for Multi-Objective Combinatorial Optimization Problems

Xuan Wu, Jinbiao Chen, Yang Li et al.

Existing neural solvers for Multi-Objective Combinatorial Optimization Problems (MOCOPs) commonly adopt decomposition-based strategies that scalarize an MOCOP into multiple subproblems associated with distinct weight vectors. However, they either inject weights only once during decoding, limiting weight-conditioned context modeling, or primarily during encoding, causing weight-signal dilution during decoding. Moreover, preference optimization methods rely on purely random sampling to construct solution pairs for training solvers, which often produces less informative pairs and thus leads to low training effectiveness. To better address these limitations, we propose an efficient Weight-Conditioned neural solver (WeCon). Specifically, we design an encoder layer with three attention blocks and our proposed Gated Residual Fusion (GRF) block to facilitate harmonious interaction between instance features and weights, thereby generating informative weight-conditioned context. We further introduce a plug-and-play Residual Fusion (RF) block in the decoder to alleviate weight-signal dilution. Finally, we propose Efficient Preference Optimization (EPO), which constructs high-quality solutions, thereby generating more informative pairs to improve training effectiveness. Experiments on four MOCOP variants across different problem scales and distribution patterns demonstrate that WeCon achieves HyperVolume (HV) values comparable to SOTA solver POCCO-W, while reducing inference time by 40%. Ablation studies validate the contributions of all designs.

CVNov 23, 2024Code
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation

Datao Tang, Xiangyong Cao, Xuan Wu et al.

Remote sensing image object detection (RSIOD) aims to identify and locate specific objects within satellite or aerial imagery. However, there is a scarcity of labeled data in current RSIOD datasets, which significantly limits the performance of current detection algorithms. Although existing techniques, e.g., data augmentation and semi-supervised learning, can mitigate this scarcity issue to some extent, they are heavily dependent on high-quality labeled data and perform worse in rare object classes. To address this issue, this paper proposes a layout-controllable diffusion generative model (i.e. AeroGen) tailored for RSIOD. To our knowledge, AeroGen is the first model to simultaneously support horizontal and rotated bounding box condition generation, thus enabling the generation of high-quality synthetic images that meet specific layout and object category requirements. Additionally, we propose an end-to-end data augmentation framework that integrates a diversity-conditioned generator and a filtering mechanism to enhance both the diversity and quality of generated data. Experimental results demonstrate that the synthetic data produced by our method are of high quality and diversity. Furthermore, the synthetic RSIOD data can significantly improve the detection performance of existing RSIOD models, i.e., the mAP metrics on DIOR, DIOR-R, and HRSC datasets are improved by 3.7%, 4.3%, and 2.43%, respectively. The code is available at https://github.com/Sonettoo/AeroGen.

LGJan 23, 2025Code
An Efficient Diffusion-based Non-Autoregressive Solver for Traveling Salesman Problem

Mingzhao Wang, You Zhou, Zhiguang Cao et al.

Recent advances in neural models have shown considerable promise in solving Traveling Salesman Problems (TSPs) without relying on much hand-crafted engineering. However, while non-autoregressive (NAR) approaches benefit from faster inference through parallelism, they typically deliver solutions of inferior quality compared to autoregressive ones. To enhance the solution quality while maintaining fast inference, we propose DEITSP, a diffusion model with efficient iterations tailored for TSP that operates in a NAR manner. Firstly, we introduce a one-step diffusion model that integrates the controlled discrete noise addition process with self-consistency enhancement, enabling optimal solution prediction through simultaneous denoising of multiple solutions. Secondly, we design a dual-modality graph transformer to bolster the extraction and fusion of features from node and edge modalities, while further accelerating the inference with fewer layers. Thirdly, we develop an efficient iterative strategy that alternates between adding and removing noise to improve exploration compared to previous diffusion methods. Additionally, we devise a scheduling framework to progressively refine the solution space by adjusting noise levels, facilitating a smooth search for optimal solutions. Extensive experiments on real-world and large-scale TSP instances demonstrate that DEITSP performs favorably against existing neural approaches in terms of solution quality, inference latency, and generalization ability. Our code is available at $\href{https://github.com/DEITSP/DEITSP}{https://github.com/DEITSP/DEITSP}$.

AINov 22, 2022
Differentiable Fuzzy $\mathcal{ALC}$: A Neural-Symbolic Representation Language for Symbol Grounding

Xuan Wu, Xinhao Zhu, Yizheng Zhao et al.

Neural-symbolic computing aims at integrating robust neural learning and sound symbolic reasoning into a single framework, so as to leverage the complementary strengths of both of these, seemingly unrelated (maybe even contradictory) AI paradigms. The central challenge in neural-symbolic computing is to unify the formulation of neural learning and symbolic reasoning into a single framework with common semantics, that is, to seek a joint representation between a neural model and a logical theory that can support the basic grounding learned by the neural model and also stick to the semantics of the logical theory. In this paper, we propose differentiable fuzzy $\mathcal{ALC}$ (DF-$\mathcal{ALC}$) for this role, as a neural-symbolic representation language with the desired semantics. DF-$\mathcal{ALC}$ unifies the description logic $\mathcal{ALC}$ and neural models for symbol grounding; in particular, it infuses an $\mathcal{ALC}$ knowledge base into neural models through differentiable concept and role embeddings. We define a hierarchical loss to the constraint that the grounding learned by neural models must be semantically consistent with $\mathcal{ALC}$ knowledge bases. And we find that capturing the semantics in grounding solely by maximizing satisfiability cannot revise grounding rationally. We further define a rule-based loss for DF adapting to symbol grounding problems. The experiment results show that DF-$\mathcal{ALC}$ with rule-based loss can improve the performance of image object detectors in an unsupervised learning way, even in low-resource situations.

LGAug 18, 2023
An Efficient 1 Iteration Learning Algorithm for Gaussian Mixture Model And Gaussian Mixture Embedding For Neural Network

Weiguo Lu, Xuan Wu, Deng Ding et al.

We propose an Gaussian Mixture Model (GMM) learning algorithm, based on our previous work of GMM expansion idea. The new algorithm brings more robustness and simplicity than classic Expectation Maximization (EM) algorithm. It also improves the accuracy and only take 1 iteration for learning. We theoretically proof that this new algorithm is guarantee to converge regardless the parameters initialisation. We compare our GMM expansion method with classic probability layers in neural network leads to demonstrably better capability to overcome data uncertainty and inverse problem. Finally, we test GMM based generator which shows a potential to build further application that able to utilized distribution random sampling for stochastic variation as well as variation control.

CVMar 4, 2024Code
Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection

Xin Zhang, Tao Xiao, Gepeng Ji et al.

Camouflage poses challenges in distinguishing a static target, whereas any movement of the target can break this disguise. Existing video camouflaged object detection (VCOD) approaches take noisy motion estimation as input or model motion implicitly, restricting detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly using a frozen pre-trained optical flow fundamental model. EMIP is characterized by a two-stream architecture for simultaneously conducting camouflaged segmentation and optical flow estimation. Interactions across the dual streams are realized in an interactive prompting way that is inspired by emerging visual prompt learning. Two learnable modules, i.e., the camouflaged feeder and motion collector, are designed to incorporate segmentation-to-motion and motion-to-segmentation prompts, respectively, and enhance outputs of the both streams. The prompt fed to the motion stream is learned by supervising optical flow in a self-supervised manner. Furthermore, we show that long-term historical information can also be incorporated as a prompt into EMIP and achieve more robust results with temporal consistency. Experimental results demonstrate that our EMIP achieves new state-of-the-art records on popular VCOD benchmarks. Our code is made publicly available at https://github.com/zhangxin06/EMIP.

LGDec 19, 2023
Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed

Yubin Xiao, Di Wang, Boyang Li et al.

Neural construction models have shown promising performance for Vehicle Routing Problems (VRPs) by adopting either the Autoregressive (AR) or Non-Autoregressive (NAR) learning approach. While AR models produce high-quality solutions, they generally have a high inference latency due to their sequential generation nature. Conversely, NAR models generate solutions in parallel with a low inference latency but generally exhibit inferior performance. In this paper, we propose a generic Guided Non-Autoregressive Knowledge Distillation (GNARKD) method to obtain high-performance NAR models having a low inference latency. GNARKD removes the constraint of sequential generation in AR models while preserving the learned pivotal components in the network architecture to obtain the corresponding NAR models through knowledge distillation. We evaluate GNARKD by applying it to three widely adopted AR models to obtain NAR VRP solvers for both synthesized and real-world instances. The experimental results demonstrate that GNARKD significantly reduces the inference time (4-5 times faster) with acceptable performance drop (2-3\%). To the best of our knowledge, this study is first-of-its-kind to obtain NAR VRP solvers from AR ones through knowledge distillation.

LGJun 2, 2025
Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient Contribution

Wenhao Song, Xuan Wu, Bo Yang et al.

To address the weight coupling problem, certain studies introduced few-shot Neural Architecture Search (NAS) methods, which partition the supernet into multiple sub-supernets. However, these methods often suffer from computational inefficiency and tend to provide suboptimal partitioning schemes. To address this problem more effectively, we analyze the weight coupling problem from a novel perspective, which primarily stems from distinct modules in succeeding layers imposing conflicting gradient directions on the preceding layer modules. Based on this perspective, we propose the Gradient Contribution (GC) method that efficiently computes the cosine similarity of gradient directions among modules by decomposing the Vector-Jacobian Product during supernet backpropagation. Subsequently, the modules with conflicting gradient directions are allocated to distinct sub-supernets while similar ones are grouped together. To assess the advantages of GC and address the limitations of existing Graph Neural Architecture Search methods, which are limited to searching a single type of Graph Neural Networks (Message Passing Neural Networks (MPNNs) or Graph Transformers (GTs)), we propose the Unified Graph Neural Architecture Search (UGAS) framework, which explores optimal combinations of MPNNs and GTs. The experimental results demonstrate that GC achieves state-of-the-art (SOTA) performance in supernet partitioning quality and time efficiency. In addition, the architectures searched by UGAS+GC outperform both the manually designed GNNs and those obtained by existing NAS methods. Finally, ablation studies further demonstrate the effectiveness of all proposed methods.

CVOct 9, 2025
TCIP: Threshold-Controlled Iterative Pyramid Network for Deformable Medical Image Registration

Heming Wu, Di Wang, Tai Ma et al.

Although pyramid networks have demonstrated superior performance in deformable medical image registration, their decoder architectures are inherently prone to propagating and accumulating anatomical structure misalignments. Moreover, most existing models do not adaptively determine the number of iterations for optimization under varying deformation requirements across images, resulting in either premature termination or excessive iterations that degrades registration accuracy. To effectively mitigate the accumulation of anatomical misalignments, we propose the Feature-Enhanced Residual Module (FERM) as the core component of each decoding layer in the pyramid network. FERM comprises three sequential blocks that extract anatomical semantic features, learn to suppress irrelevant features, and estimate the final deformation field, respectively. To adaptively determine the number of iterations for varying images, we propose the dual-stage Threshold-Controlled Iterative (TCI) strategy. In the first stage, TCI assesses registration stability and with asserted stability, it continues with the second stage to evaluate convergence. We coin the model that integrates FERM and TCI as Threshold-Controlled Iterative Pyramid (TCIP). Extensive experiments on three public brain MRI datasets and one abdomen CT dataset demonstrate that TCIP outperforms the state-of-the-art (SOTA) registration networks in terms of accuracy, while maintaining comparable inference speed and a compact model parameter size. Finally, we assess the generalizability of FERM and TCI by integrating them with existing registration networks and further conduct ablation studies to validate the effectiveness of these two proposed methods.

LGJul 28, 2025
Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem

Xuan Wu, Di Wang, Chunguo Wu et al.

Numerous Neural Combinatorial Optimization (NCO) solvers have been proposed to address Vehicle Routing Problems (VRPs). However, most of these solvers focus exclusively on single-vehicle VRP variants, overlooking the more realistic min-max Heterogeneous Capacitated Vehicle Routing Problem (MMHCVRP), which involves multiple vehicles. Existing MMHCVRP solvers typically select a vehicle and its next node to visit at each decoding step, but often make myopic decoding decisions and overlook key properties of MMHCVRP, including local topological relationships, vehicle permutation invariance, and node symmetry, resulting in suboptimal performance. To better address these limitations, we propose ECHO, an efficient NCO solver. First, ECHO exploits the proposed dual-modality node encoder to capture local topological relationships among nodes. Subsequently, to mitigate myopic decisions, ECHO employs the proposed Parameter-Free Cross-Attention mechanism to prioritize the vehicle selected in the preceding decoding step. Finally, leveraging vehicle permutation invariance and node symmetry, we introduce a tailored data augment strategy for MMHCVRP to stabilize the Reinforcement Learning training process. To assess the performance of ECHO, we conduct extensive experiments. The experimental results demonstrate that ECHO outperforms state-of-the-art NCO solvers across varying numbers of vehicles and nodes, and exhibits well-performing generalization across both scales and distribution patterns. Finally, ablation studies validate the effectiveness of all proposed methods.

LGJun 24, 2025
FlightKooba: A Fast Interpretable FTP Model

Jing Lu, Xuan Wu, Yizhun Tian et al.

Flight trajectory prediction (FTP) and similar time series tasks typically require capturing smooth latent dynamics hidden within noisy signals. However, existing deep learning models face significant challenges of high computational cost and insufficient interpretability due to their complex black-box nature. This paper introduces FlightKooba, a novel modeling approach designed to extract such underlying dynamics analytically. Our framework uniquely integrates HiPPO theory, Koopman operator theory, and control theory. By leveraging Legendre polynomial bases, it constructs Koopman operators analytically, thereby avoiding large-scale parameter training. The method's core strengths lie in its exceptional computational efficiency and inherent interpretability. Experiments on multiple public datasets validate our design philosophy: for signals exhibiting strong periodicity or clear physical laws (e.g., in aviation, meteorology, and traffic flow), FlightKooba delivers competitive prediction accuracy while reducing trainable parameters by several orders of magnitude and achieving the fastest training speed. Furthermore, we analyze the model's theoretical boundaries, clarifying its inherent low-pass filtering characteristics that render it unsuitable for sequences dominated by high-frequency noise. In summary, FlightKooba offers a powerful, efficient, and interpretable new alternative for time series analysis, particularly in resource-constrained environments.

LGJun 10, 2024
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

Yubin Xiao, Di Wang, Xuan Wu et al.

Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.

AIJun 1, 2024
Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives

Xuan Wu, Di Wang, Lijie Wen et al.

Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted, they did not cover the state-of-the-art (SOTA) NCO solvers emerged recently. More importantly, to establish a comprehensive and up-to-date taxonomy of NCO solvers, we systematically review relevant publications and preprints, categorizing them into four distinct types, namely Learning to Construct, Learning to Improve, Learning to Predict-Once, and Learning to Predict-Multiplicity solvers. Subsequently, we present the inadequacies of the SOTA solvers, including poor generalization, incapability to solve large-scale VRPs, inability to address most types of VRP variants simultaneously, and difficulty in comparing these NCO solvers with the conventional Operations Research algorithms. Simultaneously, we discuss on-going efforts, identify open inadequacies, as well as propose promising and viable directions to overcome these inadequacies. Notably, existing efforts focus on only one or two of these inadequacies, with none attempting to address all of them concurrently. In addition, we compare the performance of representative NCO solvers from the Reinforcement, Supervised, and Unsupervised Learning paradigms across VRPs of varying scales. Finally, following the proposed taxonomy, we provide an accompanying web page as a live repository for NCO solvers. Through this survey and the live repository, we aim to foster further advancements in the NCO community.

LGJan 20, 2024
Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

Weiguo Lu, Xuan Wu, Deng Ding et al.

Diffusion models (DMs) are a type of generative model that has a huge impact on image synthesis and beyond. They achieve state-of-the-art generation results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, are accessible to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a comprehensive theoretical analysis that shows that conditional latent distribution based on features and classes is significantly different, so that conditional latent distribution on features produces fewer defect generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison. Experiments support our findings. A novel gradient function called the negative Gaussian mixture gradient (NGMG) is proposed and applied in diffusion model training with an additional classifier. Training stability has improved. We also theoretically prove that NGMG shares the same benefit as the Earth Mover distance (Wasserstein) as a more sensible cost function when learning distributions supported by low-dimensional manifolds.

NEAug 25, 2021
Incorporating Surprisingly Popular Algorithm and Euclidean Distance-based Adaptive Topology into PSO

Xuan Wu, Jizong Han, Di Wang et al.

While many Particle Swarm Optimization (PSO) algorithms only use fitness to assess the performance of particles, in this work, we adopt Surprisingly Popular Algorithm (SPA) as a complementary metric in addition to fitness. Consequently, particles that are not widely known also have the opportunity to be selected as the learning exemplars. In addition, we propose a Euclidean distance-based adaptive topology to cooperate with SPA, where each particle only connects to k number of particles with the shortest Euclidean distance during each iteration. We also introduce the adaptive topology into heterogeneous populations to better solve large-scale problems. Specifically, the exploration sub-population better preserves the diversity of the population while the exploitation sub-population achieves fast convergence. Therefore, large-scale problems can be solved in a collaborative manner to elevate the overall performance. To evaluate the performance of our method, we conduct extensive experiments on various optimization problems, including three benchmark suites and two real-world optimization problems. The results demonstrate that our Euclidean distance-based adaptive topology outperforms the other widely adopted topologies and further suggest that our method performs significantly better than state-of-the-art PSO variants on small, medium, and large-scale problems.

NEMar 12, 2021
Neural Architecture Search based on Cartesian Genetic Programming Coding Method

Xuan Wu, Linhan Jia, Xiuyi Zhang et al.

Neural architecture search (NAS) is a hot topic in the field of automated machine learning and outperforms humans in designing neural architectures on quite a few machine learning tasks. Motivated by the natural representation form of neural networks by the Cartesian genetic programming (CGP), we propose an evolutionary approach of NAS based on CGP, called CGPNAS, to solve sentence classification task. To evolve the architectures under the framework of CGP, the operations such as convolution are identified as the types of function nodes of CGP, and the evolutionary operations are designed based on Evolutionary Strategy. The experimental results show that the searched architectures are comparable with the performance of human-designed architectures. We verify the ability of domain transfer of our evolved architectures. The transfer experimental results show that the accuracy deterioration is lower than 2-5%. Finally, the ablation study identifies the Attention function as the single key function node and the linear transformations along could keep the accuracy similar with the full evolved architectures, which is worthy of investigation in the future.

ROSep 1, 2020
Intelligent Hotel ROS-based Service Robot

Yanyu Zhang, Xiu Wang, Xuan Wu et al.

With the advances of artificial intelligence (AI) technology, many studies and work have been carried out on how robots could replace human labor. In this paper, we present a ROS based intelligence hotel robot, which simplifies the check-in process. We use pioneer 3dx robot and considered different environment settings. The robot combined with Hokuyo Lidar and Kinect Xbox camera, can plan the routes accurately and reach rooms in different floors. In addition, we added an intelligent voice system which provides an assistant for the customers.

LGSep 26, 2019
A Quest for Structure: Jointly Learning the Graph Structure and Semi-Supervised Classification

Xuan Wu, Lingxiao Zhao, Leman Akoglu

Semi-supervised learning (SSL) is effectively used for numerous classification problems, thanks to its ability to make use of abundant unlabeled data. The main assumption of various SSL algorithms is that the nearby points on the data manifold are likely to share a label. Graph-based SSL constructs a graph from point-cloud data as an approximation to the underlying manifold, followed by label inference. It is no surprise that the quality of the constructed graph in capturing the essential structure of the data is critical to the accuracy of the subsequent inference step [6]. How should one construct a graph from the input point-cloud data for graph-based SSL? In this work we introduce a new, parallel graph learning framework (called PG-learn) for the graph construction step of SSL. Our solution has two main ingredients: (1) a gradient-based optimization of the edge weights (more specifically, different kernel bandwidths in each dimension) based on a validation loss function, and (2) a parallel hyperparameter search algorithm with an adaptive resource allocation scheme. In essence, (1) allows us to search around a (random) initial hyperparameter configuration for a better one with lower validation loss. Since the search space of hyperparameters is huge for high-dimensional problems, (2) empowers our gradient-based search to go through as many different initial configurations as possible, where runs for relatively unpromising starting configurations are terminated early to allocate the time for others. As such, PG-learn is a carefully-designed hybrid of random and adaptive search. Through experiments on multi-class classification problems, we show that PG-learn significantly outperforms a variety of existing graph construction schemes in accuracy (per fixed time budget for hyperparameter tuning), and scales more effectively to high dimensional problems.

LGJun 12, 2019
Coresets for Gaussian Mixture Models of Any Shape

Dan Feldman, Zahi Kfir, Xuan Wu

An $\varepsilon$-coreset for a given set $D$ of $n$ points, is usually a small weighted set, such that querying the coreset \emph{provably} yields a $(1+\varepsilon)$-factor approximation to the original (full) dataset, for a given family of queries. Using existing techniques, coresets can be maintained for streaming, dynamic (insertion/deletions), and distributed data in parallel, e.g. on a network, GPU or cloud. We suggest the first coresets that approximate the negative log-likelihood for $k$-Gaussians Mixture Models (GMM) of arbitrary shapes (ratio between eigenvalues of their covariance matrices). For example, for any input set $D$ whose coordinates are integers in $[-n^{100},n^{100}]$ and any fixed $k,d\geq 1$, the coreset size is $(\log n)^{O(1)}/\varepsilon^2$, and can be computed in time near-linear in $n$, with high probability. The optimal GMM may then be approximated quickly by learning the small coreset. Previous results [NIPS'11, JMLR'18] suggested such small coresets for the case of semi-speherical unit Gaussians, i.e., where their corresponding eigenvalues are constants between $\frac{1}{2π}$ to $2π$. Our main technique is a reduction between coresets for $k$-GMMs and projective clustering problems. We implemented our algorithms, and provide open code, and experimental results. Since our coresets are generic, with no special dependency on GMMs, we hope that they will be useful for many other functions.

LGOct 28, 2017
Wasserstein Identity Testing

Shichuan Deng, Wenzheng Li, Xuan Wu

Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider the identity testing in Wasserstein distance (a.k.a. transportation distance and earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called "Doubling Condition", we provide nearly instance-optimal sample complexity.