Xudong Li

CV
h-index29
29papers
207citations
Novelty53%
AI Score56

29 Papers

93.5SYJun 3
Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Yugui Liu, Yibo Ding, Xudong Li et al.

Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8\%, but also effectively alleviates grid congestion and yields a 3.2\% reduction in total energy consumption.

CVAug 16, 2024Code
Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation

Kaixiang Yang, Wenqi Shan, Xudong Li et al.

Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, while incomplete modalities significantly degrade performance. Existing solutions employ explicit or implicit modality adaptation, aligning features across modalities or learning a fused feature robust to modality incompleteness. They share a common goal of encouraging each modality to express both itself and the others. However, the two expression abilities are entangled as a whole in a seamless feature space, resulting in prohibitive learning burdens. In this paper, we propose DeMoSeg to enhance the modality adaptation by Decoupling the task of representing the ego and other Modalities for robust incomplete multi-modal Segmentation. The decoupling is super lightweight by simply using two convolutions to map each modality onto four feature sub-spaces. The first sub-space expresses itself (Self-feature), while the remaining sub-spaces substitute for other modalities (Mutual-features). The Self- and Mutual-features interactively guide each other through a carefully-designed Channel-wised Sparse Self-Attention (CSSA). After that, a Radiologist-mimic Cross-modality expression Relationships (RCR) is introduced to have available modalities provide Self-feature and also `lend' their Mutual-features to compensate for the absent ones by exploiting the clinical prior knowledge. The benchmark results on BraTS2020, BraTS2018 and BraTS2015 verify the DeMoSeg's superiority thanks to the alleviated modality adaptation difficulty. Concretely, for BraTS2020, DeMoSeg increases Dice by at least 0.92%, 2.95% and 4.95% on whole tumor, tumor core and enhanced tumor regions, respectively, compared to other state-of-the-arts. Codes are at https://github.com/kk42yy/DeMoSeg

CVSep 9, 2024
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Xudong Li, Zihao Huang, Yan Zhang et al.

Image Quality Assessment (IQA) remains an unresolved challenge in computer vision due to complex distortions, diverse image content, and limited data availability. Existing Blind IQA (BIQA) methods largely rely on extensive human annotations, which are labor-intensive and costly due to the demanding nature of creating IQA datasets. To reduce this dependency, we propose the Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA), designed to efficiently adapt the visual-language pre-trained model, CLIP, to IQA tasks, achieving high accuracy even with limited data. GRMP-IQA consists of two core modules: (i) Meta-Prompt Pre-training Module and (ii) Quality-Aware Gradient Regularization. The Meta Prompt Pre-training Module leverages a meta-learning paradigm to pre-train soft prompts with shared meta-knowledge across different distortions, enabling rapid adaptation to various IQA tasks. On the other hand, the Quality-Aware Gradient Regularization is designed to adjust the update gradients during fine-tuning, focusing the model's attention on quality-relevant features and preventing overfitting to semantic information. Extensive experiments on standard BIQA datasets demonstrate the superior performance to the state-of-the-art BIQA methods under limited data setting. Notably, utilizing just 20% of the training data, GRMP-IQA is competitive with most existing fully supervised BIQA approaches.

CVJan 29
PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction

Changjian Jiang, Kerui Ren, Xudong Li et al.

Streaming reconstruction from monocular image sequences remains challenging, as existing methods typically favor either high-quality rendering or accurate geometry, but rarely both. We present PLANING, an efficient on-the-fly reconstruction framework built on a hybrid representation that loosely couples explicit geometric primitives with neural Gaussians, enabling geometry and appearance to be modeled in a decoupled manner. This decoupling supports an online initialization and optimization strategy that separates geometry and appearance updates, yielding stable streaming reconstruction with substantially reduced structural redundancy. PLANING improves dense mesh Chamfer-L2 by 18.52% over PGSR, surpasses ARTDECO by 1.31 dB PSNR, and reconstructs ScanNetV2 scenes in under 100 seconds, over 5x faster than 2D Gaussian Splatting, while matching the quality of offline per-scene optimization. Beyond reconstruction quality, the structural clarity and computational efficiency of PLANING make it well suited for a broad range of downstream applications, such as enabling large-scale scene modeling and simulation-ready environments for embodied AI. Project page: https://city-super.github.io/PLANING/ .

96.1CVApr 18
Q-DeepSight: Incentivizing Thinking with Images for Image Quality Assessment and Refinement

Xudong Li, Jiaxi Tan, Ziyin Zhou et al.

Image Quality Assessment (IQA) models are increasingly deployed as perceptual critics to guide generative models and image restoration. This role demands not only accurate scores but also actionable, localized feedback. However, current MLLM-based methods adopt a single-look, language-only paradigm, which departs from human evidence-seeking judgment and yields weakly grounded rationales, limiting their reliability for in-the-loop refinement. We propose Q-DeepSight, a think-with-image framework that emulates this human-like process. It performs interleaved Multimodal Chain-of-Thought (iMCoT) with tool-augmented evidence acquisition (e.g., crop-and-zoom) to explicitly determine where quality degrades and why. To train these long iMCoT trajectories via reinforcement learning, we introduce two techniques: Perceptual Curriculum Reward (PCR) to mitigate reward sparsity and Evidence Gradient Filtering (EGF) to improve credit assignment for visually-grounded reasoning. Q-DeepSight achieves state-of-the-art performance across diverse benchmarks, including natural, restored, and AI-generated content. Furthermore, we demonstrate its practical value with Perceptual-in-Generation (PiG), a training-free framework where Q-DeepSight's diagnoses guide iterative image enhancement, effectively closing the loop between assessment and refinement.

23.8CLMay 11
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning

Luan Zhang, Dandan Song, Zhijing Wu et al.

Tool-integrated reasoning (TIR) enables large language models (LLMs) to enhance their capabilities by interacting with external tools, such as code interpreters (CI). Most recent studies focus on exploring various methods to equip LLMs with the ability to use tools. However, how to further boost the reasoning ability of already tool-capable LLMs at inference time remains underexplored. Improving reasoning at inference time requires no additional training and can help LLMs better leverage tools to solve problems. We observe that, during tool-capable LLM inference, both the number and the proportion of erroneous tool calls are negatively correlated with answer correctness. Moreover, erroneous tool calls are typically resolved successfully within a few subsequent turns. If not, LLMs often struggle to resolve such errors even with many additional turns. Building on the above observations, we propose PruneTIR, a rather effective yet efficient framework that enhances the tool-integrated reasoning at inference time. During LLM inference, PruneTIR prunes trajectories, resamples tool calls, and suspends tool usage through three components: Success-Triggered Pruning, Stuck-Triggered Pruning and Resampling, and Retry-Triggered Tool Suspension. These three components enable PruneTIR to mitigate the negative impact of erroneous tool calls and prevent LLMs from getting stuck in repeated failed resolution attempts, thereby improving overall LLM performance. Extensive experimental results demonstrate the effectiveness of PruneTIR, which significantly improves Pass@1 and efficiency while reducing the working context length for tool-capable LLMs.

CLJan 12
ActiShade: Activating Overshadowed Knowledge to Guide Multi-Hop Reasoning in Large Language Models

Huipeng Ma, Luan Zhang, Dandan Song et al.

In multi-hop reasoning, multi-round retrieval-augmented generation (RAG) methods typically rely on LLM-generated content as the retrieval query. However, these approaches are inherently vulnerable to knowledge overshadowing - a phenomenon where critical information is overshadowed during generation. As a result, the LLM-generated content may be incomplete or inaccurate, leading to irrelevant retrieval and causing error accumulation during the iteration process. To address this challenge, we propose ActiShade, which detects and activates overshadowed knowledge to guide large language models (LLMs) in multi-hop reasoning. Specifically, ActiShade iteratively detects the overshadowed keyphrase in the given query, retrieves documents relevant to both the query and the overshadowed keyphrase, and generates a new query based on the retrieved documents to guide the next-round iteration. By supplementing the overshadowed knowledge during the formulation of next-round queries while minimizing the introduction of irrelevant noise, ActiShade reduces the error accumulation caused by knowledge overshadowing. Extensive experiments show that ActiShade outperforms existing methods across multiple datasets and LLMs.

LGAug 16, 2019Code
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search

Xiangxiang Chu, Bo Zhang, Qingyuan Li et al.

To discover powerful yet compact models is an important goal of neural architecture search. Previous two-stage one-shot approaches are limited by search space with a fixed depth. It seems handy to include an additional skip connection in the search space to make depths variable. However, it creates a large range of perturbation during supernet training and it has difficulty giving a confident ranking for subnetworks. In this paper, we discover that skip connections bring about significant feature inconsistency compared with other operations, which potentially degrades the supernet performance. Based on this observation, we tackle the problem by imposing an equivariant learnable stabilizer to homogenize such disparities. Experiments show that our proposed stabilizer helps to improve the supernet's convergence as well as ranking performance. With an evolutionary search backend that incorporates the stabilized supernet as an evaluator, we derive a family of state-of-the-art architectures, the SCARLET series of several depths, especially SCARLET-A obtains 76.9% top-1 accuracy on ImageNet. Code is available at https://github.com/xiaomi-automl/ScarletNAS.

CVDec 11, 2023
Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

Xudong Li, Timin Gao, Runze Hu et al.

The current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with small feature spatial distances can have vastly different quality scores, indicating that the extracted features may contain a significant amount of quality-irrelevant noise. To address this issue, we propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) that employs an adversarial perspective to remove harmful semantic noise features from the upstream task. Specifically, QFM-IQM enhances the semantic noise distinguish capabilities by matching image pairs with similar quality scores but varying semantic features as adversarial semantic noise and adaptively adjusting the upstream task's features by reducing sensitivity to adversarial noise perturbation. Furthermore, we utilize a distillation framework to expand the dataset and improve the model's generalization ability. Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.

CVJan 22, 2024
Feature Denoising Diffusion Model for Blind Image Quality Assessment

Xudong Li, Jingyuan Zheng, Runze Hu et al.

Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Currently, deep learning BIQA methods typically depend on using features from high-level tasks for transfer learning. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we take an initial step towards exploring the diffusion model for feature denoising in BIQA, namely Perceptual Feature Diffusion for IQA (PFD-IQA), which aims to remove noise from quality-aware features. Specifically, (i) We propose a {Perceptual Prior Discovery and Aggregation module to establish two auxiliary tasks to discover potential low-level features in images that are used to aggregate perceptual text conditions for the diffusion model. (ii) We propose a Perceptual Prior-based Feature Refinement strategy, which matches noisy features to predefined denoising trajectories and then performs exact feature denoising based on text conditions. Extensive experiments on eight standard BIQA datasets demonstrate the superior performance to the state-of-the-art BIQA methods, i.e., achieving the PLCC values of 0.935 ( vs. 0.905 in KADID) and 0.922 ( vs. 0.894 in LIVEC).

RONov 19, 2024
AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression

Zhixiang Wang, Xudong Li, Yizhai Zhang et al.

Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments. There are many methods for producing such estimations, but most boil down to a synchronous discrete-time fusion problem. However, the asynchronous nature of event cameras and their unique fusion mechanism with inertial sensors remain underexplored. In this paper, we introduce a monocular event-inertial odometry method called AsynEIO, designed to fuse asynchronous event and inertial data within a unified Gaussian Process (GP) regression framework. Our approach incorporates an event-driven frontend that tracks feature trajectories directly from raw event streams at a high temporal resolution. These tracked feature trajectories, along with various inertial factors, are integrated into the same GP regression framework to enable asynchronous fusion. With deriving analytical residual Jacobians and noise models, our method constructs a factor graph that is iteratively optimized and pruned using a sliding-window optimizer. Comparative assessments highlight the performance of different inertial fusion strategies, suggesting optimal choices for varying conditions. Experimental results on both public datasets and our own event-inertial sequences indicate that AsynEIO outperforms existing methods, especially in high-speed and low-illumination scenarios.

CVMay 28, 2025
Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

Xudong Li, Mengdan Zhang, Peixian Chen et al.

Multi-modal Large Language Models (MLLMs) excel at single-image tasks but struggle with multi-image understanding due to cross-modal misalignment, leading to hallucinations (context omission, conflation, and misinterpretation). Existing methods using Direct Preference Optimization (DPO) constrain optimization to a solitary image reference within the input sequence, neglecting holistic context modeling. We propose Context-to-Cue Direct Preference Optimization (CcDPO), a multi-level preference optimization framework that enhances per-image perception in multi-image settings by zooming into visual clues -- from sequential context to local details. It features: (i) Context-Level Optimization : Re-evaluates cognitive biases underlying MLLMs' multi-image context comprehension and integrates a spectrum of low-cost global sequence preferences for bias mitigation. (ii) Needle-Level Optimization : Directs attention to fine-grained visual details through region-targeted visual prompts and multimodal preference supervision. To support scalable optimization, we also construct MultiScope-42k, an automatically generated dataset with high-quality multi-level preference pairs. Experiments show that CcDPO significantly reduces hallucinations and yields consistent performance gains across general single- and multi-image tasks.

LGSep 20, 2025
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs

Yukuan Wei, Xudong Li, Lin F. Yang

Recent advances have significantly improved our understanding of the sample complexity of learning in average-reward Markov decision processes (AMDPs) under the generative model. However, much less is known about the constrained average-reward MDP (CAMDP), where policies must satisfy long-run average constraints. In this work, we address this gap by studying the sample complexity of learning an $ε$-optimal policy in CAMDPs under a generative model. We propose a model-based algorithm that operates under two settings: (i) relaxed feasibility, which allows small constraint violations, and (ii) strict feasibility, where the output policy satisfies the constraint. We show that our algorithm achieves sample complexities of $\tilde{O}\left(\frac{S A (B+H)}{ ε^2}\right)$ and $\tilde{O} \left(\frac{S A (B+H)}{ε^2 ζ^2} \right)$ under the relaxed and strict feasibility settings, respectively. Here, $ζ$ is the Slater constant indicating the size of the feasible region, $H$ is the span bound of the bias function, and $B$ is the transient time bound. Moreover, a matching lower bound of $\tildeΩ\left(\frac{S A (B+H)}{ ε^2ζ^2}\right)$ for the strict feasibility case is established, thus providing the first minimax-optimal bounds for CAMDPs. Our results close the theoretical gap in understanding the complexity of constrained average-reward MDPs.

NIMay 12, 2025
Channel Fingerprint Construction for Massive MIMO: A Deep Conditional Generative Approach

Zhenzhou Jin, Li You, Xudong Li et al.

Accurate channel state information (CSI) acquisition for massive multiple-input multiple-output (MIMO) systems is essential for future mobile communication networks. Channel fingerprint (CF), also referred to as channel knowledge map, is a key enabler for intelligent environment-aware communication and can facilitate CSI acquisition. However, due to the cost limitations of practical sensing nodes and test vehicles, the resulting CF is typically coarse-grained, making it insufficient for wireless transceiver design. In this work, we introduce the concept of CF twins and design a conditional generative diffusion model (CGDM) with strong implicit prior learning capabilities as the computational core of the CF twin to establish the connection between coarse- and fine-grained CFs. Specifically, we employ a variational inference technique to derive the evidence lower bound (ELBO) for the log-marginal distribution of the observed fine-grained CF conditioned on the coarse-grained CF, enabling the CGDM to learn the complicated distribution of the target data. During the denoising neural network optimization, the coarse-grained CF is introduced as side information to accurately guide the conditioned generation of the CGDM. To make the proposed CGDM lightweight, we further leverage the additivity of network layers and introduce a one-shot pruning approach along with a multi-objective knowledge distillation technique. Experimental results show that the proposed approach exhibits significant improvement in reconstruction performance compared to the baselines. Additionally, zero-shot testing on reconstruction tasks with different magnification factors further demonstrates the scalability and generalization ability of the proposed approach.

CVNov 13, 2024
Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment

Zihao Huang, Xudong Li, Bohan Fu et al.

Blind image quality assessment (BIQA) serves as a fundamental task in computer vision, yet it often fails to consistently align with human subjective perception. Recent advances show that multi-scale evaluation strategies are promising due to their ability to replicate the hierarchical structure of human vision. However, the effectiveness of these strategies is limited by a lack of understanding of how different image scales influence perceived quality. This paper addresses two primary challenges: the significant redundancy of information across different scales, and the confusion caused by combining features from these scales, which may vary widely in quality. To this end, a new multi-scale BIQA framework is proposed, namely Contrast-Constrained Scale-Focused IQA Framework (CSFIQA). CSFIQA features a selective focus attention mechanism to minimize information redundancy and highlight critical quality-related information. Additionally, CSFIQA includes a scale-level contrastive learning module equipped with a noise sample matching mechanism to identify quality discrepancies across the same image content at different scales. By exploring the intrinsic relationship between image scales and the perceived quality, the proposed CSFIQA achieves leading performance on eight benchmark datasets, e.g., achieving SRCC values of 0.967 (versus 0.947 in CSIQ) and 0.905 (versus 0.876 in LIVEC).

CVMar 27, 2024
$\mathrm{F^2Depth}$: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis

Xiaotong Guo, Huijie Zhao, Shuwei Shao et al.

Self-supervised monocular depth estimation methods have been increasingly given much attention due to the benefit of not requiring large, labelled datasets. Such self-supervised methods require high-quality salient features and consequently suffer from severe performance drop for indoor scenes, where low-textured regions dominant in the scenes are almost indiscriminative. To address the issue, we propose a self-supervised indoor monocular depth estimation framework called $\mathrm{F^2Depth}$. A self-supervised optical flow estimation network is introduced to supervise depth learning. To improve optical flow estimation performance in low-textured areas, only some patches of points with more discriminative features are adopted for finetuning based on our well-designed patch-based photometric loss. The finetuned optical flow estimation network generates high-accuracy optical flow as a supervisory signal for depth estimation. Correspondingly, an optical flow consistency loss is designed. Multi-scale feature maps produced by finetuned optical flow estimation network perform warping to compute feature map synthesis loss as another supervisory signal for depth learning. Experimental results on the NYU Depth V2 dataset demonstrate the effectiveness of the framework and our proposed losses. To evaluate the generalization ability of our $\mathrm{F^2Depth}$, we collect a Campus Indoor depth dataset composed of approximately 1500 points selected from 99 images in 18 scenes. Zero-shot generalization experiments on 7-Scenes dataset and Campus Indoor achieve $δ_1$ accuracy of 75.8% and 76.0% respectively. The accuracy results show that our model can generalize well to monocular images captured in unknown indoor scenes.

IVDec 13, 2023
Projective Parallel Single-Pixel Imaging: 3D Structured Light Scanning Under Global Illumination

Yuxi Li, Hongzhi Jiang, Huijie Zhao et al.

We present projective parallel single-pixel imaging (pPSI), a 3D photography method that provides a robust and efficient way to analyze the light transport behavior and enables separation of light effect due to global illumination, thereby achieving 3D structured light scanning under global illumination. The light transport behavior is described by the light transport coefficients (LTC), which contain complete information for a projector camera pair, and is a 4D data set. However, the capture of LTC is generally time consuming. The 4D LTC in pPSI are reduced to projection functions, thereby enabling a highly efficient data capture process. We introduce the local maximum constraint, which provides constraint for the location of candidate correspondence matching points when projections are captured. Local slice extension (LSE) method is introduced to accelerate the capture of projection functions. Optimization is conducted for pPSI under several situations. The number of projection functions required for pPSI is optimized and the influence of capture ratio in LSE on the accuracy of the correspondence matching points is investigated. Discussions and experiments include two typical kinds of global illuminations: inter-reflections and subsurface scattering. The proposed method is validated with several challenging scenarios, and outperforms the state-of-the-art methods.

LGFeb 16, 2022
Data-Driven Minimax Optimization with Expectation Constraints

Shuoguang Yang, Xudong Li, Guanghui Lan

Attention to data-driven optimization approaches, including the well-known stochastic gradient descent method, has grown significantly over recent decades, but data-driven constraints have rarely been studied, because of the computational challenges of projections onto the feasible set defined by these hard constraints. In this paper, we focus on the non-smooth convex-concave stochastic minimax regime and formulate the data-driven constraints as expectation constraints. The minimax expectation constrained problem subsumes a broad class of real-world applications, including two-player zero-sum game and data-driven robust optimization. We propose a class of efficient primal-dual algorithms to tackle the minimax expectation-constrained problem, and show that our algorithms converge at the optimal rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$. We demonstrate the practical efficiency of our algorithms by conducting numerical experiments on large-scale real-world applications.

ITDec 28, 2021
Robust Security Analysis Based on Random Geometry Theory for Satellite-Terrestrial-Vehicle Network

Xudong Li, Ye Fan, Rugui Yao et al.

Driven by B5G and 6G technologies, multi-network fusion is an indispensable tendency for future communications. In this paper, we focus on and analyze the \emph{security performance} (SP) of the \emph{satellite-terrestrial downlink transmission} (STDT). Here, the STDT is composed of a satellite network and a vehicular network with a legitimate mobile receiver and an mobile eavesdropper distributing. To theoretically analyze the SP of this system from the perspective of mobile terminals better, the random geometry theory is adopted, which assumes that both terrestrial vehicles are distributed stochastically in one beam of the satellite. Furthermore, based on this theory, the closed-form analytical expressions for two crucial and specific indicators in the STDT are derived, respectively, the secrecy outage probability and the ergodic secrecy capacity. Additionally, several related variables restricting the SP of the STDT are discussed, and specific schemes are presented to enhance the SP. Then, the asymptotic property is investigated in the high signal-to-noise ratio scenario, and accurate and asymptotic closed-form expressions are given. Finally, simulation results show that, under the precondition of guaranteeing the reliability of the STDT, the asymptotic solutions outperform the corresponding accurate results significantly in the effectiveness.

OCOct 23, 2021
On Geometric Connections of Embedded and Quotient Geometries in Riemannian Fixed-rank Matrix Optimization

Yuetian Luo, Xudong Li, Anru R. Zhang

In this paper, we propose a general procedure for establishing the geometric landscape connections of a Riemannian optimization problem under the embedded and quotient geometries. By applying the general procedure to the fixed-rank positive semidefinite (PSD) and general matrix optimization, we establish an exact Riemannian gradient connection under two geometries at every point on the manifold and sandwich inequalities between the spectra of Riemannian Hessians at Riemannian first-order stationary points (FOSPs). These results immediately imply an equivalence on the sets of Riemannian FOSPs, Riemannian second-order stationary points (SOSPs), and strict saddles of fixed-rank matrix optimization under the embedded and the quotient geometries. To the best of our knowledge, this is the first geometric landscape connection between the embedded and the quotient geometries for fixed-rank matrix optimization and it provides a concrete example of how these two geometries are connected in Riemannian optimization. In addition, the effects of the Riemannian metric and quotient structure on the landscape connection are discussed. We also observe an algorithmic connection between two geometries with some specific Riemannian metrics in fixed-rank matrix optimization: there is an equivalence between gradient flows under two geometries with shared spectra of Riemannian Hessians. A number of novel ideas and technical ingredients including a unified treatment for different Riemannian metrics, novel metrics for the Stiefel manifold, and new horizontal space representations under quotient geometries are developed to obtain our results. The results in this paper deepen our understanding of geometric and algorithmic connections of Riemannian optimization under different Riemannian geometries and provide a few new theoretical insights to unanswered questions in the literature.

OCAug 3, 2021
Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization

Yuetian Luo, Xudong Li, Anru R. Zhang

In this paper, we consider the geometric landscape connection of the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish a sandwich relation on the spectrum of Riemannian and Euclidean Hessians at first-order stationary points (FOSPs). As a result of that, we obtain an equivalence on the set of FOSPs, second-order stationary points (SOSPs) and strict saddles between the manifold and the factorization formulations. In addition, we show the sandwich relation can be used to transfer more quantitative geometric properties from one formulation to another. Similarities and differences in the landscape connection under the PSD case and the general case are discussed. To the best of our knowledge, this is the first geometric landscape connection between the manifold and the factorization formulations for handling rank constraints, and it provides a geometric explanation for the similar empirical performance of factorization and manifold approaches in low-rank matrix optimization observed in the literature. In the general low-rank matrix optimization, the landscape connection of two factorization formulations (unregularized and regularized ones) is also provided. By applying these geometric landscape connections, in particular, the sandwich relation, we are able to solve unanswered questions in literature and establish stronger results in the applications on geometric analysis of phase retrieval, well-conditioned low-rank matrix optimization, and the role of regularization in factorization arising from machine learning and signal processing.

ROApr 7, 2021
Few-Shot Meta-Learning on Point Cloud for Semantic Segmentation

Xudong Li, Li Feng, Lei Li et al.

The promotion of construction robots can solve the problem of human resource shortage and improve the quality of decoration. To help the construction robots obtain environmental information, we need to use 3D point cloud, which is widely used in robotics, autonomous driving, and so on. With a good understanding of environmental information, construction robots can work better. However, the dynamic changes of 3D point cloud data may bring difficulties for construction robots to understand environmental information, such as when construction robots renovate houses. The paper proposes a semantic segmentation method for point cloud based on meta-learning. The method includes a basic learning module and a meta-learning module. The basic learning module is responsible for learning data features and evaluating the model, while the meta-learning module is responsible for updating the parameters of the model and improving the model generalization ability. In our work, we pioneered the method of producing datasets for meta-learning in 3D scenes, as well as demonstrated that the Model-Agnostic Meta-Learning (MAML) algorithm can be applied to process 3D point cloud data. At the same time, experiments show that our method can allow the model to be quickly applied to new environments with a few samples. Our method has important applications.

OCNov 17, 2020
Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-order Convergence

Yuetian Luo, Wen Huang, Xudong Li et al.

In this paper, we propose {\it \underline{R}ecursive} {\it \underline{I}mportance} {\it \underline{S}ketching} algorithm for {\it \underline{R}ank} constrained least squares {\it \underline{O}ptimization} (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which significantly differs from the randomized sketching in the literature \citep{mahoney2011randomized,woodruff2014sketching}. Several existing algorithms in the literature can be reinterpreted under this new sketching framework and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient, where the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish the local quadratic-linear and quadratic rate of convergence for RISRO under some mild conditions. We also discover a deep connection of RISRO to the Riemannian Gauss-Newton algorithm on fixed rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.

ROFeb 27, 2020
Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

Ke Lin, Liang Gong, Xudong Li et al.

Although deep reinforcement learning (DRL) algorithms have made important achievements in many control tasks, they still suffer from the problems of sample inefficiency and unstable training process, which are usually caused by sparse rewards. Recently, some reinforcement learning from demonstration (RLfD) methods have shown to be promising in overcoming these problems. However, they usually require considerable demonstrations. In order to tackle these challenges, on the basis of the SAC algorithm we propose a sample efficient DRL-EG (DRL with efficient guidance) algorithm, in which a discriminator D(s) and a guider G(s) are modeled by a small number of expert demonstrations. The discriminator will determine the appropriate guidance states and the guider will guide agents to better exploration in the training phase. Empirical evaluation results from several continuous control tasks verify the effectiveness and performance improvements of our method over other RL and RLfD counterparts. Experiments results also show that DRL-EG can help the agent to escape from a local optimum.

LGFeb 19, 2020
Neural Architecture Search For Fault Diagnosis

Xudong Li, Yang Hu, Jianhua Zheng et al.

Data-driven methods have made great progress in fault diagnosis, especially deep learning method. Deep learning is suitable for processing big data, and has a strong feature extraction ability to realize end-to-end fault diagnosis systems. However, designing neural network architecture requires rich professional knowledge and debugging experience, and a lot of experiments are needed to screen models and hyperparameters, increasing the difficulty of developing deep learning models. Frortunately, neural architecture search (NAS) is developing rapidly, and is becoming one of the next directions for deep learning. In this paper, we proposed a NAS method for fault diagnosis using reinforcement learning. A recurrent neural network is used as an agent to generate network architecture. The accuracy of the generated network on the validation dataset is fed back to the agent as a reward, and the parameters of the agent are updated through the strategy gradient algorithm. We use PHM 2009 Data Challenge gearbox dataset to prove the effectiveness of proposed method, and obtain state-of-the-art results compared with other artificial designed network structures. To author's best knowledge, it's the first time that NAS has been applied in fault diagnosis.

LGJan 16, 2020
MixPath: A Unified Approach for One-shot Neural Architecture Search

Xiangxiang Chu, Shun Lu, Xudong Li et al.

Blending multiple convolutional kernels is proved advantageous in neural architecture design. However, current two-stage neural architecture search methods are mainly limited to single-path search spaces. How to efficiently search models of multi-path structures remains a difficult problem. In this paper, we are motivated to train a one-shot multi-path supernet to accurately evaluate the candidate architectures. Specifically, we discover that in the studied search spaces, feature vectors summed from multiple paths are nearly multiples of those from a single path. Such disparity perturbs the supernet training and its ranking ability. Therefore, we propose a novel mechanism called Shadow Batch Normalization (SBN) to regularize the disparate feature statistics. Extensive experiments prove that SBNs are capable of stabilizing the optimization and improving ranking performance. We call our unified multi-path one-shot approach as MixPath, which generates a series of models that achieve state-of-the-art results on ImageNet.

MEJun 28, 2019
Learning Markov models via low-rank optimization

Ziwei Zhu, Xudong Li, Mengdi Wang et al.

Modeling unknown systems from data is a precursor of system optimization and sequential decision making. In this paper, we focus on learning a Markov model from a single trajectory of states. Suppose that the transition model has a small rank despite of having a large state space, meaning that the system admits a low-dimensional latent structure. We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states. We propose two maximum likelihood estimation methods: a convex approach with nuclear-norm regularization and a nonconvex approach with rank constraint. We explicitly derive the statistical rates of both estimators in terms of the Kullback-Leiber divergence and the $\ell_2$ error and also establish a minimax lower bound to assess the tightness of these rates. For computing the nonconvex estimator, we develop a novel DC (difference of convex function) programming algorithm that starts with the convex M-estimator and then successively refines the solution till convergence. Empirical experiments demonstrate consistent superiority of the nonconvex estimator over the convex one.

MLApr 3, 2018
Estimation of Markov Chain via Rank-Constrained Likelihood

Xudong Li, Mengdi Wang, Anru Zhang

This paper studies the estimation of low-rank Markov chains from empirical trajectories. We propose a non-convex estimator based on rank-constrained likelihood maximization. Statistical upper bounds are provided for the Kullback-Leiber divergence and the $\ell_2$ risk between the estimator and the true transition matrix. The estimator reveals a compressed state space of the Markov chain. We also develop a novel DC (difference of convex function) programming algorithm to tackle the rank-constrained non-smooth optimization problem. Convergence results are established. Experiments show that the proposed estimator achieves better empirical performance than other popular approaches.

NAMay 23, 2017
A block symmetric Gauss-Seidel decomposition theorem for convex composite quadratic programming and its applications

Xudong Li, Defeng Sun, Kim-Chuan Toh

For a symmetric positive semidefinite linear system of equations $\mathcal{Q} {\bf x} = {\bf b}$, where ${\bf x} = (x_1,\ldots,x_s)$ is partitioned into $s$ blocks, with $s \geq 2$, we show that each cycle of the classical block symmetric Gauss-Seidel (block sGS) method exactly solves the associated quadratic programming (QP) problem but added with an extra proximal term of the form $\frac{1}{2} \| {\bf x}-{\bf x}^k \|_{\mathcal T}^2$, where ${\mathcal T}$ is a symmetric positive semidefinite matrix related to the sGS decomposition and ${\bf x}^k$ is the previous iterate. By leveraging on such a connection to optimization, we are able to extend the result (which we name as the block sGS decomposition theorem) for solving a convex composite QP (CCQP) with an additional possibly nonsmooth term in $x_1$, i.e., $\min\{ p(x_1) + \frac{1}{2}\langle {\bf x},\, \mathcal{Q} {\bf x} \rangle -\langle {\bf b},\, {\bf x}\rangle\}$, where $p(\cdot)$ is a proper closed convex function. Based on the block sGS decomposition theorem, we are able to extend the classical block sGS method to solve a CCQP. In addition, our extended block sGS method has the flexibility of allowing for inexact computation in each step of the block sGS cycle. At the same time, we can also accelerate the inexact block sGS method to achieve an iteration complexity of $O(1/k^2)$ after performing $k$ block sGS cycles. As a {fundamental} building block, the block sGS decomposition theorem has played a key role in various recently developed algorithms such as the inexact semiproximal {ALM/ADMM} for linearly constrained multi-block convex composite conic programming (CCCP), and the accelerated block coordinate descent method for multi-block CCCP.