h-index9
37papers
1,392citations
Novelty57%
AI Score49

37 Papers

CVSep 27, 2024Code
LW2G: Learning Whether to Grow for Prompt-based Continual Learning

Qian Feng, Da-wei Zhou, Hanbin Zhao et al.

Recent Prompt-based Continual learning (PCL) has achieved remarkable performance with pre-trained models. These approaches expand a prompt pool by adding a new set of prompts while learning and select the correct set during inference. Previous studies have revealed that learning task-wised prompt sets individually and low selection accuracy pose challenges to the performance of PCL. In this paper, we propose a plug-in method, $\textbf{L}$earning $\textbf{W}$hether $\textbf{t}$o $\textbf{G}$row $\textbf{(LW2G)}$, which leverages the disparities between tasks to form an effective and efficient prompt sets pool, thereby achieving intra-task knowledge sharing and cooperation and avoiding the unbounded increase in the cost of the prompt pool. Specifically, a shared set is utilized when several tasks share certain commonalities, and a new set is added when there are significant differences between the new and previous tasks. To achieve this, we develop a metric called Hinder Forward Capability (HFC) to measure the hindrance imposed on learning new tasks by surgically modifying the original gradient onto the orthogonal complement of the old feature space. With HFC, an automated scheme, Dynamic Growing Approach, adaptively learns whether to grow with a dynamic threshold. Furthermore, we design a gradient-based constraint to ensure consistency between the updating prompts and pre-trained knowledge. Extensive experiments show the effectiveness of our method. Code is available at https://github.com/RAIAN08/LW2G.

CVJul 4, 2024Code
PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer

Qian Feng, Hanbin Zhao, Chao Zhang et al.

Incremental Learning (IL) aims to learn deep models on sequential tasks continually, where each new task includes a batch of new classes and deep models have no access to task-ID information at the inference time. Recent vast pre-trained models (PTMs) have achieved outstanding performance by prompt technique in practical IL without the old samples (rehearsal-free) and with a memory constraint (memory-constrained): Prompt-extending and Prompt-fixed methods. However, prompt-extending methods need a large memory buffer to maintain an ever-expanding prompt pool and meet an extra challenging prompt selection problem. Prompt-fixed methods only learn a single set of prompts on one of the incremental tasks and can not handle all the incremental tasks effectively. To achieve a good balance between the memory cost and the performance on all the tasks, we propose a Parameter-Efficient Cross-Task Prompt (PECTP) framework with Prompt Retention Module (PRM) and classifier Head Retention Module (HRM). To make the final learned prompts effective on all incremental tasks, PRM constrains the evolution of cross-task prompts' parameters from Outer Prompt Granularity and Inner Prompt Granularity. Besides, we employ HRM to inherit old knowledge in the previously learned classifier heads to facilitate the cross-task prompts' generalization ability. Extensive experiments show the effectiveness of our method. The source codes will be available at \url{https://github.com/RAIAN08/PECTP}.

CVSep 9, 2024Code
TextToucher: Fine-Grained Text-to-Touch Generation

Jiahang Tu, Hao Fu, Fengyu Yang et al.

Tactile sensation plays a crucial role in the development of multi-modal large models and embodied intelligence. To collect tactile data with minimal cost as possible, a series of studies have attempted to generate tactile images by vision-to-touch image translation. However, compared to text modality, visual modality-driven tactile generation cannot accurately depict human tactile sensation. In this work, we analyze the characteristics of tactile images in detail from two granularities: object-level (tactile texture, tactile shape), and sensor-level (gel status). We model these granularities of information through text descriptions and propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples. Specifically, we introduce a multimodal large language model to build the text sentences about object-level tactile information and employ a set of learnable text prompts to represent the sensor-level tactile information. To better guide the tactile generation process with the built text information, we fuse the dual grains of text information and explore various dual-grain text conditioning methods within the diffusion transformer architecture. Furthermore, we propose a Contrastive Text-Touch Pre-training (CTTP) metric to precisely evaluate the quality of text-driven generated tactile data. Extensive experiments demonstrate the superiority of our TextToucher method. The source codes will be available at \url{https://github.com/TtuHamg/TextToucher}.

CVJul 22, 2024Code
DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving

Jiahang Tu, Wei Ji, Hanbin Zhao et al.

In autonomous driving, deep models have shown remarkable performance across various visual perception tasks with the demand of high-quality and huge-diversity training datasets. Such datasets are expected to cover various driving scenarios with adverse weather, lighting conditions and diverse moving objects. However, manually collecting these data presents huge challenges and expensive cost. With the rapid development of large generative models, we propose DriveDiTFit, a novel method for efficiently generating autonomous Driving data by Fine-tuning pre-trained Diffusion Transformers (DiTs). Specifically, DriveDiTFit utilizes a gap-driven modulation technique to carefully select and efficiently fine-tune a few parameters in DiTs according to the discrepancy between the pre-trained source data and the target driving data. Additionally, DriveDiTFit develops an effective weather and lighting condition embedding module to ensure diversity in the generated data, which is initialized by a nearest-semantic-similarity initialization approach. Through progressive tuning scheme to refined the process of detail generation in early diffusion process and enlarging the weights corresponding to small objects in training loss, DriveDiTFit ensures high-quality generation of small moving objects in the generated data. Extensive experiments conducted on driving datasets confirm that our method could efficiently produce diverse real driving data. The source codes will be available at https://github.com/TtuHamg/DriveDiTFit.

CVApr 8, 2022
Vision Transformers for Single Image Dehazing

Yuda Song, Zhuqing He, Hui Qian et al.

Image dehazing is a representative low-level vision task that estimates latent haze-free images from hazy images. In recent years, convolutional neural network-based methods have dominated image dehazing. However, vision Transformers, which has recently made a breakthrough in high-level vision tasks, has not brought new dimensions to image dehazing. We start with the popular Swin Transformer and find that several of its key designs are unsuitable for image dehazing. To this end, we propose DehazeFormer, which consists of various improvements, such as the modified normalization layer, activation function, and spatial information aggregation scheme. We train multiple variants of DehazeFormer on various datasets to demonstrate its effectiveness. Specifically, on the most frequently used SOTS indoor set, our small model outperforms FFA-Net with only 25% #Param and 5% computational cost. To the best of our knowledge, our large model is the first method with the PSNR over 40 dB on the SOTS indoor set, dramatically outperforming the previous state-of-the-art methods. We also collect a large-scale realistic remote sensing dehazing dataset for evaluating the method's capability to remove highly non-homogeneous haze.

CVSep 23, 2022
Rethinking Performance Gains in Image Dehazing Networks

Yuda Song, Yang Zhou, Hui Qian et al.

Image dehazing is an active topic in low-level vision, and many image dehazing networks have been proposed with the rapid development of deep learning. Although these networks' pipelines work fine, the key mechanism to improving image dehazing performance remains unclear. For this reason, we do not target to propose a dehazing network with fancy modules; rather, we make minimal modifications to popular U-Net to obtain a compact dehazing network. Specifically, we swap out the convolutional blocks in U-Net for residual blocks with the gating mechanism, fuse the feature maps of main paths and skip connections using the selective kernel, and call the resulting U-Net variant gUNet. As a result, with a significantly reduced overhead, gUNet is superior to state-of-the-art methods on multiple image dehazing datasets. Finally, we verify these key designs to the performance gain of image dehazing networks through extensive ablation studies.

NAApr 23, 2023
Accelerated Stochastic ADMM with Variance Reduction

Chao Zhang, Zebang Shen, Hui Qian et al.

Alternating Direction Method of Multipliers (ADMM) is a popular method for solving large-scale Machine Learning problems. Stochastic ADMM was proposed to reduce the per iteration computational complexity, which is more suitable for big data problems. Recently, variance reduction techniques have been integrated with stochastic ADMM in order to get a faster convergence rate, such as SAG-ADMM and SVRG-ADMM. However, their convergence rate is still suboptimal w.r.t the smoothness constant. In this paper, we propose an accelerated stochastic ADMM algorithm with variance reduction, which enjoys a faster convergence than all the existing stochastic ADMM algorithms. We theoretically analyse its convergence rate and show its dependence on the smoothness constant is optimal. We also empirically validate its effectiveness and show its priority over other stochastic ADMM algorithms.

LGJun 11, 2023
PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

Wensong Bai, Chao Zhang, Yichao Fu et al.

In this paper, we propose the first fully push-forward-based distributional reinforcement learning algorithm, named PACER, which consists of a distributional critic, a stochastic actor and a sample-based encourager. Specifically, the push-forward operator is leveraged in both the critic and actor to model the return distributions and stochastic policies respectively, enabling them with equal modeling capability and thus enhancing the synergetic performance. Since it is infeasible to obtain the density function of the push-forward policies, novel sample-based regularizers are integrated in the encourager to incentivize efficient exploration and alleviate the risk of trapping into local optima. Moreover, a sample-based stochastic utility value policy gradient is established for the push-forward policy update, which circumvents the explicit demand of the policy density function in existing REINFORCE-based stochastic policy gradient. As a result, PACER fully utilizes the modeling capability of the push-forward operator and is able to explore a broader class of the policy space, compared with limited policy classes used in existing distributional actor critic algorithms (i.e. Gaussians). We validate the critical role of each component in our algorithm with extensive empirical studies. Experimental results demonstrate the superiority of our algorithm over the state-of-the-art.

LGApr 23, 2023
An Asynchronous Decentralized Algorithm for Wasserstein Barycenter Problem

Chao Zhang, Hui Qian, Jiahao Xie

Wasserstein Barycenter Problem (WBP) has recently received much attention in the field of artificial intelligence. In this paper, we focus on the decentralized setting for WBP and propose an asynchronous decentralized algorithm (A$^2$DWB). A$^2$DWB is induced by a novel stochastic block coordinate descent method to optimize the dual of entropy regularized WBP. To our knowledge, A$^2$DWB is the first asynchronous decentralized algorithm for WBP. Unlike its synchronous counterpart, it updates local variables in a manner that only relies on the stale neighbor information, which effectively alleviate the waiting overhead, and thus substantially improve the time efficiency. Empirical results validate its superior performance compared to the latest synchronous algorithm.

LGApr 23, 2023
Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization

Zebang Shen, Hui Qian, Tongzhou Mu et al.

Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large scale empirical risk minimization problem for learning tasks. While enjoying a provably superior convergence rate, in each iteration, such algorithm only accesses a mini batch of samples and meanwhile updates a small block of variable coordinates, which substantially reduces the amount of memory reference when both the massive sample size and ultra-high dimensionality are involved. Empirical studies on huge scale datasets are conducted to illustrate the efficiency of our method in practice.

CVMar 15, 2022
Multi-Curve Translator for High-Resolution Photorealistic Image Translation

Yuda Song, Hui Qian, Xin Du

The dominant image-to-image translation methods are based on fully convolutional networks, which extract and translate an image's features and then reconstruct the image. However, they have unacceptable computational costs when working with high-resolution images. To this end, we present the Multi-Curve Translator (MCT), which not only predicts the translated pixels for the corresponding input pixels but also for their neighboring pixels. And if a high-resolution image is downsampled to its low-resolution version, the lost pixels are the remaining pixels' neighboring pixels. So MCT makes it possible to feed the network only the downsampled image to perform the mapping for the full-resolution image, which can dramatically lower the computational cost. Besides, MCT is a plug-in approach that utilizes existing base models and requires only replacing their output layers. Experiments demonstrate that the MCT variants can process 4K images in real-time and achieve comparable or even better performance than the base models on various photorealistic image-to-image translation tasks.

LGJun 29, 2023
Towards Optimal Randomized Strategies in Adversarial Example Game

Jiahao Xie, Chao Zhang, Weijie Liu et al.

The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. A recent line of work shows that the use of randomization in adversarial training is the key to find optimal strategies against adversarial example attacks. However, in a fully randomized setting where both the defender and the attacker can use randomized strategies, there are no efficient algorithm for finding such an optimal strategy. To fill the gap, we propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. FRAT maintains a lightweight mixture of models for the defender, with flexibility to efficiently update mixing weights and model parameters at each iteration. Furthermore, FRAT utilizes lightweight sampling subroutines to construct a random strategy for the attacker. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibria in a zero-sum game formed by a defender and an attacker. Experimental results also demonstrate the efficiency of FRAT on CIFAR-10 and CIFAR-100 datasets.

CVJan 26, 2025Code
CE-SDWV: Effective and Efficient Concept Erasure for Text-to-Image Diffusion Models via a Semantic-Driven Word Vocabulary

Jiahang Tu, Qian Feng, Jiahua Dong et al.

Large-scale text-to-image (T2I) diffusion models have achieved remarkable generative performance about various concepts. With the limitation of privacy and safety in practice, the generative capability concerning NSFW (Not Safe For Work) concepts is undesirable, e.g., producing sexually explicit photos, and licensed images. The concept erasure task for T2I diffusion models has attracted considerable attention and requires an effective and efficient method. To achieve this goal, we propose a CE-SDWV framework, which removes the target concepts (e.g., NSFW concepts) of T2I diffusion models in the text semantic space by only adjusting the text condition tokens and does not need to re-train the original T2I diffusion model's weights. Specifically, our framework first builds a target concept-related word vocabulary to enhance the representation of the target concepts within the text semantic space, and then utilizes an adaptive semantic component suppression strategy to ablate the target concept-related semantic information in the text condition tokens. To further adapt the above text condition tokens to the original image semantic space, we propose an end-to-end gradient-orthogonal token optimization strategy. Extensive experiments on I2P and UnlearnCanvas benchmarks demonstrate the effectiveness and efficiency of our method. Code is available at https://github.com/TtuHamg/CE-SDWV.

LGJan 20
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning

Qian Feng, JiaHang Tu, Mintong Kang et al.

Incremental unlearning (IU) is critical for pre-trained models to comply with sequential data deletion requests, yet existing methods primarily suppress parameters or confuse knowledge without explicit constraints on both feature and gradient level, resulting in \textit{superficial forgetting} where residual information remains recoverable. This incomplete forgetting risks security breaches and disrupts retention balance, especially in IU scenarios. We propose FG-OrIU (\textbf{F}eature-\textbf{G}radient \textbf{Or}thogonality for \textbf{I}ncremental \textbf{U}nlearning), the first framework unifying orthogonal constraints on both features and gradients level to achieve deep forgetting, where the forgetting effect is irreversible. FG-OrIU decomposes feature spaces via Singular Value Decomposition (SVD), separating forgetting and remaining class features into distinct subspaces. It then enforces dual constraints: feature orthogonal projection on both forgetting and remaining classes, while gradient orthogonal projection prevents the reintroduction of forgotten knowledge and disruption to remaining classes during updates. Additionally, dynamic subspace adaptation merges newly forgetting subspaces and contracts remaining subspaces, ensuring a stable balance between removal and retention across sequential unlearning tasks. Extensive experiments demonstrate the effectiveness of our method.

CVNov 10, 2022
ClassPruning: Speed Up Image Restoration Networks by Dynamic N:M Pruning

Yang Zhou, Yuda Song, Hui Qian et al.

Image restoration tasks have achieved tremendous performance improvements with the rapid advancement of deep neural networks. However, most prevalent deep learning models perform inference statically, ignoring that different images have varying restoration difficulties and lightly degraded images can be well restored by slimmer subnetworks. To this end, we propose a new solution pipeline dubbed ClassPruning that utilizes networks with different capabilities to process images with varying restoration difficulties. In particular, we use a lightweight classifier to identify the image restoration difficulty, and then the sparse subnetworks with different capabilities can be sampled based on predicted difficulty by performing dynamic N:M fine-grained structured pruning on base restoration networks. We further propose a novel training strategy along with two additional loss terms to stabilize training and improve performance. Experiments demonstrate that ClassPruning can help existing methods save approximately 40% FLOPs while maintaining performance.

CVMar 26, 2025Code
IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting

Hao Fu, Hanbin Zhao, Jiahua Dong et al.

Recent pre-trained vision-language models (PT-VLMs) often face a Multi-Domain Task Incremental Learning (MTIL) scenario in practice, where several classes and domains of multi-modal tasks are incrementally arrived. Without access to previously seen tasks and unseen tasks, memory-constrained MTIL suffers from forward and backward forgetting. To alleviate the above challenges, parameter-efficient fine-tuning techniques (PEFT), such as prompt tuning, are employed to adapt the PT-VLM to the diverse incrementally learned tasks. To achieve effective new task adaptation, existing methods only consider the effect of PEFT strategy selection, but neglect the influence of PEFT parameter setting (e.g., prompting). In this paper, we tackle the challenge of optimizing prompt designs for diverse tasks in MTIL and propose an Instance-Aware Prompting (IAP) framework. Specifically, our Instance-Aware Gated Prompting (IA-GP) strategy enhances adaptation to new tasks while mitigating forgetting by adaptively assigning prompts across transformer layers at the instance level. Our Instance-Aware Class-Distribution-Driven Prompting (IA-CDDP) improves the task adaptation process by determining an accurate task-label-related confidence score for each instance. Experimental evaluations across 11 datasets, using three performance metrics, demonstrate the effectiveness of our proposed method. The source codes are available at https://github.com/FerdinandZJU/IAP.

LGFeb 9
Conditional Sequence Modeling for Safe Reinforcement Learning

Wensong Bai, Chao Zhang, Qihang Xu et al.

Offline safe reinforcement learning (RL) aims to learn policies from a fixed dataset while maximizing performance under cumulative cost constraints. In practice, deployment requirements often vary across scenarios, necessitating a single policy that can adapt zero-shot to different cost thresholds. However, most existing offline safe RL methods are trained under a pre-specified threshold, yielding policies with limited generalization and deployment flexibility across cost thresholds. Motivated by recent progress in conditional sequence modeling (CSM), which enables flexible goal-conditioned control by specifying target returns, we propose RCDT, a CSM-based method that supports zero-shot deployment across multiple cost thresholds within a single trained policy. RCDT is the first CSM-based offline safe RL algorithm that integrates a Lagrangian-style cost penalty with an auto-adaptive penalty coefficient. To avoid overly conservative behavior and achieve a more favorable return--cost trade-off, a reward--cost-aware trajectory reweighting mechanism and Q-value regularization are further incorporated. Extensive experiments on the DSRL benchmark demonstrate that RCDT consistently improves return--cost trade-offs over representative baselines, advancing the state-of-the-art in offline safe RL.

LGDec 27, 2023
GAD-PVI: A General Accelerated Dynamic-Weight Particle-Based Variational Inference Framework

Fangyikang Wang, Huminhao Zhu, Chao Zhang et al.

Particle-based Variational Inference (ParVI) methods approximate the target distribution by iteratively evolving finite weighted particle systems. Recent advances of ParVI methods reveal the benefits of accelerated position update strategies and dynamic weight adjustment approaches. In this paper, we propose the first ParVI framework that possesses both accelerated position update and dynamical weight adjustment simultaneously, named the General Accelerated Dynamic-Weight Particle-based Variational Inference (GAD-PVI) framework. Generally, GAD-PVI simulates the semi-Hamiltonian gradient flow on a novel Information-Fisher-Rao space, which yields an additional decrease on the local functional dissipation. GAD-PVI is compatible with different dissimilarity functionals and associated smoothing approaches under three information metrics. Experiments on both synthetic and real-world data demonstrate the faster convergence and reduced approximation error of GAD-PVI methods over the state-of-the-art.

CVMay 30, 2025
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Fangyikang Wang, Hubery Yin, Lei Qian et al.

The diffusion models (DMs) have demonstrated the remarkable capability of generating images via learning the noised score function of data distribution. Current DM sampling techniques typically rely on first-order Langevin dynamics at each noise level, with efforts concentrated on refining inter-level denoising strategies. While leveraging additional second-order Hessian geometry to enhance the sampling quality of Langevin is a common practice in Markov chain Monte Carlo (MCMC), the naive attempts to utilize Hessian geometry in high-dimensional DMs lead to quadratic-complexity computational costs, rendering them non-scalable. In this work, we introduce a novel Levenberg-Marquardt-Langevin (LML) method that approximates the diffusion Hessian geometry in a training-free manner, drawing inspiration from the celebrated Levenberg-Marquardt optimization algorithm. Our approach introduces two key innovations: (1) A low-rank approximation of the diffusion Hessian, leveraging the DMs' inherent structure and circumventing explicit quadratic-complexity computations; (2) A damping mechanism to stabilize the approximated Hessian. This LML approximated Hessian geometry enables the diffusion sampling to execute more accurate steps and improve the image generation quality. We further conduct a theoretical analysis to substantiate the approximation error bound of low-rank approximation and the convergence property of the damping mechanism. Extensive experiments across multiple pretrained DMs validate that the LML method significantly improves image generation quality, with negligible computational overhead.

LGMay 29, 2025
Efficiently Access Diffusion Fisher: Within the Outer Product Span Space

Fangyikang Wang, Hubery Yin, Shaobin Zhuang et al.

Recent Diffusion models (DMs) advancements have explored incorporating the second-order diffusion Fisher information (DF), defined as the negative Hessian of log density, into various downstream tasks and theoretical analysis. However, current practices typically approximate the diffusion Fisher by applying auto-differentiation to the learned score network. This black-box method, though straightforward, lacks any accuracy guarantee and is time-consuming. In this paper, we show that the diffusion Fisher actually resides within a space spanned by the outer products of score and initial data. Based on the outer-product structure, we develop two efficient approximation algorithms to access the trace and matrix-vector multiplication of DF, respectively. These algorithms bypass the auto-differentiation operations with time-efficient vector-product calculations. Furthermore, we establish the approximation error bounds for the proposed algorithms. Experiments in likelihood evaluation and adjoint optimization demonstrate the superior accuracy and reduced computational cost of our proposed algorithms. Additionally, based on the novel outer-product formulation of DF, we design the first numerical verification experiment for the optimal transport property of the general PF-ODE deduced map.

LGMar 31, 2025
Many-to-Many Matching via Sparsity Controlled Optimal Transport

Weijie Liu, Han Bao, Makoto Yamada et al.

Many-to-many matching seeks to match multiple points in one set and multiple points in another set, which is a basis for a wide range of data mining problems. It can be naturally recast in the framework of Optimal Transport (OT). However, existing OT methods either lack the ability to accomplish many-to-many matching or necessitate careful tuning of a regularization parameter to achieve satisfactory results. This paper proposes a novel many-to-many matching method to explicitly encode many-to-many constraints while preventing the degeneration into one-to-one matching. The proposed method consists of the following two components. The first component is the matching budget constraints on each row and column of a transport plan, which specify how many points can be matched to a point at most. The second component is the deformed $q$-entropy regularization, which encourages a point to meet the matching budget maximally. While the deformed $q$-entropy was initially proposed to sparsify a transport plan, we employ it to avoid the degeneration into one-to-one matching. We optimize the objective via a penalty algorithm, which is efficient and theoretically guaranteed to converge. Experimental results on various tasks demonstrate that the proposed method achieves good performance by gleaning meaningful many-to-many matchings.

LGJan 25, 2024
Neural Sinkhorn Gradient Flow

Huminhao Zhu, Fangyikang Wang, Chao Zhang et al.

Wasserstein Gradient Flows (WGF) with respect to specific functionals have been widely used in the machine learning literature. Recently, neural networks have been adopted to approximate certain intractable parts of the underlying Wasserstein gradient flow and result in efficient inference procedures. In this paper, we introduce the Neural Sinkhorn Gradient Flow (NSGF) model, which parametrizes the time-varying velocity field of the Wasserstein gradient flow w.r.t. the Sinkhorn divergence to the target distribution starting a given source distribution. We utilize the velocity field matching training scheme in NSGF, which only requires samples from the source and target distribution to compute an empirical velocity field approximation. Our theoretical analyses show that as the sample size increases to infinity, the mean-field limit of the empirical approximation converges to the true underlying velocity field. To further enhance model efficiency on high-dimensional tasks, a two-phase NSGF++ model is devised, which first follows the Sinkhorn flow to approach the image manifold quickly ($\le 5$ NFEs) and then refines the samples along a simple straight flow. Numerical experiments with synthetic and real-world benchmark datasets support our theoretical results and demonstrate the effectiveness of the proposed methods.

LGFeb 6, 2022
SIGMA: A Structural Inconsistency Reducing Graph Matching Algorithm

Weijie Liu, Chao Zhang, Nenggan Zheng et al.

Graph matching finds the correspondence of nodes across two correlated graphs and lies at the core of many applications. When graph side information is not available, the node correspondence is estimated on the sole basis of network topologies. In this paper, we propose a novel criterion to measure the graph matching accuracy, structural inconsistency (SI), which is defined based on the network topological structure. Specifically, SI incorporates the heat diffusion wavelet to accommodate the multi-hop structure of the graphs. Based on SI, we propose a Structural Inconsistency reducing Graph Matching Algorithm (SIGMA), which improves the alignment scores of node pairs that have low SI values in each iteration. Under suitable assumptions, SIGMA can reduce SI values of true counterparts. Furthermore, we demonstrate that SIGMA can be derived by using a mirror descent method to solve the Gromov-Wasserstein distance with a novel K-hop-structure-based matching costs. Extensive experiments show that our method outperforms state-of-the-art methods.

LGDec 2, 2021
DPVI: A Dynamic-Weight Particle-Based Variational Inference Framework

Chao Zhang, Zhijian Li, Hui Qian et al.

The recently developed Particle-based Variational Inference (ParVI) methods drive the empirical distribution of a set of \emph{fixed-weight} particles towards a given target distribution $π$ by iteratively updating particles' positions. However, the fixed weight restriction greatly confines the empirical distribution's approximation ability, especially when the particle number is limited. In this paper, we propose to dynamically adjust particles' weights according to a Fisher-Rao reaction flow. We develop a general Dynamic-weight Particle-based Variational Inference (DPVI) framework according to a novel continuous composite flow, which evolves the positions and weights of particles simultaneously. We show that the mean-field limit of our composite flow is actually a Wasserstein-Fisher-Rao gradient flow of certain dissimilarity functional $\mathcal{F}$, which leads to a faster decrease of $\mathcal{F}$ than the Wasserstein gradient flow underlying existing fixed-weight ParVIs. By using different finite-particle approximations in our general framework, we derive several efficient DPVI algorithms. The empirical results demonstrate the superiority of our derived DPVI algorithms over their fixed-weight counterparts.

LGNov 12, 2021
Approximating Optimal Transport via Low-rank and Sparse Factorization

Weijie Liu, Chao Zhang, Nenggan Zheng et al.

Optimal transport (OT) naturally arises in a wide range of machine learning applications but may often become the computational bottleneck. Recently, one line of works propose to solve OT approximately by searching the \emph{transport plan} in a low-rank subspace. However, the optimal transport plan is often not low-rank, which tends to yield large approximation errors. For example, when Monge's \emph{transport map} exists, the transport plan is full rank. This paper concerns the computation of the OT distance with adequate accuracy and efficiency. A novel approximation for OT is proposed, in which the transport plan can be decomposed into the sum of a low-rank matrix and a sparse one. We theoretically analyze the approximation error. An augmented Lagrangian method is then designed to efficiently calculate the transport plan.

CVJul 27, 2021
StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement

Yuda Song, Hui Qian, Xin Du

Image enhancement is a subjective process whose targets vary with user preferences. In this paper, we propose a deep learning-based image enhancement method covering multiple tonal styles using only a single model dubbed StarEnhancer. It can transform an image from one tonal style to another, even if that style is unseen. With a simple one-time setting, users can customize the model to make the enhanced images more in line with their aesthetics. To make the method more practical, we propose a well-designed enhancer that can process a 4K-resolution image over 200 FPS but surpasses the contemporaneous single style image enhancement methods in terms of PSNR, SSIM, and LPIPS. Finally, our proposed enhancement method has good interactability, which allows the user to fine-tune the enhanced image using intuitive options.

LGMay 29, 2021
CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems

Jiahao Xie, Chao Zhang, Zebang Shen et al.

Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require the full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm named CDMA for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to the low client availability. In addition, CDMA is incorporated with a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. We establish theoretical guarantees of CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.

LGMay 8, 2021
Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model

Lingwei Peng, Hui Qian, Zhebang Shen et al.

Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions results in ineffective exploration of environments and limited performance of algorithms in many cases. In this paper, we propose a density-free off-policy algorithm, Generative Actor-Critic(GAC), using the push-forward model to increase the expressiveness of policies, which also includes an entropy-like technique, MMD-entropy regularizer, to balance the exploration and exploitation. Additionnally, we devise an adaptive mechanism to automatically scale this regularizer, which further improves the stability and robustness of GAC. The experiment results show that push-forward policies possess desirable features, such as multi-modality, which can improve the efficiency of exploration and asymptotic performance of algorithms obviously.

LGDec 2, 2020
From One to All: Learning to Match Heterogeneous and Partially Overlapped Graphs

Weijie Liu, Hui Qian, Chao Zhang et al.

Recent years have witnessed a flurry of research activity in graph matching, which aims at finding the correspondence of nodes across two graphs and lies at the heart of many artificial intelligence applications. However, matching heterogeneous graphs with partial overlap remains a challenging problem in real-world applications. This paper proposes the first practical learning-to-match method to meet this challenge. The proposed unsupervised method adopts a novel partial OT paradigm to learn a transport plan and node embeddings simultaneously. In a from-one-to-all manner, the entire learning procedure is decomposed into a series of easy-to-solve sub-procedures, each of which only handles the alignment of a single type of nodes. A mechanism for searching the transport mass is also proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art graph matching methods.

LGOct 21, 2019
Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

Jiahao Xie, Zebang Shen, Chao Zhang et al.

This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared to state-of-the-arts.

LGOct 21, 2019
Aggregated Gradient Langevin Dynamics

Chao Zhang, Jiahao Xie, Zebang Shen et al.

In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e.g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretic results also indicate that methods in AGLD possess the merits of both the low per-iteration computational complexity and the short mixture time. Empirical studies demonstrate that our framework allows to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.

MLMay 25, 2018
Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication

Zebang Shen, Aryan Mokhtari, Tengfei Zhou et al.

Recently, the decentralized optimization problem is attracting growing attention. Most existing methods are deterministic with high per-iteration cost and have a convergence rate quadratically depending on the problem condition number. Besides, the dense communication is necessary to ensure the convergence even if the dataset is sparse. In this paper, we generalize the decentralized optimization problem to a monotone operator root finding problem, and propose a stochastic algorithm named DSBA that (i) converges geometrically with a rate linearly depending on the problem condition number, and (ii) can be implemented using sparse communication only. Additionally, DSBA handles learning problems like AUC-maximization which cannot be tackled efficiently in the decentralized setting. Experiments on convex minimization and AUC-maximization validate the efficiency of our method.

MLNov 13, 2016
Accelerated Variance Reduced Block Coordinate Descent

Zebang Shen, Hui Qian, Chao Zhang et al.

Algorithms with fast convergence, small number of data access, and low per-iteration complexity are particularly favorable in the big data era, due to the demand for obtaining \emph{highly accurate solutions} to problems with \emph{a large number of samples} in \emph{ultra-high} dimensional space. Existing algorithms lack at least one of these qualities, and thus are inefficient in handling such big data challenge. In this paper, we propose a method enjoying all these merits with an accelerated convergence rate $O(\frac{1}{k^2})$. Empirical studies on large scale datasets with more than one million features are conducted to show the effectiveness of our methods in practice.

MLNov 12, 2016
Riemannian Tensor Completion with Side Information

Tengfei Zhou, Hui Qian, Zebang Shen et al.

By restricting the iterate on a nonlinear manifold, the recently proposed Riemannian optimization methods prove to be both efficient and effective in low rank tensor completion problems. However, existing methods fail to exploit the easily accessible side information, due to their format mismatch. Consequently, there is still room for improvement in such methods. To fill the gap, in this paper, a novel Riemannian model is proposed to organically integrate the original model and the side information by overcoming their inconsistency. For this particular model, an efficient Riemannian conjugate gradient descent solver is devised based on a new metric that captures the curvature of the objective.Numerical experiments suggest that our solver is more accurate than the state-of-the-art without compromising the efficiency.

CVSep 5, 2015
Co-interest Person Detection from Multiple Wearable Camera Videos

Yuewei Lin, Kareem Ezzeldeen, Youjie Zhou et al.

Wearable cameras, such as Google Glass and Go Pro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate the persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify CIP even if a group of people with similar appearance are present in the view. More specifically, we detect a set of persons on each frame as the candidates of the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns in different videos and high spacial-temporal consistency in each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos and the experiments demonstrate the effectiveness of the proposed algorithm.

CVJul 11, 2015
LooseCut: Interactive Image Segmentation with Loosely Bounded Boxes

Hongkai Yu, Youjie Zhou, Hui Qian et al.

One popular approach to interactively segment the foreground object of interest from an image is to annotate a bounding box that covers the foreground object. Then, a binary labeling is performed to achieve a refined segmentation. One major issue of the existing algorithms for such interactive image segmentation is their preference of an input bounding box that tightly encloses the foreground object. This increases the annotation burden, and prevents these algorithms from utilizing automatically detected bounding boxes. In this paper, we develop a new LooseCut algorithm that can handle cases where the input bounding box only loosely covers the foreground object. We propose a new Markov Random Fields (MRF) model for segmentation with loosely bounded boxes, including a global similarity constraint to better distinguish the foreground and background, and an additional energy term to encourage consistent labeling of similar-appearance pixels. This MRF model is then solved by an iterated max-flow algorithm. In the experiments, we evaluate LooseCut in three publicly-available image datasets, and compare its performance against several state-of-the-art interactive image segmentation algorithms. We also show that LooseCut can be used for enhancing the performance of unsupervised video segmentation and image saliency detection.

ITMar 7, 2015
A Nonconvex Approach for Structured Sparse Learning

Shubao Zhang, Hui Qian, Zhihua Zhang

Sparse learning is an important topic in many areas such as machine learning, statistical estimation, signal processing, etc. Recently, there emerges a growing interest on structured sparse learning. In this paper we focus on the $\ell_q$-analysis optimization problem for structured sparse learning ($0< q \leq 1$). Compared to previous work, we establish weaker conditions for exact recovery in noiseless case and a tighter non-asymptotic upper bound of estimate error in noisy case. We further prove that the nonconvex $\ell_q$-analysis optimization can do recovery with a lower sample complexity and in a wider range of cosparsity than its convex counterpart. In addition, we develop an iteratively reweighted method to solve the optimization problem under the variational framework. Theoretical analysis shows that our method is capable of pursuing a local minima close to the global minima. Also, empirical results of preliminary computational experiments illustrate that our nonconvex method outperforms both its convex counterpart and other state-of-the-art methods.