OCSep 5, 2022
DISA: A Dual Inexact Splitting Algorithm for Distributed Convex Composite OptimizationLuyao Guo, Xinli Shi, Shaofu Yang et al.
In this paper, we propose a novel Dual Inexact Splitting Algorithm (DISA) for distributed convex composite optimization problems, where the local loss function consists of a smooth term and a possibly nonsmooth term composed with a linear mapping. DISA, for the first time, eliminates the dependence of the convergent step-size range on the Euclidean norm of the linear mapping, while inheriting the advantages of the classic Primal-Dual Proximal Splitting Algorithm (PD-PSA): simple structure and easy implementation. This indicates that DISA can be executed without prior knowledge of the norm, and tiny step-sizes can be avoided when the norm is large. Additionally, we prove sublinear and linear convergence rates of DISA under general convexity and metric subregularity, respectively. Moreover, we provide a variant of DISA with approximate proximal mapping and prove its global convergence and sublinear convergence rate. Numerical experiments corroborate our theoretical analyses and demonstrate a significant acceleration of DISA compared to existing PD-PSAs.
OCFeb 7, 2023
Decentralized Inexact Proximal Gradient Method With Network-Independent Stepsizes for Convex Composite OptimizationLuyao Guo, Xinli Shi, Jinde Cao et al.
This paper proposes a novel CTA (Combine-Then-Adapt)-based decentralized algorithm for solving convex composite optimization problems over undirected and connected networks. The local loss function in these problems contains both smooth and nonsmooth terms. The proposed algorithm uses uncoordinated network-independent constant stepsizes and only needs to approximately solve a sequence of proximal mappings, which is advantageous for solving decentralized composite optimization problems where the proximal mappings of the nonsmooth loss functions may not have analytical solutions. For the general convex case, we prove an O(1/k) convergence rate of the proposed algorithm, which can be improved to o(1/k) if the proximal mappings are solved exactly. Furthermore, with metric subregularity, we establish a linear convergence rate for the proposed algorithm. Numerical experiments demonstrate the efficiency of the algorithm.
CROct 26, 2022
LP-BFGS attack: An adversarial attack based on the Hessian with limited pixelsJiebao Zhang, Wenhua Qian, Rencan Nie et al.
Deep neural networks are vulnerable to adversarial attacks. Most $L_{0}$-norm based white-box attacks craft perturbations by the gradient of models to the input. Since the computation cost and memory limitation of calculating the Hessian matrix, the application of Hessian or approximate Hessian in white-box attacks is gradually shelved. In this work, we note that the sparsity requirement on perturbations naturally lends itself to the usage of Hessian information. We study the attack performance and computation cost of the attack method based on the Hessian with a limited number of perturbation pixels. Specifically, we propose the Limited Pixel BFGS (LP-BFGS) attack method by incorporating the perturbation pixel selection strategy and the BFGS algorithm. Pixels with top-k attribution scores calculated by the Integrated Gradient method are regarded as optimization variables of the LP-BFGS attack. Experimental results across different networks and datasets demonstrate that our approach has comparable attack ability with reasonable computation in different numbers of perturbation pixels compared with existing solutions.
LGOct 12, 2023
Achieving Linear Speedup with ProxSkip in Distributed Stochastic OptimizationLuyao Guo, Sulaiman A. Alghunaim, Kun Yuan et al.
The ProxSkip algorithm for distributed optimization is gaining increasing attention due to its effectiveness in reducing communication. However, existing analyses of ProxSkip are limited to the strongly convex setting and fail to achieve linear speedup with respect to the number of nodes. Key questions regarding its behavior in the non-convex setting and the achievability of linear speedup remain open. In this paper, we revisit ProxSkip and address both questions. We provide a comprehensive analysis for stochastic non-convex, convex, and strongly convex problems, revealing the effects of gradient noise, local updates, network connectivity, and data heterogeneity on its convergence. We prove that ProxSkip achieves linear speedup across all three settings, and can further achieve linear speedup with network-independent stepsizes in the strongly convex setting. Moreover, we show that properly increasing local updates effectively reduces communication complexity.
11.9AIMar 11
Resource-constrained Amazons chess decision framework integrating large language models and graph attentionTianhao Qian, Zhuoxuan Li, Jinde Cao et al. · gatech
Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, providing rigorous testbeds for decision-making, strategic planning, and adaptive learning. However, resource-constrained environments pose critical challenges, as conventional deep learning methods heavily rely on extensive datasets and computational resources. In this paper, we propose a lightweight hybrid framework for the Game of the Amazons, which explores the paradigm of weak-to-strong generalization by integrating the structural reasoning of graph-based learning with the generative capabilities of large language models. Specifically, we leverage a Graph Attention Autoencoder to inform a multi-step Monte Carlo Tree Search, utilize a Stochastic Graph Genetic Algorithm to optimize evaluation signals, and harness GPT-4o-mini to generate synthetic training data. Unlike traditional approaches that rely on expert demonstrations, our framework learns from noisy and imperfect supervision. We demonstrate that the Graph Attention mechanism effectively functions as a structural filter, denoising the LLM's outputs. Experiments on a 10$\times$10 Amazons board show that our hybrid approach not only achieves a 15\%--56\% improvement in decision accuracy over baselines but also significantly outperforms its teacher model (GPT-4o-mini), achieving a competitive win rate of 45.0\% at N=30 nodes and a decisive 66.5\% at only N=50 nodes. These results verify the feasibility of evolving specialized, high-performance game AI from general-purpose foundation models under stringent computational constraints.
OCDec 6, 2022
BALPA: A Balanced Primal-Dual Algorithm for Nonsmooth Optimization with Application to Distributed OptimizationLuyao Guo, Jinde Cao, Xinli Shi et al.
In this paper, we propose a novel primal-dual proximal splitting algorithm (PD-PSA), named BALPA, for the composite optimization problem with equality constraints, where the loss function consists of a smooth term and a nonsmooth term composed with a linear mapping. In BALPA, the dual update is designed as a proximal point for a time-varying quadratic function, which balances the implementation of primal and dual update and retains the proximity-induced feature of classic PD-PSAs. In addition, by this balance, BALPA eliminates the inefficiency of classic PD-PSAs for composite optimization problems in which the Euclidean norm of the linear mapping or the equality constraint mapping is large. Therefore, BALPA not only inherits the advantages of simple structure and easy implementation of classic PD-PSAs but also ensures a fast convergence when these norms are large. Moreover, we propose a stochastic version of BALPA (S-BALPA) and apply the developed BALPA to distributed optimization to devise a new distributed optimization algorithm. Furthermore, a comprehensive convergence analysis for BALPA and S-BALPA is conducted, respectively. Finally, numerical experiments demonstrate the efficiency of the proposed algorithms.
OCFeb 18
Local adapt-then-combine algorithms for distributed nonsmooth optimization: Achieving provable communication accelerationLuyao Guo, Xinli Shi, Wenying Xu et al.
This paper is concerned with the distributed composite optimization problem over networks, where agents aim to minimize a sum of local smooth components and a common nonsmooth term. Leveraging the probabilistic local updates mechanism, we propose a communication-efficient Adapt-Then-Combine (ATC) framework, FlexATC, unifying numerous ATC-based distributed algorithms. Under stepsizes independent of the network topology and the number of local updates, we establish sublinear and linear convergence rates for FlexATC in convex and strongly convex settings, respectively. Remarkably, in the strong convex setting, the linear rate is decoupled from the objective functions and network topology, and FlexATC permits communication to be skipped in most iterations without any deterioration of the linear rate. In addition, the proposed unified theory demonstrates for the first time that local updates provably lead to communication acceleration for ATC-based distributed algorithms. Numerical experiments further validate the efficacy of the proposed framework and corroborate the theoretical results.
LGJul 12, 2022
Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual InformationJiebao Zhang, Wenhua Qian, Rencan Nie et al.
A counter-intuitive property of convolutional neural networks (CNNs) is their inherent susceptibility to adversarial examples, which severely hinders the application of CNNs in security-critical fields. Adversarial examples are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective defense method to improve the robustness of CNNs to adversarial examples. The mechanisms behind adversarial examples and adversarial training are worth exploring. Therefore, this work investigates similarities and differences between normally trained CNNs (NT-CNNs) and adversarially trained CNNs (AT-CNNs) in information extraction from the mutual information perspective. We show that 1) whether NT-CNNs or AT-CNNs, for original and adversarial examples, the trends towards mutual information are almost similar throughout training; 2) compared with normal training, adversarial training is more difficult and the amount of information that AT-CNNs extract from the input is less; 3) the CNNs trained with different methods have different preferences for certain types of information; NT-CNNs tend to extract texture-based information from the input, while AT-CNNs prefer to shape-based information. The reason why adversarial examples mislead CNNs may be that they contain more texture-based information about other classes. Furthermore, we also analyze the mutual information estimators used in this work and find that they outline the geometric properties of the middle layer's output.
CVNov 20, 2025Code
SwiTrack: Tri-State Switch for Cross-Modal Object TrackingBoyue Xu, Ruichao Hou, Tongwei Ren et al.
Cross-modal object tracking (CMOT) is an emerging task that maintains target consistency while the video stream switches between different modalities, with only one modality available in each frame, mostly focusing on RGB-Near Infrared (RGB-NIR) tracking. Existing methods typically connect parallel RGB and NIR branches to a shared backbone, which limits the comprehensive extraction of distinctive modality-specific features and fails to address the issue of object drift, especially in the presence of unreliable inputs. In this paper, we propose SwiTrack, a novel state-switching framework that redefines CMOT through the deployment of three specialized streams. Specifically, RGB frames are processed by the visual encoder, while NIR frames undergo refinement via a NIR gated adapter coupled with the visual encoder to progressively calibrate shared latent space features, thereby yielding more robust cross-modal representations. For invalid modalities, a consistency trajectory prediction module leverages spatio-temporal cues to estimate target movement, ensuring robust tracking and mitigating drift. Additionally, we incorporate dynamic template reconstruction to iteratively update template features and employ a similarity alignment loss to reinforce feature consistency. Experimental results on the latest benchmarks demonstrate that our tracker achieves state-of-the-art performance, boosting precision rate and success rate gains by 7.2\% and 4.3\%, respectively, while maintaining real-time tracking at 65 frames per second. Code and models are available at https://github.com/xuboyue1999/SwiTrack.git.
CVSep 23, 2025Code
HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object DetectionRuichao Hou, Xingyuan Li, Tongwei Ren et al.
RGB-thermal salient object detection (RGB-T SOD) aims to identify prominent objects by integrating complementary information from RGB and thermal modalities. However, learning the precise boundaries and complete objects remains challenging due to the intrinsic insufficient feature fusion and the extrinsic limitations of data scarcity. In this paper, we propose a novel hybrid prompt-driven segment anything model (HyPSAM), which leverages the zero-shot generalization capabilities of the segment anything model (SAM) for RGB-T SOD. Specifically, we first propose a dynamic fusion network (DFNet) that generates high-quality initial saliency maps as visual prompts. DFNet employs dynamic convolution and multi-branch decoding to facilitate adaptive cross-modality interaction, overcoming the limitations of fixed-parameter kernels and enhancing multi-modal feature representation. Moreover, we propose a plug-and-play refinement network (P2RNet), which serves as a general optimization strategy to guide SAM in refining saliency maps by using hybrid prompts. The text prompt ensures reliable modality input, while the mask and box prompts enable precise salient object localization. Extensive experiments on three public datasets demonstrate that our method achieves state-of-the-art performance. Notably, HyPSAM has remarkable versatility, seamlessly integrating with different RGB-T SOD methods to achieve significant performance gains, thereby highlighting the potential of prompt engineering in this field. The code and results of our method are available at: https://github.com/milotic233/HyPSAM.
CVJun 30, 2025Code
Learning Frequency and Memory-Aware Prompts for Multi-Modal Object TrackingBoyue Xu, Ruichao Hou, Tongwei Ren et al.
Prompt-learning-based multi-modal trackers have made strong progress by using lightweight visual adapters to inject auxiliary-modality cues into frozen foundation models. However, they still underutilize two essentials: modality-specific frequency structure and long-range temporal dependencies. We present Learning Frequency and Memory-Aware Prompts, a dual-adapter framework that injects lightweight prompts into a frozen RGB tracker. A frequency-guided visual adapter adaptively transfers complementary cues across modalities by jointly calibrating spatial, channel, and frequency components, narrowing the modality gap without full fine-tuning. A multilevel memory adapter with short, long, and permanent memory stores, updates, and retrieves reliable temporal context, enabling consistent propagation across frames and robust recovery from occlusion, motion blur, and illumination changes. This unified design preserves the efficiency of prompt learning while strengthening cross-modal interaction and temporal coherence. Extensive experiments on RGB-Thermal, RGB-Depth, and RGB-Event benchmarks show consistent state-of-the-art results over fully fine-tuned and adapter-based baselines, together with favorable parameter efficiency and runtime. Code and models are available at https://github.com/xuboyue1999/mmtrack.git.
CVNov 10, 2020Code
STCNet: Spatio-Temporal Cross Network for Industrial Smoke DetectionYichao Cao, Qingfei Tang, Xiaobo Lu et al.
Industrial smoke emissions present a serious threat to natural ecosystems and human health. Prior works have shown that using computer vision techniques to identify smoke is a low cost and convenient method. However, industrial smoke detection is a challenging task because industrial emission particles are often decay rapidly outside the stacks or facilities and steam is very similar to smoke. To overcome these problems, a novel Spatio-Temporal Cross Network (STCNet) is proposed to recognize industrial smoke emissions. The proposed STCNet involves a spatial pathway to extract texture features and a temporal pathway to capture smoke motion information. We assume that spatial and temporal pathway could guide each other. For example, the spatial path can easily recognize the obvious interference such as trees and buildings, and the temporal path can highlight the obscure traces of smoke movement. If the two pathways could guide each other, it will be helpful for the smoke detection performance. In addition, we design an efficient and concise spatio-temporal dual pyramid architecture to ensure better fusion of multi-scale spatiotemporal information. Finally, extensive experiments on public dataset show that our STCNet achieves clear improvements on the challenging RISE industrial smoke detection dataset against the best competitors by 6.2%. The code will be available at: https://github.com/Caoyichao/STCNet.
LGApr 17, 2025
Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and MomentumYuan Zhou, Xinli Shi, Xuelong Li et al.
Decentralized Federated Learning (DFL) eliminates the reliance on the server-client architecture inherent in traditional federated learning, attracting significant research interest in recent years. Simultaneously, the objective functions in machine learning tasks are often nonconvex and frequently incorporate additional, potentially nonsmooth regularization terms to satisfy practical requirements, thereby forming nonconvex composite optimization problems. Employing DFL methods to solve such general optimization problems leads to the formulation of Decentralized Nonconvex Composite Federated Learning (DNCFL), a topic that remains largely underexplored. In this paper, we propose a novel DNCFL algorithm, termed \bf{DEPOSITUM}. Built upon proximal stochastic gradient tracking, DEPOSITUM mitigates the impact of data heterogeneity by enabling clients to approximate the global gradient. The introduction of momentums in the proximal gradient descent step, replacing tracking variables, reduces the variance introduced by stochastic gradients. Additionally, DEPOSITUM supports local updates of client variables, significantly reducing communication costs. Theoretical analysis demonstrates that DEPOSITUM achieves an expected $ε$-stationary point with an iteration complexity of $\mathcal{O}(1/ε^2)$. The proximal gradient, consensus errors, and gradient estimation errors decrease at a sublinear rate of $\mathcal{O}(1/T)$. With appropriate parameter selection, the algorithm achieves network-independent linear speedup without requiring mega-batch sampling. Finally, we apply DEPOSITUM to the training of neural networks on real-world datasets, systematically examining the influence of various hyperparameters on its performance. Comparisons with other federated composite optimization algorithms validate the effectiveness of the proposed method.
OCDec 19, 2023
A Proximal Gradient Method With Probabilistic Multi-Gossip Communications for Decentralized Composite OptimizationLuyao Guo, Luqing Wang, Xinli Shi et al.
Decentralized optimization methods with local updates have recently gained attention for their provable ability to communication acceleration. In these methods, nodes perform several iterations of local computations between the communication rounds. Nevertheless, this capability is effective only when the network is sufficiently well-connected and the loss function is smooth. In this paper, we propose a communication-efficient method MG-Skip with probabilistic local updates and multi-gossip communications for decentralized composite (smooth + nonsmooth) optimization, whose stepsize is independent of the number of local updates and the network topology. For any undirected and connected networks, MG-Skip allows for the multi-gossip communications to be skipped in most iterations in the strongly convex setting, while its computation complexity is $\mathcal{O}\left(κ\log \frac{1}ε\right)$ and communication complexity is only $\mathcal{O}\left(\sqrt{\fracκ{(1-ρ)}} \log \frac{1}ε\right)$, where $κ$ is the condition number of the loss function, $ρ$ reflects the connectivity of the network topology, and $ε$ is the target accuracy. The theoretical results indicate that MG-Skip achieves provable communication acceleration, thereby validating the advantages of local updates in the nonsmooth setting.
18.8CVMar 12
Alternating Gradient Flow Utility: A Unified Metric for Structural Pruning and Dynamic Routing in Deep NetworksTianhao Qian, Zhuoxuan Li, Jinde Cao et al.
Efficient deep learning traditionally relies on static heuristics like weight magnitude or activation awareness (e.g., Wanda, RIA). While successful in unstructured settings, we observe a critical limitation when applying these metrics to the structural pruning of deep vision networks. These contemporary metrics suffer from a magnitude bias, failing to preserve critical functional pathways. To overcome this, we propose a decoupled kinetic paradigm inspired by Alternating Gradient Flow (AGF), utilizing an absolute feature-space Taylor expansion to accurately capture the network's structural "kinetic utility". First, we uncover a topological phase transition at extreme sparsity, where AGF successfully preserves baseline functionality and exhibits topological implicit regularization, avoiding the collapse seen in models trained from scratch. Second, transitioning to architectures without strict structural priors, we reveal a phenomenon of Sparsity Bottleneck in Vision Transformers (ViTs). Through a gradient-magnitude decoupling analysis, we discover that dynamic signals suffer from signal compression in converged models, rendering them suboptimal for real-time routing. Finally, driven by these empirical constraints, we design a hybrid routing framework that decouples AGF-guided offline structural search from online execution via zero-cost physical priors. We validate our paradigm on large-scale benchmarks: under a 75% compression stress test on ImageNet-1K, AGF effectively avoids the structural collapse where traditional metrics aggressively fall below random sampling. Furthermore, when systematically deployed for dynamic inference on ImageNet-100, our hybrid approach achieves Pareto-optimal efficiency. It reduces the usage of the heavy expert by approximately 50% (achieving an estimated overall cost of 0.92$\times$) without sacrificing the full-model accuracy.
AIMay 11, 2023
A data-driven rutting depth short-time prediction model with metaheuristic optimization for asphalt pavements based on RIOHTrackZhuoxuan Li, Iakov Korovin, Xinli Shi et al.
Rutting of asphalt pavements is a crucial design criterion in various pavement design guides. A good road transportation base can provide security for the transportation of oil and gas in road transportation. This study attempts to develop a robust artificial intelligence model to estimate different asphalt pavements' rutting depth clips, temperature, and load axes as primary characteristics. The experiment data were obtained from 19 asphalt pavements with different crude oil sources on a 2.038 km long full-scale field accelerated pavement test track (RIOHTrack, Road Track Institute) in Tongzhou, Beijing. In addition, this paper also proposes to build complex networks with different pavement rutting depths through complex network methods and the Louvain algorithm for community detection. The most critical structural elements can be selected from different asphalt pavement rutting data, and similar structural elements can be found. An extreme learning machine algorithm with residual correction (RELM) is designed and optimized using an independent adaptive particle swarm algorithm. The experimental results of the proposed method are compared with several classical machine learning algorithms, with predictions of Average Root Mean Squared Error, Average Mean Absolute Error, and Average Mean Absolute Percentage Error for 19 asphalt pavements reaching 1.742, 1.363, and 1.94\% respectively. The experiments demonstrate that the RELM algorithm has an advantage over classical machine learning methods in dealing with non-linear problems in road engineering. Notably, the method ensures the adaptation of the simulated environment to different levels of abstraction through the cognitive analysis of the production environment parameters.
LGJan 12, 2021
A SOM-based Gradient-Free Deep Learning Method with Convergence AnalysisShaosheng Xu, Jinde Cao, Yichao Cao et al.
As gradient descent method in deep learning causes a series of questions, this paper proposes a novel gradient-free deep learning structure. By adding a new module into traditional Self-Organizing Map and introducing residual into the map, a Deep Valued Self-Organizing Map network is constructed. And analysis about the convergence performance of such a deep Valued Self-Organizing Map network is proved in this paper, which gives an inequality about the designed parameters with the dimension of inputs and the loss of prediction.
CVJul 13, 2020
A new approach to descriptors generation for image retrieval by analyzing activations of deep neural network layersPaweł Staszewski, Maciej Jaworski, Jinde Cao et al.
In this paper, we consider the problem of descriptors construction for the task of content-based image retrieval using deep neural networks. The idea of neural codes, based on fully connected layers activations, is extended by incorporating the information contained in convolutional layers. It is known that the total number of neurons in the convolutional part of the network is large and the majority of them have little influence on the final classification decision. Therefore, in the paper we propose a novel algorithm that allows us to extract the most significant neuron activations and utilize this information to construct effective descriptors. The descriptors consisting of values taken from both the fully connected and convolutional layers perfectly represent the whole image content. The images retrieved using these descriptors match semantically very well to the query image, and also they are similar in other secondary image characteristics, like background, textures or color distribution. These features of the proposed descriptors are verified experimentally based on the IMAGENET1M dataset using the VGG16 neural network.