LGFeb 25
Mamba Meets Scheduling: Learning to Solve Flexible Job Shop Scheduling with Efficient Sequence ModelingZhi Cao, Cong Zhang, Yaoxin Wu et al.
The Flexible Job Shop Problem (FJSP) is a well-studied combinatorial optimization problem with extensive applications for manufacturing and production scheduling. It involves assigning jobs to various machines to optimize criteria, such as minimizing total completion time. Current learning-based methods in this domain often rely on localized feature extraction models, limiting their capacity to capture overarching dependencies spanning operations and machines. This paper introduces an innovative architecture that harnesses Mamba, a state-space model with linear computational complexity, to facilitate comprehensive sequence modeling tailored for FJSP. In contrast to prevalent graph-attention-based frameworks that are computationally intensive for FJSP, we show our model is more efficient. Specifically, the proposed model possesses an encoder and a decoder. The encoder incorporates a dual Mamba block to extract operation and machine features separately. Additionally, we introduce an efficient cross-attention decoder to learn interactive embeddings of operations and machines. Our experimental results demonstrate that our method achieves faster solving speed and surpasses the performance of state-of-the-art learning-based methods for FJSP across various benchmarks.
IVFeb 7, 2024Code
Troublemaker Learning for Low-Light Image EnhancementYinghao Song, Zhiyuan Cao, Wanhong Xiang et al.
Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for training. TML is simple: we first dim the input and then increase its brightness. TML is based on two core components. First, the troublemaker model (TM) constructs pseudo low-light images from normal images to relieve the cost of pairwise data. Second, the predicting model (PM) enhances the brightness of pseudo low-light images. Additionally, we incorporate an enhancing model (EM) to further improve the visual performance of PM outputs. Moreover, in LLIE tasks, characterizing global element correlations is important because more information on the same object can be captured. CNN cannot achieve this well, and self-attention has high time complexity. Accordingly, we propose Global Dynamic Convolution (GDC) with O(n) time complexity, which essentially imitates the partial calculation process of self-attention to formulate elementwise correlations. Based on the GDC module, we build the UGDC model. Extensive quantitative and qualitative experiments demonstrate that UGDC trained with TML can achieve competitive performance against state-of-the-art approaches on public datasets. The code is available at https://github.com/Rainbowman0/TML_LLIE.
35.0CVApr 7
RHVI-FDD: A Hierarchical Decoupling Framework for Low-Light Image EnhancementJunhao Yang, Bo Yang, Hongwei Ge et al.
Low-light images often suffer from severe noise, detail loss, and color distortion, which hinder downstream multimedia analysis and retrieval tasks. The degradation in low-light images is complex: luminance and chrominance are coupled, while within the chrominance, noise and details are deeply entangled, preventing existing methods from simultaneously correcting color distortion, suppressing noise, and preserving fine details. To tackle the above challenges, we propose a novel hierarchical decoupling framework (RHVI-FDD). At the macro level, we introduce the RHVI transform, which mitigates the estimation bias caused by input noise and enables robust luminance-chrominance decoupling. At the micro level, we design a Frequency-Domain Decoupling (FDD) module with three branches for further feature separation. Using the Discrete Cosine Transform, we decompose chrominance features into low, mid, and high-frequency bands that predominantly represent global tone, local details, and noise components, which are then processed by tailored expert networks in a divide-and-conquer manner and fused via an adaptive gating module for content-aware fusion. Extensive experiments on multiple low-light datasets demonstrate that our method consistently outperforms existing state-of-the-art approaches in both objective metrics and subjective visual quality.
LGJun 19, 2025
EFormer: An Effective Edge-based Transformer for Vehicle Routing ProblemsDian Meng, Zhiguang Cao, Yaoxin Wu et al.
Recent neural heuristics for the Vehicle Routing Problem (VRP) primarily rely on node coordinates as input, which may be less effective in practical scenarios where real cost metrics-such as edge-based distances-are more relevant. To address this limitation, we introduce EFormer, an Edge-based Transformer model that uses edge as the sole input for VRPs. Our approach employs a precoder module with a mixed-score attention mechanism to convert edge information into temporary node embeddings. We also present a parallel encoding strategy characterized by a graph encoder and a node encoder, each responsible for processing graph and node embeddings in distinct feature spaces, respectively. This design yields a more comprehensive representation of the global relationships among edges. In the decoding phase, parallel context embedding and multi-query integration are used to compute separate attention mechanisms over the two encoded embeddings, facilitating efficient path construction. We train EFormer using reinforcement learning in an autoregressive manner. Extensive experiments on the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) reveal that EFormer outperforms established baselines on synthetic datasets, including large-scale and diverse distributions. Moreover, EFormer demonstrates strong generalization on real-world instances from TSPLib and CVRPLib. These findings confirm the effectiveness of EFormer's core design in solving VRPs.
LGJun 2, 2025
Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient ContributionWenhao Song, Xuan Wu, Bo Yang et al.
To address the weight coupling problem, certain studies introduced few-shot Neural Architecture Search (NAS) methods, which partition the supernet into multiple sub-supernets. However, these methods often suffer from computational inefficiency and tend to provide suboptimal partitioning schemes. To address this problem more effectively, we analyze the weight coupling problem from a novel perspective, which primarily stems from distinct modules in succeeding layers imposing conflicting gradient directions on the preceding layer modules. Based on this perspective, we propose the Gradient Contribution (GC) method that efficiently computes the cosine similarity of gradient directions among modules by decomposing the Vector-Jacobian Product during supernet backpropagation. Subsequently, the modules with conflicting gradient directions are allocated to distinct sub-supernets while similar ones are grouped together. To assess the advantages of GC and address the limitations of existing Graph Neural Architecture Search methods, which are limited to searching a single type of Graph Neural Networks (Message Passing Neural Networks (MPNNs) or Graph Transformers (GTs)), we propose the Unified Graph Neural Architecture Search (UGAS) framework, which explores optimal combinations of MPNNs and GTs. The experimental results demonstrate that GC achieves state-of-the-art (SOTA) performance in supernet partitioning quality and time efficiency. In addition, the architectures searched by UGAS+GC outperform both the manually designed GNNs and those obtained by existing NAS methods. Finally, ablation studies further demonstrate the effectiveness of all proposed methods.
CVMar 1, 2025
2DMCG:2DMambawith Change Flow Guidance for Change Detection in Remote SensingJunYao Kaung, HongWei Ge
Remote sensing change detection (CD) has made significant advancements with the adoption of Convolutional Neural Networks (CNNs) and Transformers. While CNNs offer powerful feature extraction, they are constrained by receptive field limitations, and Transformers suffer from quadratic complexity when processing long sequences, restricting scalability. The Mamba architecture provides an appealing alternative, offering linear complexity and high parallelism. However, its inherent 1D processing structure causes a loss of spatial information in 2D vision tasks. This paper addresses this limitation by proposing an efficient framework based on a Vision Mamba variant that enhances its ability to capture 2D spatial information while maintaining the linear complexity characteristic of Mamba. The framework employs a 2DMamba encoder to effectively learn global spatial contextual information from multi-temporal images. For feature fusion, we introduce a 2D scan-based, channel-parallel scanning strategy combined with a spatio-temporal feature fusion method, which adeptly captures both local and global change information, alleviating spatial discontinuity issues during fusion. In the decoding stage, we present a feature change flow-based decoding method that improves the mapping of feature change information from low-resolution to high-resolution feature maps, mitigating feature shift and misalignment. Extensive experiments on benchmark datasets such as LEVIR-CD+ and WHU-CD demonstrate the superior performance of our framework compared to state-of-the-art methods, showcasing the potential of Vision Mamba for efficient and accurate remote sensing change detection.
CVOct 15, 2019
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive StyleHongwei Ge, Zehang Yan, Kai Zhang et al.
Image captioning is a research hotspot where encoder-decoder models combining convolutional neural network (CNN) and long short-term memory (LSTM) achieve promising results. Despite significant progress, these models generate sentences differently from human cognitive styles. Existing models often generate a complete sentence from the first word to the end, without considering the influence of the following words on the whole sentence generation. In this paper, we explore the utilization of a human-like cognitive style, i.e., building overall cognition for the image to be described and the sentence to be constructed, for enhancing computer image understanding. This paper first proposes a Mutual-aid network structure with Bidirectional LSTMs (MaBi-LSTMs) for acquiring overall contextual information. In the training process, the forward and backward LSTMs encode the succeeding and preceding words into their respective hidden states by simultaneously constructing the whole sentence in a complementary manner. In the captioning process, the LSTM implicitly utilizes the subsequent semantic information contained in its hidden states. In fact, MaBi-LSTMs can generate two sentences in forward and backward directions. To bridge the gap between cross-domain models and generate a sentence with higher quality, we further develop a cross-modal attention mechanism to retouch the two sentences by fusing their salient parts as well as the salient areas of the image. Experimental results on the Microsoft COCO dataset show that the proposed model improves the performance of encoder-decoder models and achieves state-of-the-art results.
NEApr 12, 2019
A Reference Vector based Many-Objective Evolutionary Algorithm with Feasibility-aware AdaptationMingde Zhao, Hongwei Ge, Kai Zhang et al.
The infeasible parts of the objective space in difficult many-objective optimization problems cause trouble for evolutionary algorithms. This paper proposes a reference vector based algorithm which uses two interacting engines to adapt the reference vectors and to evolve the population towards the true Pareto Front (PF) s.t. the reference vectors are always evenly distributed within the current PF to provide appropriate guidance for selection. The current PF is tracked by maintaining an archive of undominated individuals, and adaptation of reference vectors is conducted with the help of another archive that contains layers of reference vectors corresponding to different density. Experimental results show the expected characteristics and competitive performance of the proposed algorithm TEEA.
NEDec 17, 2018
Generalizable Meta-Heuristic based on Temporal Estimation of Rewards for Large Scale Blackbox OptimizationMingde Zhao, Hongwei Ge, Yi Lian et al.
The generalization abilities of heuristic optimizers may deteriorate with the increment of the search space dimensionality. To achieve generalized performance across Large Scale Blackbox Optimization (LSBO) tasks, it ispossible to ensemble several heuristics and devise a meta-heuristic to control their initiation. This paper first proposes a methodology of transforming LSBO problems into online decision processes to maximize efficiency of resource utilization. Then, using the perspective of multi-armed bandits with non-stationary reward distributions, we propose a meta-heuristic based on Temporal Estimation of Rewards (TER) to address such decision process. TER uses a window for temporal credit assignment and Boltzmann exploration to balance the exploration-exploitation tradeoff. The prior-free TER generalizes across LSBO tasks with flexibility for different types of limited computational resources (e.g. time, money, etc.) and is easy to be adapted to new tasks for its simplicity and easy interface for heuristic articulation. Tests on the benchmarks validate the problem formulation and suggest significant effectiveness: when TER is articulated with three heuristics, competitive performance is reported across different sets of benchmark problems with search dimensions up to 10000.
NEMar 3, 2018
A Many-Objective Evolutionary Algorithm With Two Interacting Processes: Cascade Clustering and Reference Point Incremental LearningHongwei Ge, Mingde Zhao, Liang Sun et al.
Researches have shown difficulties in obtaining proximity while maintaining diversity for many-objective optimization problems. Complexities of the true Pareto front pose challenges for the reference vector-based algorithms for their insufficient adaptability to the diverse characteristics with no priori. This paper proposes a many-objective optimization algorithm with two interacting processes: cascade clustering and reference point incremental learning (CLIA). In the population selection process based on cascade clustering (CC), using the reference vectors provided by the process based on incremental learning, the nondominated and the dominated individuals are clustered and sorted with different manners in a cascade style and are selected by round-robin for better proximity and diversity. In the reference vector adaptation process based on reference point incremental learning, using the feedbacks from the process based on CC, proper distribution of reference points is gradually obtained by incremental learning. Experimental studies on several benchmark problems show that CLIA is competitive compared with the state-of-the-art algorithms and has impressive efficiency and versatility using only the interactions between the two processes without incurring extra evaluations.