CE Jun 2
HonestAffinity: Leak-Aware Evaluation of Protein and Pocket Priors for Binding Affinity Prediction
Junhao Wei, Baili Lu, Zhenhong Peng et al.
Sequence-based deep learning offers a scalable alternative to structure-based scoring for protein-ligand binding affinity prediction. However, progress is hard to interpret when architectural priors are evaluated on canonical PDBbind-style splits that leak similarity classes across folds. We present HonestAffinity, a compact 1D-input predictor to isolate two priors under a leak-aware protocol: frozen ESM-2 (650M) protein embeddings and a learned binary pocket-position marker. We evaluate a multi-scale convolutional/Transformer template in three variants: HonestAffinity-Pocket, HonestAffinity-NoPocket, and HonestAffinity-Pocket-NoESM. All three train on 11,513 LP-PDBBind complexes in ~3 GPU-hours. We benchmark against five baselines on the LP-PDBBind 3-tier no-leak hold-out, CASF-2016, and a CASF-2016 non-train subset. Our central finding is a split-conditioned reversal rather than a uniformly best prior: HonestAffinity-Pocket achieves the best mean Pearson R on validation and CASF-2016 splits, whereas HonestAffinity-Pocket-NoESM achieves the best mean Pearson R on every strict LP no-leak tier (test_cl1-cl3). Both the pocket marker and ESM-2 input improve performance on familiar splits but reduce Pearson R on strict no-leak tiers. We argue models should report paired canonical and leak-proof ablations, and that deployment-regime-matched variants better describe these reversals than a single default. Code and scripts are linked in the footnote; checkpoints will be released upon acceptance.
NE May 13
WASHH: An Anchor-Aware Whale-Guided Selection Hyper-Heuristic for Continuous Optimization and SVC Configuration
Yifu Zhao, Xiaofan Zou, Junhao Wei et al.
Learning-assisted algorithm design often has to make reliable search decisions under small evaluation budgets, where committing to a single metaheuristic can be unreliable. We propose WASHH, a Whale-guided Adaptive Selection Hyper-Heuristic for continuous black-box optimization. WASHH uses WOA as the main exploitation backbone, but treats PSO-style memory, GWO-style leader averaging, DE-style variation, local coordinate search, and anchor-guided refinement as selectable search behaviors. An online reward controller allocates evaluations according to observed improvements, while anchor refinement exploits inexpensive reference configurations such as box centers or default model settings without bypassing black-box evaluation. On ten 30-dimensional benchmark functions with 10 independent runs and 12,000 evaluations, WASHH achieves the best average rank, 1.10, and is best or tied best on all ten functions. It strictly improves over WOA on eight functions and ties WOA at the numerical optimum on Rastrigin and Griewank. We further study SVC hyperparameter configuration for breast cancer diagnosis under a 300-evaluation budget. WASHH obtains the lowest mean validation log loss among the compared optimizers, suggesting that anchor-aware selection hyper-heuristics are a practical lightweight direction for LEAD systems.
CE Mar 18
CICDWOA: A Collective Cognitive Sharing Whale Optimization Algorithm with Cauchy Inverse Cumulative Distribution for 2D/3D Path Planning and Engineering Design Problems
Junhao Wei, Yanxiao Li, Seyedali Mirjalili et al.
The Whale Optimization Algorithm (WOA) has shown strong optimization ability but still suffers from premature convergence and weak search diversity. To address these issues, this paper proposes an enhanced WOA variant called CICDWOA. The proposed algorithm introduces a Good Nodes Set (GNS) method for uniform population initialization, a Collective Cognitive Sharing (CCS) mechanism to enhance group collaboration, and an Enhanced Spiral Updating strategy based on the Cauchy Inverse Cumulative Distribution (CICD) to better balance global exploration and local exploitation. In addition, a nonlinear convergence factor and a Hybrid Gaussian-Cauchy mutation based on Differential Evolution (DE) further improve convergence efficiency and population diversity. CICDWOA was evaluated on 23 benchmark functions, 2D robot path planning problems, 3D UAV path planning tasks, and 10 engineering design problems. Statistical results show that CICDWOA achieves faster convergence, higher accuracy, and better robustness than classical WOA and other advanced metaheuristic algorithms. CICDWOA obtained an average Friedman value of 1.6790, ranking first among the compared SOTA algorithms. The engineering simulation results confirm that CICDWOA provides an effective and general framework for solving complex optimization and engineering problems. The code of CICDWOA is available at https://github.com/JunhaoWei-mpu/ROBIS-Lab/tree/CICDWOA.
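The "Cauchy Inverse Cumulative Distribution" in the name refers to the quantile function of the Cauchy distribution, F^-1(u) = x0 + gamma * tan(pi * (u - 1/2)). The sketch below only illustrates drawing heavy-tailed steps through that formula. How CICDWOA inserts such draws into the spiral update is not stated in the abstract, so the function names and the default `loc`/`scale` values here are illustrative assumptions.

```python
import math
import random


def cauchy_icdf(u: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """Quantile (inverse CDF) of a Cauchy distribution for u in (0, 1)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return loc + scale * math.tan(math.pi * (u - 0.5))


def sample_cauchy_step(rng: random.Random, loc: float = 0.0, scale: float = 1.0) -> float:
    """Draw one heavy-tailed step by inverse-transform sampling (hypothetical helper)."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, which is outside (0, 1)
        u = rng.random()
    return cauchy_icdf(u, loc, scale)


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample_cauchy_step(rng), 3) for _ in range(5)])
```

Compared with Gaussian draws, occasional very large Cauchy steps are what makes this kind of sampling attractive for escaping local optima.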
LG May 27
Benchmarking Inductive Biases for Multivariate Time-Series Anomaly Detection with a Robust Multi-View Channel-Graph Detector
Junhao Wei, Yanxiao Li, Bidong Chen et al.
We present a unified experiment, analysis, and benchmark study of multivariate time-series (MTS) anomaly detection. Ten family-representative detectors, spanning statistical, reconstruction, association, frequency, and generic-transformer families, are evaluated on five datasets (SMD, MSL, SMAP, PSM, and MSDS) under effectiveness, efficiency, robustness, and cross-dataset generalisation. All methods share the same windowing, scoring, hardware, and metric protocols. Effectiveness, ablation, and robustness use three random seeds; cross-dataset transfer uses seed 0 because each extra seed requires 250 source-target evaluations. The benchmark yields three method-independent findings: no single-bias baseline dominates; absolute perturbation VUS-ROC is more informative than retention ratios; and MSDS behaves as an event-dense deployment workload rather than a sparse point-anomaly benchmark. Under this protocol we also introduce \ours{}, an adaptive detector family combining a NOTEARS-constrained directed channel-graph view with optional patch-attention and temporal-association views. \ours{} achieves the best macro-average VUS-ROC (0.675, +5.1 pt over the second-best LSTM-AE), ranks first overall, and reaches the top-3 on all five datasets. Its wins on MSL and MSDS are narrow, while its average and robustness gains are larger: under the same three-seed robustness protocol for every method, it obtains the strongest absolute VUS-ROC across noise, channel dropout, and time-shift perturbations. We release the MSDS preprocessing protocol, configurations, scripts, and seed-level metric dumps.
CE May 25
AeroTSBoost: Temporal-Statistical Boosting for Real-World UAV Telemetry Anomaly Mining
Junhao Wei, Haochen Li, Yanxiao Li et al.
Mining anomalies from unmanned aerial vehicle (UAV) state-estimation logs is challenging because failures are sparse, temporally structured, and distributed across heterogeneous PX4 telemetry streams with variable sensor availability and missing values. We present AeroTSBoost, a temporal-statistical boosting framework for real-world UAV telemetry anomaly mining. AeroTSBoost aligns multivariate flight logs, converts each window into deterministic descriptors that capture distributional shifts, quantile structure, endpoint drift, local dynamics, and lag correlation, and trains a class-balanced LightGBM detector. On UAV-SEAD, AeroTSBoost achieves the strongest AUPRC among evaluated classical, supervised tabular, neural reconstruction, recurrent, Granger-causality-based, and frequency-domain baselines. Across five seeds, it reaches 0.7516 ± 0.0043 AUPRC and 0.5342 ± 0.0108 threshold-swept event F1, improving AUPRC by 5.79 absolute points over the strongest non-AeroTSBoost baseline. Under purged chronological and leave-log-out protocols, it remains the best AUPRC method, reaching 0.6066 ± 0.0193 and 0.6388 ± 0.0315, respectively. On related ALFA fixed-wing UAV fault logs, AeroTSBoost reaches 0.9259 ± 0.0076 leave-sequence-out AUPRC, ahead of RandomForest (0.8835 ± 0.0797) and moments-only (0.8700 ± 0.0481). These results show that deterministic temporal-statistical representations remain highly competitive for sparse anomaly mining in operational cyber-physical telemetry.
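The descriptor families named in the abstract (distribution, quantiles, endpoint drift, local dynamics, lag correlation) can be illustrated for a single telemetry channel. This is a minimal sketch and not the AeroTSBoost feature set: the exact descriptors, the descriptor names, the lag value, and the handling of missing values are all assumptions, and the window here is assumed to be complete. Such a feature dictionary would then feed a tabular learner such as LightGBM.

```python
import math
import statistics
from typing import Dict, Sequence


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def window_descriptors(window: Sequence[float], lag: int = 1) -> Dict[str, float]:
    """Deterministic summary of one window of a single channel (illustrative)."""
    if lag < 1 or len(window) < lag + 2:
        raise ValueError("window too short for the requested lag")
    q25, q50, q75 = statistics.quantiles(window, n=4, method="inclusive")
    diffs = [b - a for a, b in zip(window, window[1:])]
    return {
        "mean": statistics.fmean(window),           # distribution level
        "std": statistics.pstdev(window),           # distribution spread
        "q25": q25, "q50": q50, "q75": q75,         # quantile structure
        "endpoint_drift": window[-1] - window[0],   # net change over the window
        "mean_abs_diff": statistics.fmean(abs(d) for d in diffs),  # local dynamics
        "lag_corr": _pearson(window[:-lag], window[lag:]),         # lag correlation
    }


if __name__ == "__main__":
    print(window_descriptors([0.0, 1.0, 2.0, 3.0, 4.0]))
```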
CV Oct 12, 2022
Image Projective Transformation Rectification with Synthetic Data for Smartphone-captured Chest X-ray Photos Classification
Chak Fong Chong, Yapeng Wang, Benjamin Ng et al.
Classifying smartphone-captured chest X-ray (CXR) photos to detect pathologies is challenging due to the projective transformation caused by non-ideal camera positions. Recently, various rectification methods have been proposed for different photo rectification tasks, such as document photos and license plate photos. Unfortunately, we found that none of them is suitable for CXR photos because of their specific transformation type, image appearance, annotation type, etc. In this paper, we propose a deep learning-based Projective Transformation Rectification Network (PTRN) that automatically rectifies CXR photos by predicting the projective transformation matrix. To the best of our knowledge, it is the first work to use the projective transformation matrix as the learning target for photo rectification. Additionally, to avoid the expensive collection of natural data, synthetic CXR photos are generated with consideration of natural perturbations, extra screens, etc. We evaluate the proposed approach in the CheXphoto smartphone-captured CXR photo classification competition hosted by the Stanford University Machine Learning Group, where our approach won first place with a large performance improvement (ours 0.850, second-best 0.762, in AUC). A deeper study demonstrates that PTRN brings the classification performance on spatially transformed CXR photos to the same level as on high-quality digital CXR images, indicating that PTRN can eliminate all negative impacts of projective transformation on the CXR photos.
RO May 19
KIO-planner: Attention-Guided Single-Stage Motion Planning with Dual Mapping for UAV Navigation
Dexing Yao, Haochen Li, Junhao Wei et al.
Autonomous UAV flight in confined, wall-dense environments requires low-latency and reliable motion planning under strict safety constraints. Traditional optimization-based planners suffer from mapping latency and easily fall into local minima when navigating through dense structural obstacles. Meanwhile, existing end-to-end learning methods struggle to extract fine-grained geometric features from raw depth images and lack hard kinodynamic constraints, leading to unpredictable collisions near walls. To address these issues, we propose KIO-planner, an attention-guided single-stage trajectory planning framework. First, we integrate a Convolutional Block Attention Module (CBAM) into the perception backbone to adaptively focus on critical structural edges and traversable space. Second, we introduce a novel Dual Mapping mechanism, comprising physical bounds activation and a deterministic Geometric Safety Shield in the depth-pixel space, to enforce kinodynamic feasibility and collision-free flight without global map fusion. Extensive high-fidelity simulated experiments demonstrate that KIO-planner enables highly agile navigation at speeds up to 3.0 m/s. Compared to the state-of-the-art baseline, KIO-planner achieves lower inference latency (approximately 24 ms) and generates significantly smoother trajectories, reducing control cost by 28.4%. Most notably, our Dual Mapping substantially increases the worst-case safety margin, measured by minimum distance to obstacles, from 0.48 m to 0.76 m, ensuring fast, smooth, and safer navigation in highly constrained environments.
CE May 14
Landscape-Aware Bandit Hyper-Heuristics for Online Operator Selection in UAV Inspection Routing
Junhao Wei, Yanxiao Li, Yifu Zhao et al.
UAV multi-site inspection often reduces to choosing a high-quality visiting order after target sites have been extracted from a map. This paper develops LA-BHH, a landscape-aware bandit hyper-heuristic that learns an operator-selection policy online for this routing layer. LA-BHH treats 2-opt, swap, relocate, and Or-opt moves as low-level arms, builds context from static landscape descriptors and online search-state features, and updates a LinUCB controller from improvement rewards during the same run. Experimental results on 45 generated Euclidean TSP instances show that LA-BHH achieves the best mean final gap and convergence AUC, with 0.0223 and 0.0389 respectively. It reduces final gap by 17.6% over UCB-HH, 22.6% over Random-HH, and 68.2% over nearest-neighbor construction. Ablation results further show that contextual credit assignment, 2-opt repair, and stagnation-aware state use are the main contributors.
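To make the LinUCB controller concrete, the sketch below implements the standard disjoint LinUCB rule with one linear model per operator arm. It is not the LA-BHH implementation: the class and function names, the two-dimensional context, the identity ridge prior, and the value of `alpha` are assumptions made for illustration. The inverse design matrix is maintained with a Sherman-Morrison rank-one update so that no matrix library is needed.

```python
import math
from typing import Dict, List


class LinUCBArm:
    """One arm of disjoint LinUCB: keeps A^-1 and b for a ridge regression."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity matrix, so A^-1 is also the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v: List[float]) -> List[float]:
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def score(self, x: List[float], alpha: float) -> float:
        """Upper confidence bound: theta.x + alpha * sqrt(x^T A^-1 x)."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x: List[float], reward: float) -> None:
        """Apply A <- A + x x^T and b <- b + reward * x (Sherman-Morrison on A^-1)."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_operator(arms: Dict[str, LinUCBArm], x: List[float], alpha: float) -> str:
    """Pick the operator whose upper confidence bound is largest for context x."""
    return max(arms, key=lambda name: arms[name].score(x, alpha))


if __name__ == "__main__":
    arms = {name: LinUCBArm(dim=2) for name in ("two_opt", "swap", "relocate", "or_opt")}
    context = [1.0, 0.0]                    # hypothetical search-state features
    arms["two_opt"].update(context, 1.0)    # reward from an observed improvement
    print(select_operator(arms, context, alpha=0.5))
```

In a hyper-heuristic loop, the chosen operator would be applied to the current tour and the resulting improvement passed back through `update` as the reward.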
CL Dec 22, 2023
Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning
Rongsheng Wang, Haoming Chen, Ruizhe Zhou et al.
Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to Mixtral-8x7B sparse Mixture-of-Experts model. This work is pioneering in the execution of instruction fine-tuning on a sparse expert-mixed model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data and model are publicly available at https://github.com/WangRongsheng/Aurora
CV Jan 30, 2024
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels
Chak Fong Chong, Xinyi Fang, Jielong Guo et al.
Large-scale image datasets are often partially labeled, where only a few categories' labels are known for each image. Assigning pseudo-labels to unknown labels to gain additional training signals has become prevalent for training deep classification models. However, some pseudo-labels are inevitably incorrect, leading to a notable decline in the model classification performance. In this paper, we propose a novel method called Category-wise Fine-Tuning (CFT), aiming to reduce model inaccuracies caused by the wrong pseudo-labels. In particular, CFT employs known labels without pseudo-labels to fine-tune the logistic regressions of trained models individually to calibrate each category's model predictions. Genetic Algorithm, seldom used for training deep models, is also utilized in CFT to maximize the classification performance directly. CFT is applied to well-trained models, unlike most existing methods that train models from scratch. Hence, CFT is general and compatible with models trained with different methods and schemes, as demonstrated through extensive experiments. CFT requires only a few seconds for each category for calibration with consumer-grade GPUs. We achieve state-of-the-art results on three benchmarking datasets, including the CheXpert chest X-ray competition dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO (average mAP 83.69%), and Open Image V3 (mAP 85.31%), outperforming the previous bests by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single model on CheXpert has been officially evaluated by the competition server, endorsing the correctness of the result. The outstanding results and generalizability indicate that CFT could be substantial and prevalent for classification model development. Code is available at: https://github.com/maxium0526/category-wise-fine-tuning.
AI May 11
Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling
Junhao Wei, Yanxiao Li, Yifu Zhao et al.
Learning-assisted hyper-heuristics can select among dispatching rules while preserving the feasibility and interpretability of constructive Job Shop Scheduling Problem (JSSP) heuristics. Their main computational cost lies in label generation rather than model fitting, since each supervised label usually requires rolling out candidate rules from a partial schedule. We study this label-cost problem together with a reliability problem: a learned selector should not switch away from a strong default rule unless the predicted gain is credible. The proposed selector uses regret-normalized rollout labels, a contextual KNN uncertainty estimate, and a gate that acts only when the predicted improvement exceeds an uncertainty-adjusted margin. We also vary rollout depth and breadth to measure the cost-quality trade-off. On synthetic JSSP instances, the gated selector achieves the lowest mean RPD among learned selectors, remains close to the best fixed dispatching rule, and reduces Random-HH mean RPD by more than an order of magnitude.
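The gating idea, switching away from the default rule only when the predicted gain clears an uncertainty-adjusted margin, can be sketched as follows. The additive margin `base_margin + risk_weight * uncertainty`, the parameter names, and the rule names in the usage example are assumptions for illustration. The abstract does not give the exact margin formula or how the KNN uncertainty is computed.

```python
from typing import Dict


def gated_rule_choice(
    default_rule: str,
    predicted_gain: Dict[str, float],
    uncertainty: Dict[str, float],
    base_margin: float = 0.0,
    risk_weight: float = 1.0,
) -> str:
    """Return a non-default rule only if its gain exceeds the adjusted margin.

    predicted_gain[r] is the predicted improvement of rule r over the default;
    uncertainty[r] is a non-negative uncertainty estimate for that prediction.
    """
    best_rule, best_surplus = default_rule, 0.0
    for rule, gain in predicted_gain.items():
        if rule == default_rule:
            continue
        # Rules without an uncertainty estimate are never trusted.
        margin = base_margin + risk_weight * uncertainty.get(rule, float("inf"))
        surplus = gain - margin
        if surplus > best_surplus:
            best_rule, best_surplus = rule, surplus
    return best_rule


if __name__ == "__main__":
    gains = {"MWKR": 0.05, "FIFO": 0.02}           # hypothetical predictions
    confident = {"MWKR": 0.01, "FIFO": 0.05}
    uncertain = {"MWKR": 0.10, "FIFO": 0.05}
    print(gated_rule_choice("SPT", gains, confident, base_margin=0.01))  # switches
    print(gated_rule_choice("SPT", gains, uncertain, base_margin=0.01))  # stays
```

The design choice is conservative: when every candidate fails the margin test, the selector falls back to the strong default rule.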
RO May 4
SAGA: A Robust Self-Attention and Goal-Aware Anchor-based Planner for Safe UAV Autonomous Navigation
Junhao Wei, Yanxiao Li, Dexing Yao et al.
Agile unmanned aerial vehicle (UAV) navigation in cluttered environments demands a planning architecture that is both computationally efficient and structurally expressive enough to reason over multiple feasible motions. This paper presents SAGA, a robust self-attention and goal-aware anchor-based planner for safe UAV autonomous navigation. SAGA formulates local planning as a one-stage joint regression-and-ranking problem over a fixed lattice of motion anchors. Given a depth image and a body-frame motion state, the planner predicts refined terminal states and planning scores for all anchors in a single forward pass, after which the best candidate is decoded into a dynamically feasible trajectory. The key idea of SAGA is to transform anchor-aligned features into geometry-aware tokens and perform cross-anchor global reasoning with self-attention. To preserve directional structure in the token space, we further introduce a polar positional encoding derived from anchor yaw and pitch. In addition, a goal-aware modulation module injects velocity, acceleration, and target information into the token representation before final score prediction. Experiments in cluttered pillar-map environments under maximum speed settings of 2.0, 3.0, and 4.0 m/s show that SAGA consistently achieves a 100% success rate, while YOPO drops from 90.91% to 62.50%, Ego-planner from 71.43% to 52.63%, and Fast-planner from 52.63% to 38.46%. Under the 4.0 m/s maximum speed setting, SAGA also improves average safety from 1.9843 m to 2.3888 m and minimum safety from 0.4390 m to 0.7576 m over YOPO, while reducing total flight time from 40.4631 s to 27.4901 s. The comparison with SAGA w/o PPE further shows that explicit polar positional encoding is critical for stable cross-anchor reasoning and safe passage selection in cluttered scenes.
CL Jul 2, 2025
Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization
Keyan Jin, Yapeng Wang, Leonel Santos et al.
Dialogue summarization is a challenging task with significant practical value in customer service, meeting analysis, and conversational AI. Although large language models (LLMs) have achieved substantial progress in summarization tasks, the performance of step-by-step reasoning architectures, specifically Long Chain-of-Thought (CoT) implementations such as OpenAI-o1 and DeepSeek-R1, remains unexplored for dialogue scenarios requiring concurrent abstraction and conciseness. In this work, we present the first comprehensive and systematic evaluation of state-of-the-art reasoning LLMs and non-reasoning LLMs across three major paradigms: generic, role-oriented, and query-oriented dialogue summarization. Our study spans diverse languages, domains, and summary lengths, leveraging strong benchmarks (SAMSum, DialogSum, CSDS, and QMSum) and advanced evaluation protocols that include both LLM-based automatic metrics and human-inspired criteria. Contrary to trends in other reasoning-intensive tasks, our findings show that explicit stepwise reasoning does not consistently improve dialogue summarization quality. Instead, reasoning LLMs are often prone to verbosity, factual inconsistencies, and less concise summaries compared to their non-reasoning counterparts. Through scenario-specific analyses and detailed case studies, we further identify when and why explicit reasoning may fail to benefit, or even hinder, summarization in complex dialogue contexts. Our work provides new insights into the limitations of current reasoning LLMs and highlights the need for targeted modeling and evaluation strategies for real-world dialogue summarization.
CV Feb 29, 2024
Analysis of the Two-Step Heterogeneous Transfer Learning for Laryngeal Blood Vessel Classification: Issue and Improvement
Xinyi Fang, Xu Yang, Chak Fong Chong et al.
Accurate classification of laryngeal vascular images as benign or malignant is crucial for early detection of laryngeal cancer. However, organizations with limited access to laryngeal vascular images face challenges due to the lack of large and homogeneous public datasets for effective learning. Unlike most existing works, which directly transfer ImageNet pre-trained models to the target domain for fine-tuning, this work pioneers the exploration of two-step heterogeneous transfer learning (THTL) for laryngeal lesion classification with nine deep-learning models, using diabetic retinopathy color fundus images, which are semantically non-identical yet also vascular, as the intermediate domain. The attention visualization technique Layer Class Activate Map (LayerCAM) reveals a novel finding: although the intermediate and target domains both reflect vascular structure to a certain extent, the prevalent radial vascular pattern in the intermediate domain prevents the model from learning the features of twisted and tangled vessels that distinguish the malignant class in the target domain. This finding yields a vital rule for laryngeal lesion classification using THTL. To address this issue, we introduce an enhanced fine-tuning strategy in THTL called Step-Wise Fine-Tuning (SWFT) and apply it to the ResNet models. SWFT progressively refines model performance by accumulating fine-tuning layers from back to front, guided by the visualization results of LayerCAM. Comparison with the original THTL approach shows significant improvements. For ResNet18, the accuracy and malignant recall increase by 26.1% and 79.8%, respectively, while for ResNet50, these indicators improve by 20.4% and 62.2%, respectively.
CV May 24, 2024
Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification
Chak Fong Chong, Jielong Guo, Xu Yang et al.
Multi-label image classification datasets are often partially labeled where many labels are missing, posing a significant challenge to training accurate deep classifiers. However, the powerful Mixup sample-mixing data augmentation cannot be well utilized to address this challenge, as it cannot perform linear interpolation on the unknown labels to construct augmented samples. In this paper, we propose LogicMix, a Mixup variant designed for such partially labeled datasets. LogicMix mixes the sample labels by logical OR so that the unknown labels can be correctly mixed by utilizing OR's logical equivalences, including the domination and identity laws. Unlike Mixup, which mixes exactly two samples, LogicMix can mix multiple ($\geq2$) partially labeled samples, constructing visually more confused augmented samples to regularize training. LogicMix is more general and effective than other compared Mixup variants in the experiments on various partially labeled dataset scenarios. Moreover, it is plug-and-play and only requires minimal computation, hence it can be easily inserted into existing frameworks to collaborate with other methods to improve model performance with a negligible impact on training time, as demonstrated through extensive experiments. In particular, through the collaboration of LogicMix, RandAugment, Curriculum Labeling, and Category-wise Fine-Tuning, we attain state-of-the-art performance on MS-COCO, VG-200, and Pascal VOC 2007 benchmarking datasets. The remarkable generality, effectiveness, collaboration, and simplicity suggest that LogicMix promises to be a popular and vital data augmentation method.
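The label-mixing rule described above can be sketched with three-valued labels. In this sketch, `1` means positive, `0` means negative, and `None` means unknown. By the domination law, a known positive OR anything is positive. By the identity law, a negative OR an unknown stays unknown. This is an illustrative reading of the abstract rather than the authors' code: the label encoding and the function name are assumptions, and the mixing of the images themselves is not shown.

```python
from typing import List, Optional, Sequence

Label = Optional[int]  # 1 = positive, 0 = negative, None = unknown


def or_mix_labels(samples: Sequence[Sequence[Label]]) -> List[Label]:
    """Mix the label vectors of two or more partially labeled samples by logical OR."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to mix")
    if len({len(s) for s in samples}) != 1:
        raise ValueError("all label vectors must have the same length")
    mixed: List[Label] = []
    for column in zip(*samples):
        if any(v == 1 for v in column):
            mixed.append(1)       # domination: 1 OR x = 1, even if x is unknown
        elif any(v is None for v in column):
            mixed.append(None)    # identity: 0 OR unknown = unknown
        else:
            mixed.append(0)       # all mixed labels are known negatives
    return mixed


if __name__ == "__main__":
    a = [1, 0, None, None]
    b = [None, 0, 0, None]
    c = [0, 0, 0, 1]
    print(or_mix_labels([a, b, c]))  # [1, 0, None, 1]
```

Because OR is associative and commutative, the same rule applies unchanged to any number of mixed samples, which is what allows mixing more than two.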