RO Apr 17, 2023
Base Placement Optimization for Coverage Mobile Manipulation Tasks
Huiwen Zhang, Kai Mi, Zhijun Zhang
Base placement optimization (BPO) is a fundamental capability for mobile manipulation and has been researched for decades. However, it remains very challenging for several reasons. First, compared with humans, current robots are extremely inflexible and therefore place higher requirements on the accuracy of base placements (BPs). Second, the BP and the task constraints are coupled: the optimal BP depends on the task constraints, and the BP in turn affects the task constraints. Trickier still, some task constraints are flexible and non-deterministic. Third, beyond fulfilling the task, other performance metrics such as energy consumption and execution time need to be considered, which makes the BPO problem even more complicated. In this paper, a scale-like disc (SLD) representation of the workspace is used to decouple task constraints and BPs. To evaluate reachability and return the optimal working pose over SLDs, a reachability map (RM) is constructed offline. To optimize the objectives of coverage, manipulability, and time cost simultaneously, this paper formulates the BPO as a multi-objective optimization problem (MOOP). Among these objectives, the time-optimal objective is modeled as a traveling salesman problem (TSP), which is more in line with the actual situation. An evolutionary method is used to solve the MOOP. In addition, to ensure the validity and optimality of the solution, collision detection is performed on the candidate BPs, and solutions from BPO are further fine-tuned according to the specific given task. Finally, the proposed method is used to solve a real-world toilet coverage cleaning task. Experiments show that the optimized BPs can significantly improve the coverage and efficiency of the task.
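To illustrate the idea of treating the time cost as a traveling salesman problem, the following is a minimal sketch, not the authors' implementation. It assumes base placements are 2D points, that travel time is proportional to Euclidean distance, and that the robot follows an open path from a given start. The function name `shortest_visit_order` is hypothetical, and the brute-force search is only practical for a handful of placements.

```python
import itertools
import math


def shortest_visit_order(start, placements):
    """Brute-force the shortest open path from `start` through every placement.

    Assumption: cost is Euclidean distance between 2D points; the abstract
    does not specify the actual time model.
    """
    best_length, best_order = math.inf, ()
    for order in itertools.permutations(placements):
        length = 0.0
        current = start
        for point in order:
            length += math.dist(current, point)
            current = point
        if length < best_length:
            best_length, best_order = length, order
    return best_length, list(best_order)


if __name__ == "__main__":
    length, order = shortest_visit_order(
        (0.0, 0.0), [(2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    )
    print(length, order)
```

In a multi-objective setting, a value like `length` would be only one of the objectives scored for each candidate set of placements, alongside coverage and manipulability.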
CV Jul 25, 2025 Code
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu, Wangbo Zhao, Huiwen Zhang et al.
Vision Transformers (ViTs) have emerged as a foundational model in computer vision, excelling in generalization and adaptation to downstream tasks. However, deploying ViTs to support diverse resource constraints typically requires retraining multiple, size-specific ViTs, which is both time-consuming and energy-intensive. To address this issue, we propose an efficient ViT adaptation framework that enables a single adaptation process to generate multiple models of varying sizes for deployment on platforms with various resource constraints. Our approach comprises two stages. In the first stage, we enhance a pre-trained ViT with a nested elastic architecture that enables structural flexibility across MLP expansion ratio, number of attention heads, embedding dimension, and network depth. To preserve pre-trained knowledge and ensure stable adaptation, we adopt a curriculum-based training strategy that progressively increases elasticity. In the second stage, we design a lightweight router to select submodels according to computational budgets and downstream task demands. Initialized with Pareto-optimal configurations derived via a customized NSGA-II algorithm, the router is then jointly optimized with the backbone. Extensive experiments on multiple benchmarks demonstrate the effectiveness and versatility of EA-ViT. The code is available at https://github.com/zcxcf/EA-ViT.
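The notion of Pareto-optimal submodel configurations can be sketched as a simple non-dominated filter. This is not NSGA-II and not the EA-ViT code; it only shows the dominance test that such algorithms build on. The tuple layout `(name, cost, error)` and the function name `pareto_front` are assumptions made for illustration, with lower values better for both objectives.

```python
def pareto_front(configs):
    """Return the names of configurations that no other configuration dominates.

    `configs` is a list of (name, cost, error) tuples (hypothetical layout).
    A configuration is dominated if another one is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, cost, error in configs:
        dominated = any(
            other_cost <= cost
            and other_error <= error
            and (other_cost < cost or other_error < error)
            for _, other_cost, other_error in configs
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    candidates = [
        ("a", 1.0, 0.30),
        ("b", 2.0, 0.20),
        ("c", 2.5, 0.25),  # dominated by "b": higher cost and higher error
        ("d", 4.0, 0.10),
    ]
    print(pareto_front(candidates))
```

A router initialized from such a front would start from configurations that each represent a distinct cost/accuracy trade-off.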
NI Mar 17
Fine-Grained Network Traffic Classification with Contextual QoS Profiling
Huiwen Zhang, Feng Ye
Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems. While recent advances in application-level classification show high accuracy, they often miss fine-grained in-app QoS variations critical for service differentiation. This paper proposes a hierarchical graph neural network (GNN) framework that combines a three-level graph representation with an automated QoS-aware assignment algorithm. The model captures multi-scale temporal patterns via packet aggregation, time-window clustering, and session-level behavior modeling. QoS priorities are derived using five key metrics (bandwidth, jitter, packet stability, burst frequency, and burst stability), processed through logarithmic transformation and weighted ranking. Evaluations across 14 usage scenarios from YouTube, Prime Video, TikTok, and Zoom show that the proposed GNN significantly outperforms state-of-the-art methods in service-level classification. The QoS-aware assignment further refines classification to enhance user experience. This work advances QoS-aware traffic classification by enabling precise in-app usage differentiation and adaptive service prioritization in dynamic network environments.
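The QoS priority step (five metrics, logarithmic transformation, weighted ranking) could look roughly like the sketch below. The weights, the use of `log1p`, and the convention that a larger metric value means a more demanding flow are all assumptions; the abstract does not specify them, and the function names are hypothetical.

```python
import math

# Hypothetical weights; the abstract does not state the actual values.
WEIGHTS = {
    "bandwidth": 0.40,
    "jitter": 0.20,
    "packet_stability": 0.15,
    "burst_frequency": 0.15,
    "burst_stability": 0.10,
}


def qos_scores(profiles, weights=WEIGHTS):
    """Weighted sum of log-transformed metrics for each usage scenario."""
    return {
        name: sum(w * math.log1p(metrics[m]) for m, w in weights.items())
        for name, metrics in profiles.items()
    }


def rank_by_priority(profiles, weights=WEIGHTS):
    """Scenario names ordered from highest to lowest score."""
    scores = qos_scores(profiles, weights)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    profiles = {
        "video_call": dict(bandwidth=4.0, jitter=2.0, packet_stability=3.0,
                           burst_frequency=5.0, burst_stability=2.0),
        "idle_feed": dict(bandwidth=1.0, jitter=1.0, packet_stability=1.0,
                          burst_frequency=1.0, burst_stability=1.0),
    }
    print(rank_by_priority(profiles))
```

The logarithm compresses metrics that span several orders of magnitude, such as bandwidth, so that no single metric dominates the weighted sum.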
LG Feb 13
Physics-Informed Neural Networks with Architectural Physics Embedding for Large-Scale Wave Field Reconstruction
Huiwen Zhang, Feng Ye, Chu Ma
Large-scale wave field reconstruction requires precise solutions but faces challenges with computational efficiency and accuracy. Physics-based numerical methods such as the Finite Element Method (FEM) provide high accuracy but struggle with large-scale or high-frequency problems due to prohibitive computational costs. Pure data-driven approaches excel in speed but often lack sufficient labeled data for complex scenarios. Physics-informed neural networks (PINNs) integrate physical principles into machine learning models, offering a promising solution by bridging these gaps. However, standard PINNs embed physical principles only in loss functions, leading to slow convergence, optimization instability, and spectral bias, which limit their ability to perform large-scale wave field reconstruction. This work introduces architecture physics embedded (PE)-PINN, which integrates additional physical guidance directly into the neural network architecture beyond Helmholtz equations and boundary conditions in loss functions. Specifically, a new envelope transformation layer is designed to mitigate spectral bias with kernels parameterized by source properties, material interfaces, and wave physics. Experiments demonstrate that PE-PINN achieves more than 10 times speedup in convergence compared to standard PINNs and several orders of magnitude reduction in memory usage compared to FEM. This breakthrough enables high-fidelity modeling for large-scale 2D/3D electromagnetic wave reconstruction involving reflections, refractions, and diffractions in room-scale domains, readily applicable to wireless communications, sensing, room acoustics, and other fields requiring large-scale wave field analysis.
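For context, the loss-only physics embedding of a standard PINN for the Helmholtz equation is commonly written as below. This is a generic textbook-style form, not the PE-PINN formulation. The symbols $u_\theta$ (network output), $k$ (wavenumber), $s$ (source term), $\mathcal{B}$ and $g$ (boundary operator and data), $\lambda$ (loss weight), and the sign convention on the source term are all assumptions.

```latex
% Helmholtz equation (sign convention on s is an assumption)
\nabla^2 u(\mathbf{x}) + k^2 u(\mathbf{x}) = -s(\mathbf{x})

% Generic PINN loss: PDE residual at N_r collocation points
% plus a weighted boundary term at N_b boundary points
\mathcal{L}(\theta) =
  \frac{1}{N_r} \sum_{i=1}^{N_r}
    \left| \nabla^2 u_\theta(\mathbf{x}_i) + k^2 u_\theta(\mathbf{x}_i) + s(\mathbf{x}_i) \right|^2
  + \lambda \, \frac{1}{N_b} \sum_{j=1}^{N_b}
    \left| \mathcal{B}[u_\theta](\mathbf{x}_j) - g(\mathbf{x}_j) \right|^2
```

In this form the physics constrains training only through $\mathcal{L}(\theta)$, which is the limitation the abstract attributes to standard PINNs.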
CV Jan 5
Nighttime Hazy Image Enhancement via Progressively and Mutually Reinforcing Night-Haze Priors
Chen Zhu, Huiwen Zhang, Mu He et al.
Enhancing the visibility of nighttime hazy images is challenging due to the complex degradation distributions. Existing methods mainly address a single type of degradation (e.g., haze or low-light) at a time, ignoring the interplay of different degradation types and resulting in limited visibility improvement. We observe that the domain knowledge shared between low-light and haze priors can be reinforced mutually for better visibility. Based on this key insight, in this paper, we propose a novel framework that enhances visibility in nighttime hazy images by reinforcing the intrinsic consistency between haze and low-light priors mutually and progressively. In particular, our model utilizes image-, patch-, and pixel-level experts that operate across visual and frequency domains to recover global scene structure, regional patterns, and fine-grained details progressively. A frequency-aware router is further introduced to adaptively guide the contribution of each expert, ensuring robust image restoration. Extensive experiments demonstrate the superior performance of our model on nighttime dehazing benchmarks both quantitatively and qualitatively. Moreover, we showcase the generalizability of our model in daytime dehazing and low-light enhancement tasks.
CV Jan 5
API: Empowering Generalizable Real-World Image Dehazing via Adaptive Patch Importance Learning
Chen Zhu, Huiwen Zhang, Yujie Li et al.
Real-world image dehazing is a fundamental yet challenging task in low-level vision. Existing learning-based methods often suffer from significant performance degradation when applied to complex real-world hazy scenes, primarily due to limited training data and the intrinsic complexity of haze density distributions. To address these challenges, we introduce a novel Adaptive Patch Importance-aware (API) framework for generalizable real-world image dehazing. Specifically, our framework consists of an Automatic Haze Generation (AHG) module and a Density-aware Haze Removal (DHR) module. AHG provides a hybrid data augmentation strategy by generating realistic and diverse hazy images as additional high-quality training data. DHR considers hazy regions with varying haze density distributions for generalizable real-world image dehazing in an adaptive patch importance-aware manner. To alleviate the ambiguity of the dehazed image details, we further introduce a new Multi-Negative Contrastive Dehazing (MNCD) loss, which fully utilizes information from multiple negative samples across both spatial and frequency domains. Extensive experiments demonstrate that our framework achieves state-of-the-art performance across multiple real-world benchmarks, delivering strong results in both quantitative metrics and qualitative visual quality, and exhibiting robust generalization across diverse haze distributions.