NAJul 26, 2018
Superiorization of Preconditioned Conjugate Gradient Algorithms for Tomographic Image ReconstructionElias S. Helou, Gabor T. Herman, Chuan Lin et al.
Properties of Superiorized Preconditioned Conjugate Gradient (SupPCG) algorithms in image reconstruction from projections are examined. Least squares (LS) is usually chosen for measuring data-inconsistency in these inverse problems. Preconditioned Conjugate Gradient algorithms are fast methods for finding an LS solution. However, for ill-posed problems, such as image reconstruction, an LS solution may not provide good image quality. This can be taken care of by superiorization. A superiorized algorithm leads to images with the value of a secondary criterion (a merit function such as the total variation) improved as compared to images with similar data-inconsistency obtained by the algorithm without superiorization. Numerical experimentation shows that SupPCG can lead to high-quality reconstructions within a remarkably short time. A theoretical analysis is also provided.
CLNov 15, 2025Code
A Reasoning Paradigm for Named Entity RecognitionHui Huang, Yanping Chen, Ruizhang Huang et al.
Generative LLMs typically improve Named Entity Recognition (NER) performance through instruction tuning. They excel at generating entities by semantic pattern matching but lack an explicit, verifiable reasoning mechanism. This "cognitive shortcutting" leads to suboptimal performance and brittle generalization, especially in zero-shot and lowresource scenarios where reasoning from limited contextual cues is crucial. To address this issue, a reasoning framework is proposed for NER, which shifts the extraction paradigm from implicit pattern matching to explicit reasoning. This framework consists of three stages: Chain of Thought (CoT) generation, CoT tuning, and reasoning enhancement. First, a dataset annotated with NER-oriented CoTs is generated, which contain task-relevant reasoning chains. Then, they are used to tune the NER model to generate coherent rationales before deriving the final answer. Finally, a reasoning enhancement stage is implemented to optimize the reasoning process using a comprehensive reward signal. This stage ensures explicit and verifiable extractions. Experiments show that ReasoningNER demonstrates impressive cognitive ability in the NER task, achieving competitive performance. In zero-shot settings, it achieves state-of-the-art (SOTA) performance, outperforming GPT-4 by 12.3 percentage points on the F1 score. Analytical results also demonstrate its great potential to advance research in reasoningoriented information extraction. Our codes are available at https://github.com/HuiResearch/ReasoningIE.
CVAug 8, 2024Code
UHNet: An Ultra-Lightweight and High-Speed Edge Detection NetworkFuzhang Li, Chuan Lin
Edge detection is crucial in medical image processing, enabling precise extraction of structural information to support lesion identification and image analysis. Traditional edge detection models typically rely on complex Convolutional Neural Networks and Vision Transformer architectures. Due to their numerous parameters and high computational demands, these models are limited in their application on resource-constrained devices. This paper presents an ultra-lightweight edge detection model (UHNet), characterized by its minimal parameter count, rapid computation speed, negligible of pre-training costs, and commendable performance. UHNet boasts impressive performance metrics with 42.3k parameters, 166 FPS, and 0.79G FLOPs. By employing an innovative feature extraction module and optimized residual connection method, UHNet significantly reduces model complexity and computational requirements. Additionally, a lightweight feature fusion strategy is explored, enhancing detection accuracy. Experimental results on the BSDS500, NYUD, and BIPED datasets validate that UHNet achieves remarkable edge detection performance while maintaining high efficiency. This work not only provides new insights into the design of lightweight edge detection models but also demonstrates the potential and application prospects of the UHNet model in engineering applications such as medical image processing. The codes are available at https://github.com/stoneLi20cv/UHNet
11.3ROMay 15Code
Task-Semantic Graph-Driven Distributed Agent Networking for Underwater Target TrackingShengchao Zhu, Guangjie Han, Chuan Lin et al.
Autonomous underwater vehicle (AUV) swarms are emerging as intelligent underwater networks, where each node must sense, communicate, process local data, and make decisions under severe acoustic constraints. Persistent underwater target tracking is a typical task with moving targets, changing communication topology, intermittent acoustic links, and limited observation for each AUV. Multi-agent reinforcement learning (MARL) is a natural candidate for distributed tracking, yet existing studies still lack a unified open-source platform for evaluating different MARL algorithms under six-degree-of-freedom AUV dynamics. In addition, policies trained with raw geometric states and low-level force actions often struggle to represent task phases, observation reliability, link quality, and local cooperation roles. This paper addresses these issues by developing an open-source MARL-AUV platform that integrates DI-engine with a six-degree-of-freedom underwater AUV target-tracking simulator. To the best of our knowledge, it is the first open platform that connects a public MARL training framework with physically modeled AUV swarm-based tasks, and provides a unified experimental protocol for fair training, testing, and comparison of representative RL and MARL algorithms. Based on this platform, we propose STG-MAPPO, a Semantic Task Graph-enhanced variant of Multi-Agent Proximal Policy Optimization. STG-MAPPO builds semantic policy inputs from tracking diagnostics, task phases, observation confidence, link availability, neighbor tracking quality, and local role advantage. A compact semantic task graph links communication-constrained network states to decentralized actor decisions, and a velocity-level action abstraction maps high-level cooperative decisions to executable six-degree-offreedom AUV control inputs.The code is available at https://github.com/dasjsaj/MARL-AUV.
CVOct 2, 2022
Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor EnvironmentYijun Cao, Xianshi Zhang, Fuya Luo et al.
PointGoal navigation in indoor environment is a fundamental task for personal robots to navigate to a specified point. Recent studies solved this PointGoal navigation task with near-perfect success rate in photo-realistically simulated environments, under the assumptions with noiseless actuation and most importantly, perfect localization with GPS and compass sensors. However, accurate GPS signalis difficult to be obtained in real indoor environment. To improve the PointGoal navigation accuracy without GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in unsupervised manner. Sepecifically, unsupervised VO computes the relative pose of the agent from the re-projection error of two adjacent frames, and then replaces the accurate GPS signal with the path integration. The pseudo position estimated by VO is used to train action integration which assists agent to update their internal perception of location and helps improve the success rate of navigation. The training and inference process only use RGB, depth, collision as well as self-action information. The experiments show that the proposed system achieves satisfactory results and outperforms the partially supervised learning algorithms on the popular Gibson dataset.
64.1ROMar 28
Multi-AUV Ad-hoc Networks-Based Multi-Target Tracking Based on Scene-Adaptive Embodied IntelligenceKai Tian, Jialun Wang, Chuan Lin et al.
With the rapid advancement of underwater net-working and multi-agent coordination technologies, autonomous underwater vehicle (AUV) ad-hoc networks have emerged as a pivotal framework for executing complex maritime missions, such as multi-target tracking. However, traditional data-centricarchitectures struggle to maintain operational consistency under highly dynamic topological fluctuations and severely constrained acoustic communication bandwidth. This article proposes a scene-adaptive embodied intelligence (EI) architecture for multi-AUV ad-hoc networks, which re-envisions AUVs as embodied entities by integrating perception, decision-making, and physical execution into a unified cognitive loop. To materialize the functional interaction between these layers, we define a beacon-based communication and control model that treats the communication link as a dynamic constraint-aware channel, effectively bridging the gap between high-level policy inference and decentralized physical actuation. Specifically, the proposed architecture employs a three-layer functional framework and introduces a Scene-Adaptive MARL (SA-MARL) algorithm featuring a dual-path critic mechanism. By integrating a scene critic network and a general critic network through a weight-based dynamic fusion process, SA-MARL effectively decouples specialized tracking tasks from global safety constraints, facilitating autonomous policy evolution. Evaluation results demonstrate that the proposedscheme significantly accelerates policy convergence and achieves superior tracking accuracy compared to mainstream MARL approaches, maintaining robust performance even under intense environmental interference and fluid topological shifts.
40.2NIMar 31
Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement LearningJiaao Ma, Chuan Lin, Guangjie Han et al.
In recent years, advances in underwater networking and multi-agent reinforcement learning (MARL) have significantly expanded multi-autonomous underwater vehicle (AUV) applications in marine exploration and target tracking. However, current MARL-driven cooperative tracking faces three critical challenges: 1) non-stationarity in decentralized coordination, where local policy updates destabilize teammates' observation spaces, preventing convergence; 2) sparse-reward exploration inefficiency from limited underwater visibility and constrained sensor ranges, causing high-variance learning; and 3) water disturbance fragility combined with handcrafted reward dependency that degrades real-world robustness under unmodeled hydrodynamic conditions. To address these challenges, this paper proposes a hierarchical MARL architecture comprising four layers: global training scheduling, multi-agent coordination, local decision-making, and real-time execution. This architecture optimizes task allocation and inter-AUV coordination through hierarchical decomposition. Building on this foundation, we propose the Supervised Diffusion-Aided MARL (SDA-MARL) algorithm featuring three innovations: 1) a dual-decision architecture with segregated experience pools mitigating nonstationarity through structured experience replay; 2) a supervised learning mechanism guiding the diffusion model's reverse denoising process to generate high-fidelity training samples that accelerate convergence; and 3) disturbance-robust policy learning incorporating behavioral cloning loss to guide the Deep Deterministic Policy Gradient network update using high-quality replay actions, eliminating handcrafted reward dependency. The tracking algorithm based on SDA-MARL proposed in this paper achieves superior precision compared to state-of-the-art methods in comprehensive underwater simulations.
GRMay 16, 2024
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture SynthesisZeyi Zhang, Tenglong Ao, Yuyao Zhang et al.
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
LGSep 26, 2025
In-Context Learning can Perform Continual Learning Like HumansLiuwang Kang, Fan Wang, Shaoshan Liu et al.
Large language models (LLMs) can adapt to new tasks via in-context learning (ICL) without parameter updates, making them powerful learning engines for fast adaptation. While extensive research has examined ICL as a few-shot learner, whether it can achieve long-term retention and cross-task knowledge accumulation when multitasks arrive sequentially remains underexplored. Motivated by human memory studies, we investigate the retention characteristics of ICL in multitask settings and extend it to in-context continual learning (ICCL), where continual learning ability emerges through task scheduling and prompt rearrangement. Experiments on Markov-Chain benchmarks demonstrate that, for specific large-language models, ICCL benefits from distributed practice (DP) in a manner analogous to humans, consistently revealing a spacing "sweet spot" for retention. Beyond retention performance, we propose a human-retention similarity metric to quantify how closely a continual-learning (CL) method aligns with human retention dynamics. Using this metric, we show that linear-attention models such as MAMBA and RWKV exhibit particularly human-like retention patterns, despite their retention performance lagging behind that of Transformer-based LLMs. Overall, our results establish ICCL as both cognitively plausible and practically effective, providing an inference-only CL paradigm that mitigates catastrophic forgetting and addresses the stability-plasticity dilemma in conventional CL methods.
CVFeb 2, 2021
Learning Crisp Boundaries Using Deep Refinement Network and Adaptive Weighting LossYi-Jun Cao, Chuan Lin, Yong-Jie Li
Significant progress has been made in boundary detection with the help of convolutional neural networks. Recent boundary detection models not only focus on real object boundary detection but also "crisp" boundaries (precisely localized along the object's contour). There are two methods to evaluate crisp boundary performance. One uses more strict tolerance to measure the distance between the ground truth and the detected contour. The other focuses on evaluating the contour map without any postprocessing. In this study, we analyze both methods and conclude that both methods are two aspects of crisp contour evaluation. Accordingly, we propose a novel network named deep refinement network (DRNet) that stacks multiple refinement modules to achieve richer feature representation and a novel loss function, which combines cross-entropy and dice loss through effective adaptive fusion. Experimental results demonstrated that we achieve state-of-the-art performance for several available datasets.