CV · May 2, 2022 · Code
Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation
Huan Gao, Jichang Guo, Guoli Wang et al.
The performance of nighttime semantic segmentation is restricted by poor illumination and a lack of pixel-wise annotation, which severely limit its application in autonomous driving. Existing works, e.g., using twilight as the intermediate target domain to perform the adaptation from daytime to nighttime, may fail to cope with the inherent difference between datasets caused by the camera equipment and the urban style. Faced with these two types of domain shift, i.e., the illumination and the inherent difference between the datasets, we propose a novel domain adaptation framework via cross-domain correlation distillation, called CCDistill. The invariance of illumination or inherent difference between two images is fully explored so as to make up for the lack of labels for nighttime images. Specifically, we extract the content and style knowledge contained in features and calculate the degree of inherent or illumination difference between two images. The domain adaptation is achieved using the invariance of the same kind of difference. Extensive experiments on Dark Zurich and ACDC demonstrate that CCDistill achieves state-of-the-art performance for nighttime semantic segmentation. Notably, our method is a one-stage domain adaptation network which can avoid affecting the inference time. Our implementation is available at https://github.com/ghuan99/CCDistill.
SY · Mar 10, 2016
Analysis and Design of Phase Desynchronization in Pulse-coupled Oscillators
Huan Gao, Yongqiang Wang · meta-ai
By spreading phases on the unit circle, desynchronization algorithms are a powerful tool for achieving round-robin scheduling, which is crucial in applications as diverse as media access control of communication networks, realization of analog-to-digital converters, and scheduling of traffic flows in intersections. Driven by the increased application of pulse-coupled oscillators in achieving synchronization, desynchronization of pulse-coupled oscillators is also receiving more attention. In this paper, we propose a phase desynchronization algorithm by rigorously analyzing the dynamics of pulse-coupled oscillators and carefully designing the pulse-based interaction function. A systematic proof for convergence to phase desynchronization is also given. Different from many existing results which can only achieve equal separation of firing time instants, the proposed approach can achieve equal separation of phases, which is more difficult to achieve due to phase jumps in pulse-coupled oscillators. Furthermore, the new strategy can guarantee achievement of desynchronization even when some nodes have identical initial phases, a situation in which most existing desynchronization approaches fail. Numerical simulation results are provided to illustrate the effectiveness of the theoretical results.
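To make the target configuration concrete: phase desynchronization means the N phases end up evenly spread on the unit circle, so neighbouring phases differ by 1/N of a cycle. The following is a minimal sketch of that check only, not the paper's algorithm; it assumes phases are normalized to [0, 1), and the function name and tolerance are illustrative.

```python
def is_phase_desynchronized(phases, tol=1e-9):
    """Return True if the phases (each taken modulo 1) are equally spaced on the unit circle."""
    n = len(phases)
    if n < 2:
        return True
    ordered = sorted(p % 1.0 for p in phases)
    # Gaps between neighbouring phases; the last gap wraps around from the
    # largest phase back to the smallest one, so one full cycle is added to it.
    gaps = [ordered[(i + 1) % n] - ordered[i] for i in range(n)]
    gaps[-1] += 1.0
    return all(abs(gap - 1.0 / n) <= tol for gap in gaps)


evenly_spread = [0.0, 0.25, 0.5, 0.75]
two_identical = [0.1, 0.1, 0.6]  # two nodes share a phase
print(is_phase_desynchronized(evenly_spread))
print(is_phase_desynchronized(two_identical))
```

The second list mirrors the identical-initial-phase situation mentioned in the abstract: it is a starting point that is not desynchronized, which the proposed strategy is stated to handle.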
CV · Nov 30, 2023 · Code
Is Underwater Image Enhancement All Object Detectors Need?
Yudong Wang, Jichang Guo, Wanru He et al.
Underwater object detection is a crucial and challenging problem in marine engineering and aquatic robotics. The difficulty is partly due to the degradation of underwater images caused by selective light absorption and scattering. Intuitively, enhancing underwater images can benefit high-level applications like underwater object detection. However, it is still unclear whether all object detectors need underwater image enhancement as pre-processing. We therefore pose the questions "Does underwater image enhancement really improve underwater object detection?" and "How does underwater image enhancement contribute to underwater object detection?". With these two questions, we conduct extensive studies. Specifically, we use 18 state-of-the-art underwater image enhancement algorithms, covering traditional, CNN-based, and GAN-based algorithms, to pre-process underwater object detection data. Then, we retrain 7 popular deep learning-based object detectors using the corresponding results enhanced by different algorithms, obtaining 126 underwater object detection models. Coupled with 7 object detection models retrained using raw underwater images, we employ these 133 models to comprehensively analyze the effect of underwater image enhancement on underwater object detection. We expect this study to provide sufficient exploration to answer the aforementioned questions and to draw more of the community's attention to the joint problem of underwater image enhancement and underwater object detection. The pre-trained models and results are publicly available and will be regularly updated. Project page: https://github.com/BIGWangYuDong/lqit/tree/main/configs/detection/uw_enhancement_affect_detection.
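The model counts in the abstract follow from a full grid of enhancement algorithms against detectors, plus one raw-image baseline per detector. A small sketch of that bookkeeping is shown here; the names are placeholders, since the abstract gives only the counts.

```python
from itertools import product

# Placeholder names: only the counts (18 and 7) come from the abstract.
enhancers = [f"enhancer_{i}" for i in range(1, 19)]  # 18 enhancement algorithms
detectors = [f"detector_{j}" for j in range(1, 8)]   # 7 object detectors

# One retrained detector per (enhancement algorithm, detector) pair.
enhanced_runs = list(product(enhancers, detectors))
# One additional baseline per detector, retrained on raw images.
raw_runs = [("raw", detector) for detector in detectors]
all_runs = enhanced_runs + raw_runs

print(len(enhanced_runs), len(raw_runs), len(all_runs))  # 126 7 133
```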
DS · Mar 7, 2019
On the Global Synchronization of Pulse-coupled Oscillators Interacting on Chain and Directed Tree Graphs
Huan Gao, Yongqiang Wang
Driven by increased applications in biological networks and wireless sensor networks, synchronization of pulse-coupled oscillators (PCOs) has gained increased popularity. However, most existing results address the local synchronization of PCOs with initial phases constrained in a half cycle, and results on global synchronization from any initial condition are very sparse. In this paper, we address global PCO synchronization from an arbitrary phase distribution under chain or directed tree graphs. Our results differ from existing global synchronization studies on decentralized PCO networks in two key aspects: first, our work allows heterogeneous coupling functions, and we analyze the behavior of oscillators with perturbations on their natural frequencies; secondly, rather than requiring a large enough coupling strength, our results hold under any coupling strength between zero and one, which is crucial because a large coupling strength has been shown to be detrimental to the robustness of PCO synchronization to disturbances.
SY · Dec 5, 2018
Dynamics Based Privacy Protection for Average Consensus on Directed Graphs
Huan Gao, Yongqiang Wang
Average consensus is key for distributed networks, with applications ranging from network synchronization, distributed information fusion, and decentralized control to load balancing for parallel processors. Existing average consensus algorithms require each node to exchange explicit state values with its neighbors, which results in the undesirable disclosure of sensitive state information. In this paper, we propose a novel average consensus approach for directed graphs which can protect the privacy of participating nodes' initial states without the assistance of any trusted third party or data aggregator. By leveraging the inherent robustness of consensus dynamics to embed privacy in random coupling weights between interacting nodes, our proposed approach can guarantee consensus to the exact value without any error. This is distinct from differential-privacy-based average consensus approaches, which enable privacy by sacrificing accuracy in the obtained consensus value. The proposed approach is able to preserve privacy even when multiple honest-but-curious nodes collude with each other. Furthermore, by encrypting exchanged information, the proposed approach can also provide privacy protection against inference by external eavesdroppers wiretapping communication links. Numerical simulations and hardware experiments on Raspberry Pi boards confirm that the algorithm is lightweight in computation and communication.
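For context, the conventional (non-private) average consensus update that the abstract contrasts against can be sketched as follows. This is a textbook baseline on a small undirected ring, not the paper's privacy-preserving method for directed graphs; the graph, step size, and iteration count are illustrative assumptions.

```python
def average_consensus(states, neighbors, step=0.25, iterations=60):
    """Plain average consensus: x_i <- x_i + step * sum_j (x_j - x_i)."""
    x = list(states)
    for _ in range(iterations):
        # Every node reads its neighbours' plaintext states here; this
        # explicit exchange is the disclosure the abstract refers to.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Undirected 4-node ring (illustrative topology).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
initial = [1.0, 5.0, 3.0, 7.0]
final = average_consensus(initial, ring)
print(final)  # every entry approaches the average of the initial states, 4.0
```

With symmetric neighbour sets the update preserves the sum of the states, which is why the nodes settle on the exact average rather than some other common value.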
51.7 · LG · Apr 12
Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning
Hongxi Mao, Wei Zhou, Mengting Jia et al.
Machine learning for tabular data remains constrained by poor schema generalization, a challenge rooted in the lack of semantic understanding of structured variables. This challenge is particularly acute in domains like clinical medicine, where electronic health record (EHR) schemas vary significantly. To solve this problem, we propose Schema-Adaptive Tabular Representation Learning, a novel method that leverages large language models (LLMs) to create transferable tabular embeddings. By transforming structured variables into semantic natural language statements and encoding them with a pretrained LLM, our approach enables zero-shot alignment across unseen schemas without manual feature engineering or retraining. We integrate our encoder into a multimodal framework for dementia diagnosis, combining tabular and MRI data. Experiments on NACC and ADNI datasets demonstrate state-of-the-art performance and successful zero-shot transfer to unseen schemas, significantly outperforming clinical baselines, including board-certified neurologists, in retrospective diagnostic tasks. These results validate our LLM-driven approach as a scalable, robust solution for heterogeneous real-world data, offering a pathway to extend LLM-based reasoning to structured domains.
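The serialization step described above, turning structured variables into natural language statements before encoding, might look roughly like the sketch below. The column names, descriptions, and sentence template are hypothetical and are not taken from NACC, ADNI, or the paper; the subsequent LLM encoding step is omitted.

```python
def serialize_record(record, descriptions):
    """Turn one tabular row into natural language statements, one per non-missing field."""
    statements = []
    for column, value in record.items():
        if value is None:
            continue  # skip missing values rather than describing them
        # Fall back to the raw column name if no description is available.
        label = descriptions.get(column, column.replace("_", " "))
        statements.append(f"The {label} is {value}.")
    return " ".join(statements)


# Two hypothetical schemas that name the same variables differently.
site_a = {"AGE": 72, "MMSE_TOT": 24}
site_a_desc = {"AGE": "patient age in years",
               "MMSE_TOT": "Mini-Mental State Examination score"}
site_b = {"age_yrs": 72, "mmse": 24}
site_b_desc = {"age_yrs": "patient age in years",
               "mmse": "Mini-Mental State Examination score"}

text_a = serialize_record(site_a, site_a_desc)
text_b = serialize_record(site_b, site_b_desc)
print(text_a)
```

Because both rows serialize to the same sentence, a text encoder would see identical input regardless of the column naming, which is the intuition behind schema-independent alignment.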
CV · Mar 12, 2021 · Code
UIEC^2-Net: CNN-based Underwater Image Enhancement Using Two Color Space
Yudong Wang, Jichang Guo, Huan Gao et al.
Underwater image enhancement has attracted much attention due to the rise of marine resource development in recent years. Benefiting from the powerful representation capabilities of Convolutional Neural Networks (CNNs), multiple underwater image enhancement algorithms based on CNNs have been proposed in the last few years. However, almost all of these algorithms employ the RGB color space, which is insensitive to image properties such as luminance and saturation. To address this problem, we propose the Underwater Image Enhancement Convolutional Neural Network using 2 Color Spaces (UIEC^2-Net), which efficiently and effectively integrates both the RGB color space and the HSV color space in a single CNN. To the best of our knowledge, this method is the first to use the HSV color space for underwater image enhancement based on deep learning. UIEC^2-Net is an end-to-end trainable network consisting of three blocks: an RGB pixel-level block that implements fundamental operations such as denoising and removing color cast, an HSV global-adjust block that globally adjusts underwater image luminance, color, and saturation by adopting a novel neural curve layer, and an attention map block that combines the advantages of the RGB and HSV block output images by distributing weight to each pixel. Experimental results on synthetic and real-world underwater images show the good performance of our proposed method in both subjective comparisons and objective metrics. The code is available at https://github.com/BIGWangYuDong/UWEnhancement.
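The motivation for adding HSV can be illustrated with a single pixel: in HSV, brightness is one channel that can be adjusted without touching hue or saturation, whereas in RGB the same change is spread over all three channels. The sketch below uses Python's standard `colorsys` module and a fixed gain; it is only an illustration of the color-space property, not the paper's neural curve layer, and the pixel value and gain are arbitrary.

```python
import colorsys


def scale_brightness(rgb, gain):
    """Scale only the HSV value channel of an RGB pixel (components in [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # Hue and saturation are passed through unchanged; value is clipped at 1.0.
    return colorsys.hsv_to_rgb(h, s, min(1.0, v * gain))


pixel = (0.1, 0.5, 0.6)  # a dim blue-green pixel (arbitrary example)
brightened = scale_brightness(pixel, 1.2)
print(colorsys.rgb_to_hsv(*pixel))
print(colorsys.rgb_to_hsv(*brightened))
```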
57.1 · SY · Apr 23
Privacy-Preserving Distributed Stochastic Optimization with Homomorphic Encryption and Heterogeneous Stepsizes
Haoqiang Zhou, Chi Chen, Yongfeng Zhi et al.
Distributed stochastic optimization enables multi-agent collaboration in applications such as distributed learning and sensor networks, but also raises critical privacy concerns due to the involvement of sensitive data. While existing privacy-preserving approaches often face limitations in balancing accuracy with efficiency, we propose a novel distributed stochastic gradient descent algorithm that integrates Paillier homomorphic encryption with heterogeneous and time-varying random stepsizes. The proposed algorithm provides inherent privacy protection against both internal honest-but-curious agents and external eavesdroppers, without relying on any trusted neighbors. Furthermore, we incorporate an attenuation factor to effectively mitigate quantization error induced by the encryption process, ensuring almost sure convergence to the optimal solution while maintaining privacy preservation. Numerical simulations demonstrate the effectiveness and efficiency of the proposed approach.
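The property of Paillier encryption that makes it useful in this setting is additive homomorphism: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. A toy sketch of that property follows. The primes are tiny and insecure, chosen only so the example runs instantly; the function names are illustrative, and nothing here reproduces the paper's optimization algorithm or its quantization handling.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy Paillier key generation with generator g = n + 1 (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) as g^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


n, private_key = keygen()
c_a = encrypt(n, 17)
c_b = encrypt(n, 25)
c_sum = (c_a * c_b) % (n * n)  # multiplying ciphertexts adds the plaintexts
print(decrypt(n, private_key, c_sum))  # 42
```

Paillier operates on integers, so real-valued quantities such as gradients must be quantized before encryption, which is the source of the quantization error the abstract mentions.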
CV · Dec 4, 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang, Huan Gao, Ping Guo et al.
Vision Transformer (ViT) has shown high potential in video recognition, owing to its flexible design, adaptable self-attention mechanisms, and the efficacy of masked pre-training. Yet, it remains unclear how to adapt these pre-trained short-term ViTs for temporal action detection (TAD) in untrimmed videos. Existing works treat them as off-the-shelf feature extractors for each short-trimmed snippet without capturing the fine-grained relation among different snippets in a broader temporal context. To mitigate this issue, this paper focuses on designing a new mechanism for adapting these pre-trained ViT models as a unified long-form video transformer to fully unleash its modeling power in capturing inter-snippet relation, while still keeping low computation overhead and memory consumption for efficient TAD. To this end, we design effective cross-snippet propagation modules to gradually exchange short-term video information among different snippets at two levels. For inner-backbone information propagation, we introduce a cross-snippet propagation strategy to enable multi-snippet temporal feature interaction inside the backbone. For post-backbone information propagation, we propose temporal transformer layers for further clip-level modeling. With the plain ViT-B pre-trained with VideoMAE, our end-to-end temporal action detector (ViT-TAD) yields very competitive performance compared with previous temporal action detectors, reaching up to 69.5 average mAP on THUMOS14, 37.40 average mAP on ActivityNet-1.3 and 17.20 average mAP on FineAction.
AI · Jul 4, 2025
CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs
Bruce Yang, Xinfeng He, Huan Gao et al.
Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical factors such as token efficiency, modularity, and scalability in multi-agent environments. To address these limitations, we introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. In CodeAgents, all components of agent interaction - Task, Plan, Feedback, system roles, and external tool invocations - are codified into modular pseudocode enriched with control structures (e.g., loops, conditionals), boolean logic, and typed variables. This design transforms loosely connected agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. We evaluate the proposed framework across three diverse benchmarks - GAIA, HotpotQA, and VirtualHome - using a range of representative LLMs. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines. On VirtualHome, our method achieves a new state-of-the-art success rate of 56%. In addition, our approach reduces input and output token usage by 55-87% and 41-70%, respectively, underscoring the importance of token-aware evaluation metrics in the development of scalable multi-agent LLM systems. The code and resources are available at: https://anonymous.4open.science/r/CodifyingAgent-5A86
CR · Jun 9, 2025
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
Haoyang Li, Huan Gao, Zhiyuan Zhao et al.
The widespread adoption of Large Language Models (LLMs) has heightened concerns about their security, particularly their vulnerability to jailbreak attacks that leverage crafted prompts to generate malicious outputs. While prior research has been conducted on general security capabilities of LLMs, their specific susceptibility to jailbreak attacks in code generation remains largely unexplored. To fill this gap, we propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code-generation, designed to evaluate LLM robustness against such threats. MalwareBench is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories. Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak methods further reduces the model's security capabilities: specifically, the average rejection rate for malicious content is 60.93%, dropping to 39.92% when combined with jailbreak attack algorithms. Our work highlights that the code security capabilities of LLMs still pose significant challenges.
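The dataset size quoted above is consistent with pairing every requirement with every jailbreak method, though the abstract does not state that construction explicitly, so treat the pairing as an assumption. A quick check of that arithmetic, together with the reported drop in rejection rate:

```python
requirements = 320       # manually crafted requirements (from the abstract)
jailbreak_methods = 11   # jailbreak methods (from the abstract)

# Assumption: each requirement is combined with each jailbreak method once.
prompts = requirements * jailbreak_methods
print(prompts)  # 3520

# Drop in average rejection rate once jailbreak methods are applied,
# in percentage points, using the two rates quoted in the abstract.
rejection_drop = round(60.93 - 39.92, 2)
print(rejection_drop)  # 21.01
```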
CL · Oct 10, 2025
DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning
Chenyang Gu, Yewen Pu, Bruce Yang et al.
Enhancing LLMs with the ability to actively search external knowledge is crucial for complex and real-world tasks. Current approaches either rely on prompting to elicit the model's innate agent capabilities, or suffer from performance ceilings and collapse when applying RL to complex interactive tasks, leaving their true agentic potential untapped. To address this, we introduce Dynamic-filter Sequence-level Policy Optimization (DSPO), an improved RL algorithm designed for robust agent training through sequence-level optimization and dynamic sample filtering. We train our model purely through RL to interleave multi-turn search and reasoning, obviating the need for supervised demonstration data. Across multiple QA benchmarks, our 7B model improves over a comparable previous work by 34.1%, and even outperforms the 14B model from previous work in complex multihop QA such as HotpotQA by nearly 9% relative, maintaining exceptional training stability.
CV · Sep 26, 2025
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
Miao Jing, Mengting Jia, Junling Lin et al.
Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear. Existing datasets predominantly emphasize classification accuracy, creating an evaluation illusion in which models appear proficient while still failing at high-stakes diagnostic reasoning. We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark specifically designed to probe the limits of multimodal clinical reasoning in neurology. Neural-MedBench integrates multi-sequence MRI scans, structured electronic health records, and clinical notes, and encompasses three core task families: differential diagnosis, lesion recognition, and rationale generation. To ensure reliable evaluation, we develop a hybrid scoring pipeline that combines LLM-based graders, clinician validation, and semantic similarity metrics. Through systematic evaluation of state-of-the-art VLMs, including GPT-4o, Claude-4, and MedGemma, we observe a sharp performance drop compared to conventional datasets. Error analysis shows that reasoning failures, rather than perceptual errors, dominate model shortcomings. Our findings highlight the necessity of a Two-Axis Evaluation Framework: breadth-oriented large datasets for statistical generalization, and depth-oriented, compact benchmarks such as Neural-MedBench for reasoning fidelity. We release Neural-MedBench at https://neuromedbench.github.io/ as an open and extensible diagnostic testbed, which guides the expansion of future benchmarks and enables rigorous yet cost-effective assessment of clinically trustworthy AI.
OC · Jul 12, 2017
Secure and Privacy-Preserving Consensus
Minghao Ruan, Huan Gao, Yongqiang Wang
Consensus is fundamental for distributed systems since it underpins key functionalities of such systems, ranging from distributed information fusion and decision-making to decentralized control. In order to reach an agreement, existing consensus algorithms require each agent to exchange explicit state information with its neighbors. This leads to the disclosure of private state information, which is undesirable in cases where privacy is of concern. In this paper, we propose a novel approach that enables secure and privacy-preserving average consensus in a decentralized architecture in the absence of an aggregator or third party. By leveraging partial homomorphic cryptography to embed secrecy in pairwise interaction dynamics, our approach can guarantee consensus to the exact value in a deterministic manner without disclosing a node's state to its neighbors. In addition to enabling resilience to passive attackers aiming to steal state information, the approach also allows easy incorporation of defense mechanisms against active attackers that try to alter the content of exchanged messages. Furthermore, in contrast to existing noise-injection based privacy-preserving mechanisms which have to reconfigure the entire network when the topology or number of nodes varies, our approach is applicable to dynamic environments with time-varying coupling topologies. This secure and privacy-preserving approach is also applicable to weighted average consensus as well as maximum/minimum consensus under a new update rule. The approach is lightweight in computation and communication. Implementation details and numerical examples are provided to demonstrate the capability of our approach.