Haodong Yang

CV
h-index16
7papers
19citations
Novelty52%
AI Score48

7 Papers

IRFeb 1
FlexStructRAG: Flexible Structure-Aware Multi-Granular Relational Retrieval for RAG

Mengzhu Chen, Haodong Yang, Jia Cai et al.

Retrieval-Augmented Generation (RAG) systems critically depend on how external knowledge is segmented, structured, and retrieved. Most existing approaches either retrieve fixed-length text chunks, which fragments discourse context, or commit to a single structured index (e.g., a knowledge graph or hypergraph), which hard-codes one relational granularity. This often yields brittle retrieval when queries require different forms of evidence, such as local binary relations, higher-order interactions, or broader document-grounded context. We propose \textbf{FlexStructRAG}, a flexible structure-aware RAG framework that supports \emph{multi-granular, query-adaptive retrieval} over heterogeneous knowledge representations. FlexStructRAG jointly constructs (i) a knowledge graph for binary relations, (ii) a knowledge hypergraph for n-ary relations, and (iii) structure-aware semantic clusters that aggregate relational evidence into document-grounded context units. To reduce semantic fragmentation induced by uniform chunking, we introduce dynamic partitioning and a truncated sliding-window extraction mechanism that incorporates bounded contextual dependencies during knowledge construction. At inference time, FlexStructRAG enables entity-, edge-, hyperedge-, and cluster-level retrieval, which can be flexibly combined to supply generation with relationally and contextually aligned evidence. Experiments on the UltraDomain benchmark across four domains show that FlexStructRAG improves semantic evaluation over strong RAG baselines. Ablation and sensitivity analysis further demonstrate the necessity of multi-granular relational retrieval and structure-aware clustering.

LGApr 10
Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference

Yueyuan Sui, Payal Mohapatra, Doğaç Eldenk et al.

Edge devices increasingly run multimodal sensing pipelines that must remain accurate despite fluctuating power budgets and unpredictable sensor dropout. Existing pruning methods fail under these conditions: they generally require fine-tuning after compression, consuming over $10\times$ the deployment energy, and they assign static importance scores that are blind to which sensors are present. We present the SentryFuse framework, which addresses both challenges jointly through two key components. First, SentryGate learns modality-conditioned importance scores during training via first-order saliency supervision and then prunes attention heads and feed-forward channels at deployment without fine-tuning. Second, SentryAttend replaces dense self-attention, a key bottleneck in contemporary multimodal architectures, with sparse grouped-query attention, yielding a net 15% reduction in GFLOPs across three different multimodal architectures. Across three applications and multimodal backbones, SentryGate achieves a 12.7% average accuracy improvement over the strongest pruning baseline, and upto to 18% under modality dropout conditions. Together, SentryFuse reduces memory by 28.2% and lowers latency by up to $1.63\times$ without further fine-tuning, establishing modality-aware zero-shot compression as a practical path to multimodal intelligence on heterogeneous edge hardware.

QUANT-PHApr 17
Asymptotically Optimal Quantum Universal Quickest Change Detection

Arick Grootveld, Haodong Yang, Nandan Sriranga et al.

This paper investigates the quickest change detection of quantum states in a universal setting: specifically, where the post-change quantum state is not known a priori. We establish the asymptotic optimality of a two-stage approach in terms of worst average delay to detection. The first stage employs block POVMs with classical outputs that preserve quantum relative entropy to arbitrary precision. The second stage leverages a recently proposed windowed-CUSUM algorithm that is known to be asymptotically optimal for quickest change detection with an unknown post-change distribution in the classical setting.

CVMar 4, 2025
$\mathbfΦ$-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

Xidan Zhang, Yihan Zhuang, Qian Guo et al.

Approaches for improving generative adversarial networks (GANs) training under a few samples have been explored for natural images. However, these methods have limited effectiveness for synthetic aperture radar (SAR) images, as they do not account for the unique electromagnetic scattering properties of SAR. To remedy this, we propose a physics-inspired regularization method dubbed $Φ$-GAN, which incorporates the ideal point scattering center (PSC) model of SAR with two physical consistency losses. The PSC model approximates SAR targets using physical parameters, ensuring that $Φ$-GAN generates SAR images consistent with real physical properties while preventing discriminator overfitting by focusing on PSC-based decision cues. To embed the PSC model into GANs for end-to-end training, we introduce a physics-inspired neural module capable of estimating the physical parameters of SAR targets efficiently. This module retains the interpretability of the physical model and can be trained with limited data. We propose two physical loss functions: one for the generator, guiding it to produce SAR images with physical parameters consistent with real ones, and one for the discriminator, enhancing its robustness by basing decisions on PSC attributes. We evaluate $Φ$-GAN across several conditional GAN (cGAN) models, demonstrating state-of-the-art performance in data-scarce scenarios on three SAR image datasets.

CVOct 23, 2025
Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition

Haodong Yang, Zhongling Huang, Shaojie Guo et al.

Deep learning models for complex-valued Synthetic Aperture Radar (CV-SAR) image recognition are fundamentally constrained by a representation trilemma under data-limited and domain-shift scenarios: the concurrent, yet conflicting, optimization of generalization, interpretability, and efficiency. Our work is motivated by the premise that the rich electromagnetic scattering features inherent in CV-SAR data hold the key to resolving this trilemma, yet they are insufficiently harnessed by conventional data-driven models. To this end, we introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. The first stage performs a physics-guided compression, wherein a novel dictionary processor adaptively embeds physical priors, enabling a compact unfolding network to efficiently extract sparse, physically-grounded signatures. A subsequent aggregation module enriches these representations, followed by a final semantic compression stage that utilizes a compact classification head with self-distillation to learn maximally task-relevant and discriminative embeddings. We instantiate KINN in both CNN (0.7M) and Vision Transformer (0.95M) variants. Extensive evaluations on five SAR benchmarks confirm that KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios and tangible interpretability, thereby providing an effective solution to the representation trilemma and offering a new path for trustworthy AI in SAR image analysis.

CVMay 18, 2025
SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization

Haodong Yang, Lei Wang, Md Zakir Hossain

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method that injects two trainable low-rank matrices (A and B) into frozen pretrained models. While efficient, LoRA constrains updates to a fixed low-rank subspace (Delta W = BA), which can limit representational capacity and hinder downstream performance. We introduce Subspace Recomposition in Low-Rank Adaptation (SRLoRA) via importance-based fusion and reinitialization, a novel approach that enhances LoRA's expressiveness without compromising its lightweight structure. SRLoRA assigns importance scores to each LoRA pair (a column of B and the corresponding row of A), and dynamically recomposes the subspace during training. Less important pairs are fused into the frozen backbone, freeing capacity to reinitialize new pairs along unused principal directions derived from the pretrained weight's singular value decomposition. This mechanism enables continual subspace refreshment and richer adaptation over time, without increasing the number of trainable parameters. We evaluate SRLoRA on both language and vision tasks, including the GLUE benchmark and various image classification datasets. SRLoRA consistently achieves faster convergence and improved accuracy over standard LoRA, demonstrating its generality, efficiency, and potential for broader PEFT applications.

CVJun 21, 2024
LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement

Haodong Yang, Jisheng Xu, Zhiliang Lin et al.

Computer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors like light refraction and absorption present challenges to underwater vision, which cause degradation of underwater images. A variety of underwater image enhancement methods have been proposed to improve the effectiveness of underwater vision perception. Nevertheless, for real-time vision tasks on underwater robots, it is necessary to overcome the challenges associated with algorithmic efficiency and real-time capabilities. In this paper, we introduce Lightweight Underwater Unet (LU2Net), a novel U-shape network designed specifically for real-time enhancement of underwater images. The proposed model incorporates axial depthwise convolution and the channel attention module, enabling it to significantly reduce computational demands and model parameters, thereby improving processing speed. The extensive experiments conducted on the dataset and real-world underwater robots demonstrate the exceptional performance and speed of proposed model. It is capable of providing well-enhanced underwater images at a speed 8 times faster than the current state-of-the-art underwater image enhancement method. Moreover, LU2Net is able to handle real-time underwater video enhancement.