Yunfeng Li

CV
h-index8
17papers
347citations
Novelty45%
AI Score45

17 Papers

CVJan 4, 2023Code
Underwater Object Tracker: UOSTrack for Marine Organism Grasping of Underwater Vehicles

Yunfeng Li, Bo Wang, Ye Li et al.

A visual single-object tracker is an indispensable component of underwater vehicles (UVs) in marine organism grasping tasks. Its accuracy and stability are imperative to guide the UVs to perform grasping behavior. Although single-object trackers show competitive performance in the challenge of underwater image degradation, there are still issues with sample imbalance and exclusion of similar objects that need to be addressed for application in marine organism grasping. This paper proposes Underwater OSTrack (UOSTrack), which consists of underwater image and open-air sequence hybrid training (UOHT), and motion-based post-processing (MBPP). The UOHT training paradigm is designed to train the sample-imbalanced underwater tracker so that the tracker is exposed to a great number of underwater domain training samples and learns the feature expressions. The MBPP paradigm is proposed to exclude similar objects. It uses the estimation box predicted with a Kalman filter and the candidate boxes in the response map to relocate the lost tracked object in the candidate area. UOSTrack achieves an average performance improvement of 4.41% and 7.98% maximum compared to state-of-the-art methods on various benchmarks, respectively. Field experiments have verified the accuracy and stability of our proposed UOSTrack for UVs in marine organism grasping tasks. More details can be found at https://github.com/LiYunfengLYF/UOSTrack.

CVSep 9, 2023Code
UnitModule: A Lightweight Joint Image Enhancement Module for Underwater Object Detection

Zhuoyan Liu, Bo Wang, Ye Li et al.

Underwater object detection faces the problem of underwater image degradation, which affects the performance of the detector. Underwater object detection methods based on noise reduction and image enhancement usually do not provide images preferred by the detector or require additional datasets. In this paper, we propose a plug-and-play \textbf{U}nderwater joi\textbf{n}t \textbf{i}mage enhancemen\textbf{t} \textbf{Module} (UnitModule) that provides the input image preferred by the detector. We design an unsupervised learning loss for the joint training of UnitModule with the detector without additional datasets to improve the interaction between UnitModule and the detector. Furthermore, a color cast predictor with the assisting color cast loss and a data augmentation called Underwater Color Random Transfer (UCRT) are designed to improve the performance of UnitModule on underwater images with different color casts. Extensive experiments are conducted on DUO for different object detection models, where UnitModule achieves the highest performance improvement of 2.6 AP for YOLOv5-S and gains the improvement of 3.3 AP on the brand-new test set (\(\text{URPC}_{test}\)). And UnitModule significantly improves the performance of all object detection models we test, especially for models with a small number of parameters. In addition, UnitModule with a small number of parameters of 31K has little effect on the inference speed of the original object detection model. Our quantitative and visual analysis also demonstrates the effectiveness of UnitModule in enhancing the input image and improving the perception ability of the detector for object features. The code is available at https://github.com/LEFTeyex/UnitModule.

CLJul 30, 2023Code
Recent Advances in Hierarchical Multi-label Text Classification: A Survey

Rundong Liu, Wenhan Liang, Weijun Luo et al.

Hierarchical multi-label text classification aims to classify the input text into multiple labels, among which the labels are structured and hierarchical. It is a vital task in many real world applications, e.g. scientific literature archiving. In this paper, we survey the recent progress of hierarchical multi-label text classification, including the open sourced data sets, the main methods, evaluation metrics, learning strategies and the current challenges. A few future research directions are also listed for community to further improve this field.

CVOct 9, 2023Code
Lightweight Full-Convolutional Siamese Tracker

Yunfeng Li, Bo Wang, Xueyi Wu et al.

Although single object trackers have achieved advanced performance, their large-scale models hinder their application on limited resources platforms. Moreover, existing lightweight trackers only achieve a balance between 2-3 points in terms of parameters, performance, Flops and FPS. To achieve the optimal balance among these points, this paper proposes a lightweight full-convolutional Siamese tracker called LightFC. LightFC employs a novel efficient cross-correlation module (ECM) and a novel efficient rep-center head (ERH) to improve the feature representation of the convolutional tracking pipeline. The ECM uses an attention-like module design, which conducts spatial and channel linear fusion of fused features and enhances the nonlinearity of the fused features. Additionally, it refers to successful factors of current lightweight trackers and introduces skip-connections and reuse of search area features. The ERH reparameterizes the feature dimensional stage in the standard center-head and introduces channel attention to optimize the bottleneck of key feature flows. Comprehensive experiments show that LightFC achieves the optimal balance between performance, parameters, Flops and FPS. The precision score of LightFC outperforms MixFormerV2-S on LaSOT and TNL2K by 3.7 % and 6.5 %, respectively, while using 5x fewer parameters and 4.6x fewer Flops. Besides, LightFC runs 2x faster than MixFormerV2-S on CPUs. In addition, a higher-performance version named LightFC-vit is proposed by replacing a more powerful backbone network. The code and raw results can be found at https://github.com/LiYunfengLYF/LightFC.

CRJun 29, 2023
Towards Blockchain-Assisted Privacy-Aware Data Sharing For Edge Intelligence: A Smart Healthcare Perspective

Youyang Qu, Lichuan Ma, Wenjie Ye et al.

The popularization of intelligent healthcare devices and big data analytics significantly boosts the development of smart healthcare networks (SHNs). To enhance the precision of diagnosis, different participants in SHNs share health data that contains sensitive information. Therefore, the data exchange process raises privacy concerns, especially when the integration of health data from multiple sources (linkage attack) results in further leakage. Linkage attack is a type of dominant attack in the privacy domain, which can leverage various data sources for private data mining. Furthermore, adversaries launch poisoning attacks to falsify the health data, which leads to misdiagnosing or even physical damage. To protect private health data, we propose a personalized differential privacy model based on the trust levels among users. The trust is evaluated by a defined community density, while the corresponding privacy protection level is mapped to controllable randomized noise constrained by differential privacy. To avoid linkage attacks in personalized differential privacy, we designed a noise correlation decoupling mechanism using a Markov stochastic process. In addition, we build the community model on a blockchain, which can mitigate the risk of poisoning attacks during differentially private data transmission over SHNs. To testify the effectiveness and superiority of the proposed approach, we conduct extensive experiments on benchmark datasets.

CVFeb 25, 2025Code
LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking

Yunfeng Li, Bo Wang, Ye Li

Despite great progress in multimodal tracking, these trackers remain too heavy and expensive for resource-constrained devices. To alleviate this problem, we propose LightFC-X, a family of lightweight convolutional RGB-X trackers that explores a unified convolutional architecture for lightweight multimodal tracking. Our core idea is to achieve lightweight cross-modal modeling and joint refinement of the multimodal features and the spatiotemporal appearance features of the target. Specifically, we propose a novel efficient cross-attention module (ECAM) and a novel spatiotemporal template aggregation module (STAM). The ECAM achieves lightweight cross-modal interaction of template-search area integrated feature with only 0.08M parameters. The STAM enhances the model's utilization of temporal information through module fine-tuning paradigm. Comprehensive experiments show that our LightFC-X achieves state-of-the-art performance and the optimal balance between parameters, performance, and speed. For example, LightFC-T-ST outperforms CMD by 4.3% and 5.7% in SR and PR on the LasHeR benchmark, which it achieves 2.6x reduction in parameters and 2.7x speedup. It runs in real-time on the CPU at a speed of 22 fps. The code is available at https://github.com/LiYunfengLYF/LightFC-X.

CVApr 22, 2025Code
SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking

Yunfeng Li, Bo Wang, Jiahao Wan et al.

Underwater observation systems typically integrate optical cameras and imaging sonar systems. When underwater visibility is insufficient, only sonar systems can provide stable data, which necessitates exploration of the underwater acoustic object tracking (UAOT) task. Previous studies have explored traditional methods and Siamese networks for UAOT. However, the absence of a unified evaluation benchmark has significantly constrained the value of these methods. To alleviate this limitation, we propose the first large-scale UAOT benchmark, SonarT165, comprising 165 square sequences, 165 fan sequences, and 205K high-quality annotations. Experimental results demonstrate that SonarT165 reveals limitations in current state-of-the-art SOT trackers. To address these limitations, we propose STFTrack, an efficient framework for acoustic object tracking. It includes two novel modules, a multi-view template fusion module (MTFM) and an optimal trajectory correction module (OTCM). The MTFM module integrates multi-view feature of both the original image and the binary image of the dynamic template, and introduces a cross-attention-like layer to fuse the spatio-temporal target representations. The OTCM module introduces the acoustic-response-equivalent pixel property and proposes normalized pixel brightness response scores, thereby suppressing suboptimal matches caused by inaccurate Kalman filter prediction boxes. To further improve the model feature, STFTrack introduces a acoustic image enhancement method and a Frequency Enhancement Module (FEM) into its tracking pipeline. Comprehensive experiments show the proposed STFTrack achieves state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/SonarT165.

CVJun 11, 2024Code
RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

Yunfeng Li, Bo Wang, Jiuran Sun et al.

Vision camera and sonar are naturally complementary in the underwater environment. Combining the information from two modalities will promote better observation of underwater targets. However, this problem has not received sufficient attention in previous research. Therefore, this paper introduces a new challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of RGB and sonar modalities. Specifically, we first propose an RGBS50 benchmark dataset containing 50 sequences and more than 87000 high-quality annotated bounding boxes. Experimental results show that the RGBS50 benchmark poses a challenge to currently popular SOT trackers. Second, we propose an RGB-S tracker called SCANet, which includes a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer and two independent global integration modules. The spatial cross-attention is used to overcome the problem of spatial misalignment of between RGB and sonar images. Third, we propose a SOT data-based RGB-S simulation training method (SRST) to overcome the lack of RGB-S training datasets. It converts RGB images into sonar-like saliency images to construct pseudo-data pairs, enabling the model to learn the semantic structure of RGB-S-like data. Comprehensive experiments show that the proposed spatial cross-attention effectively achieves the interaction between RGB and sonar modalities and SCANet achieves state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/RGBS50.

CVMay 6, 2024Code
Transformer-based RGB-T Tracking with Channel and Spatial Feature Fusion

Yunfeng Li, Bo Wang, Ye Li

The main problem in RGB-T tracking is the correct and optimal merging of the cross-modal features of visible and thermal images. Some previous methods either do not fully exploit the potential of RGB and TIR information for channel and spatial feature fusion or lack a direct interaction between the template and the search area, which limits the model's ability to fully utilize the original semantic information of both modalities. To address these limitations, we investigate how to achieve a direct fusion of cross-modal channels and spatial features in RGB-T tracking and propose CSTNet. It uses the Vision Transformer (ViT) as the backbone and adds a Joint Spatial and Channel Fusion Module (JSCFM) and Spatial Fusion Module (SFM) integrated between the transformer blocks to facilitate cross-modal feature interaction. The JSCFM module achieves joint modeling of channel and multi-level spatial features. The SFM module includes a cross-attention-like architecture for cross modeling and joint learning of RGB and TIR features. Comprehensive experiments show that CSTNet achieves state-of-the-art performance. To enhance practicality, we retrain the model without JSCFM and SFM modules and use CSNet as the pretraining weight, and propose CSTNet-small, which achieves 50% speedup with an average decrease of 1-2% in SR and PR performance. CSTNet and CSTNet-small achieve real-time speeds of 21 fps and 33 fps on the Nvidia Jetson Xavier, meeting actual deployment requirements. Code is available at https://github.com/LiYunfengLYF/CSTNet.

LGMar 9, 2025
Privacy Protection in Prosumer Energy Management Based on Federated Learning

Yunfeng Li, Xiaolin Li Zhitao Li, Gangqiang Li

With the booming development of prosumers, there is an urgent need for a prosumer energy management system to take full advantage of the flexibility of prosumers and take into account the interests of other parties. However, building such a system will undoubtedly reveal users' privacy. In this paper, by solving the non-independent and identical distribution of data (Non-IID) problem in federated learning with federated cluster average(FedClusAvg) algorithm, prosumers' information can efficiently participate in the intelligent decision making of the system without revealing privacy. In the proposed FedClusAvg algorithm, each client performs cluster stratified sampling and multiple iterations. Then, the average weight of the parameters of the sub-server is determined according to the degree of deviation of the parameter from the average parameter. Finally, the sub-server multiple local iterations and updates, and then upload to the main server. The advantages of FedClusAvg algorithm are the following two parts. First, the accuracy of the model in the case of Non-IID is improved through the method of clustering and parameter weighted average. Second, local multiple iterations and three-tier framework can effectively reduce communication rounds.

SDJul 16, 2025
Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations

Yichen Han, Xiaoyang Hao, Keming Chen et al.

Text-to-speech (TTS) synthesis has seen renewed progress under the discrete modeling paradigm. Existing autoregressive approaches often rely on single-codebook representations, which suffer from significant information loss. Even with post-hoc refinement techniques such as flow matching, these methods fail to recover fine-grained details (e.g., prosodic nuances, speaker-specific timbres), especially in challenging scenarios like singing voice or music synthesis. We propose QTTS, a novel TTS framework built upon our new audio codec, QDAC. The core innovation of QDAC lies in its end-to-end training of an ASR-based auto-regressive network with a GAN, which achieves superior semantic feature disentanglement for scalable, near-lossless compression. QTTS models these discrete codes using two innovative strategies: the Hierarchical Parallel architecture, which uses a dual-AR structure to model inter-codebook dependencies for higher-quality synthesis, and the Delay Multihead approach, which employs parallelized prediction with a fixed delay to accelerate inference speed. Our experiments demonstrate that the proposed framework achieves higher synthesis quality and better preserves expressive content compared to baseline. This suggests that scaling up compression via multi-codebook modeling is a promising direction for high-fidelity, general-purpose speech and audio generation.

LGJul 19, 2025
Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model

Yunfeng Li, Junhong Liu, Zhaohui Yang et al.

Deep learning models have been widely adopted for False Data Injection Attack (FDIA) detection in smart grids due to their ability to capture unstructured and sparse features. However, the increasing system scale and data dimensionality introduce significant computational and memory burdens, particularly in large-scale industrial datasets, limiting detection efficiency. To address these issues, this paper proposes Rec-AD, a computationally efficient framework that integrates Tensor Train decomposition with the Deep Learning Recommendation Model (DLRM). Rec-AD enhances training and inference efficiency through embedding compression, optimized data access via index reordering, and a pipeline training mechanism that reduces memory communication overhead. Fully compatible with PyTorch, Rec-AD can be integrated into existing FDIA detection systems without code modifications. Experimental results show that Rec-AD significantly improves computational throughput and real-time detection performance, narrowing the attack window and increasing attacker cost. These advancements strengthen edge computing capabilities and scalability, providing robust technical support for smart grid security.

LGJul 20, 2025
Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data

Yunfeng Li, Junhong Liu, Zhaohui Yang et al.

False Data Injection Attacks (FDIAs) pose severe security risks to smart grids by manipulating measurement data collected from spatially distributed devices such as SCADA systems and PMUs. These measurements typically exhibit Non-Independent and Identically Distributed (Non-IID) characteristics across different regions, which significantly challenges the generalization ability of detection models. Traditional centralized training approaches not only face privacy risks and data sharing constraints but also incur high transmission costs, limiting their scalability and deployment feasibility. To address these issues, this paper proposes a privacy-preserving federated learning framework, termed Federated Cluster Average (FedClusAvg), designed to improve FDIA detection in Non-IID and resource-constrained environments. FedClusAvg incorporates cluster-based stratified sampling and hierarchical communication (client-subserver-server) to enhance model generalization and reduce communication overhead. By enabling localized training and weighted parameter aggregation, the algorithm achieves accurate model convergence without centralizing sensitive data. Experimental results on benchmark smart grid datasets demonstrate that FedClusAvg not only improves detection accuracy under heterogeneous data distributions but also significantly reduces communication rounds and bandwidth consumption. This work provides an effective solution for secure and efficient FDIA detection in large-scale distributed power systems.

CLFeb 21, 2025
Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid

Yunfeng Li, Jiqun Zhang, Guofu Liao et al.

With rapid advancements in artificial intelligence, question-answering (Q&A) systems have become essential in intelligent search engines, virtual assistants, and customer service platforms. However, in dynamic domains like smart grids, conventional retrieval-augmented generation(RAG) Q&A systems face challenges such as inadequate retrieval quality, irrelevant responses, and inefficiencies in handling large-scale, real-time data streams. This paper proposes an optimized iterative retrieval-based Q&A framework called Chats-Grid tailored for smart grid environments. In the pre-retrieval phase, Chats-Grid advanced query expansion ensures comprehensive coverage of diverse data sources, including sensor readings, meter records, and control system parameters. During retrieval, Best Matching 25(BM25) sparse retrieval and BAAI General Embedding(BGE) dense retrieval in Chats-Grid are combined to process vast, heterogeneous datasets effectively. Post-retrieval, a fine-tuned large language model uses prompt engineering to assess relevance, filter irrelevant results, and reorder documents based on contextual accuracy. The model further generates precise, context-aware answers, adhering to quality criteria and employing a self-checking mechanism for enhanced reliability. Experimental results demonstrate Chats-Grid's superiority over state-of-the-art methods in fidelity, contextual recall, relevance, and accuracy by 2.37%, 2.19%, and 3.58% respectively. This framework advances smart grid management by improving decision-making and user interactions, fostering resilient and adaptive smart grid infrastructures.

AIJan 16, 2022
Temporal Knowledge Graph Completion: A Survey

Borui Cai, Yong Xiang, Longxiang Gao et al.

Knowledge graph completion (KGC) can predict missing links and is crucial for real-world knowledge graphs, which widely suffer from incompleteness. KGC methods assume a knowledge graph is static, but that may lead to inaccurate prediction results because many facts in the knowledge graphs change over time. Recently, emerging methods have shown improved predictive results by further incorporating the timestamps of facts; namely, temporal knowledge graph completion (TKGC). With this temporal information, TKGC methods can learn the dynamic evolution of the knowledge graph that KGC methods fail to capture. In this paper, for the first time, we summarize the recent advances in TKGC research. First, we detail the background of TKGC, including the problem definition, benchmark datasets, and evaluation metrics. Then, we summarize existing TKGC methods based on how timestamps of facts are used to capture the temporal dynamics. Finally, we conclude the paper and present future research directions of TKGC.

LGFeb 21, 2020
Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

Yuan Jin, He Zhao, Ming Liu et al.

Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers the mappings based on the explicit data information while the implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBO) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters for the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via the matrix factorization over the currently re-sampled user and item embeddings.

LGOct 12, 2019
Variational Auto-encoder Based Bayesian Poisson Tensor Factorization for Sparse and Imbalanced Count Data

Yuan Jin, Ming Liu, Yunfeng Li et al.

Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson-Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share the update information to better cope with the data sparsity. Moreover, these models are not endowed with a component that handles the imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework called VAE-BPTF which addresses the above issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before aggregated to compute the posterior parameters for the latent factors. Under synthetic data evaluation, VAE-BPTF tended to recover the right number of latent factors and posterior parameter values. It also outperformed current models in both reconstruction errors and latent factor (semantic) coherence across five real-world datasets. Furthermore, the latent factors inferred by VAE-BPTF are perceived to be meaningful and coherent under a qualitative analysis.