Hui Feng

CV
h-index33
19papers
338citations
Novelty48%
AI Score40

19 Papers

CVMay 2, 2022
MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities

Minghui Yang, Peng Wu, Jing Liu et al.

Under the semi-supervised framework, we propose an end-to-end memory-based segmentation network (MemSeg) to detect surface defects on industrial products. Considering the small intra-class variance of products in the same production line, from the perspective of differences and commonalities, MemSeg introduces artificially simulated abnormal samples and memory samples to assist the learning of the network. In the training phase, MemSeg explicitly learns the potential differences between normal and simulated abnormal images to obtain a robust classification hyperplane. At the same time, inspired by the mechanism of human memory, MemSeg uses a memory pool to store the general patterns of normal samples. By comparing the similarities and differences between input samples and memory samples in the memory pool to give effective guesses about abnormal regions; In the inference phase, MemSeg directly determines the abnormal regions of the input image in an end-to-end manner. Through experimental validation, MemSeg achieves the state-of-the-art (SOTA) performance on MVTec AD datasets with AUC scores of 99.56% and 98.84% at the image-level and pixel-level, respectively. In addition, MemSeg also has a significant advantage in inference speed benefiting from the end-to-end and straightforward network structure, which better meets the real-time requirement in industrial scenarios.

CVAug 31, 2023
RGB-T Tracking via Multi-Modal Mutual Prompt Learning

Yang Luo, Xiqing Guo, Hui Feng et al.

Object tracking based on the fusion of visible and thermal im-ages, known as RGB-T tracking, has gained increasing atten-tion from researchers in recent years. How to achieve a more comprehensive fusion of information from the two modalities with fewer computational costs has been a problem that re-searchers have been exploring. Recently, with the rise of prompt learning in computer vision, we can better transfer knowledge from visual large models to downstream tasks. Considering the strong complementarity between visible and thermal modalities, we propose a tracking architecture based on mutual prompt learning between the two modalities. We also design a lightweight prompter that incorporates attention mechanisms in two dimensions to transfer information from one modality to the other with lower computational costs, embedding it into each layer of the backbone. Extensive ex-periments have demonstrated that our proposed tracking ar-chitecture is effective and efficient, achieving state-of-the-art performance while maintaining high running speeds.

LGAug 30, 2024
The Transferability of Downsamped Sparse Graph Convolutional Networks

Qinji Shu, Hang Sheng, Feng Ji et al.

To accelerate the training of graph convolutional networks (GCNs) on real-world large-scale sparse graphs, downsampling methods are commonly employed as a preprocessing step. However, the effects of graph sparsity and topological structure on the transferability of downsampling methods have not been rigorously analyzed or theoretically guaranteed, particularly when the topological structure is affected by graph sparsity. In this paper, we introduce a novel downsampling method based on a sparse random graph model and derive an expected upper bound for the transfer error. Our findings show that smaller original graph sizes, higher expected average degrees, and increased sampling rates contribute to reducing this upper bound. Experimental results validate the theoretical predictions. By incorporating both sparsity and topological similarity into the model, this study establishes an upper bound on the transfer error for downsampling in the training of large-scale sparse graphs and provides insight into the influence of topological structure on transfer performance.

ASSep 25, 2024
Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models

Zhichen Han, Tianqi Geng, Hui Feng et al.

Utilizing Self-Supervised Learning (SSL) models for Speech Emotion Recognition (SER) has proven effective, yet limited research has explored cross-lingual scenarios. This study presents a comparative analysis between human performance and SSL models, beginning with a layer-wise analysis and an exploration of parameter-efficient fine-tuning strategies in monolingual, cross-lingual, and transfer learning contexts. We further compare the SER ability of models and humans at both utterance- and segment-levels. Additionally, we investigate the impact of dialect on cross-lingual SER through human evaluation. Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers. We also demonstrate the significant effect of dialect on SER for individuals without prior linguistic and paralinguistic background. Moreover, both humans and models exhibit distinct behaviors across different emotions. These results offer new insights into the cross-lingual SER capabilities of SSL models, underscoring both their similarities to and differences from human emotion perception.

AIFeb 29, 2024
Brain-inspired and Self-based Artificial Intelligence

Yi Zeng, Feifei Zhao, Yuxuan Zhao et al.

The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence is one of the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenge the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing and does not truly understand or be subjectively aware of oneself and perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping the future AI, rooted with a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance the BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.

IVMay 9, 2025
Predicting Diabetic Macular Edema Treatment Responses Using OCT: Dataset and Methods of APTOS Competition

Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li et al.

Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance this research, we organized the 2nd Asia-Pacific Tele-Ophthalmology Society (APTOS) Big Data Competition in 2021. The competition focused on improving predictive accuracy for anti-VEGF therapy responses using ophthalmic OCT images. We provided a dataset containing tens of thousands of OCT images from 2,000 patients with labels across four sub-tasks. This paper details the competition's structure, dataset, leading methods, and evaluation metrics. The competition attracted strong scientific community participation, with 170 teams initially registering and 41 reaching the final round. The top-performing team achieved an AUC of 80.06%, highlighting the potential of AI in personalized DME treatment and clinical decision-making.

CVApr 7, 2025
Inland Waterway Object Detection in Multi-environment: Dataset and Approach

Shanshan Wang, Haixiang Xu, Hui Feng et al.

The success of deep learning in intelligent ship visual perception relies heavily on rich image data. However, dedicated datasets for inland waterway vessels remain scarce, limiting the adaptability of visual perception systems in complex environments. Inland waterways, characterized by narrow channels, variable weather, and urban interference, pose significant challenges to object detection systems based on existing datasets. To address these issues, this paper introduces the Multi-environment Inland Waterway Vessel Dataset (MEIWVD), comprising 32,478 high-quality images from diverse scenarios, including sunny, rainy, foggy, and artificial lighting conditions. MEIWVD covers common vessel types in the Yangtze River Basin, emphasizing diversity, sample independence, environmental complexity, and multi-scale characteristics, making it a robust benchmark for vessel detection. Leveraging MEIWVD, this paper proposes a scene-guided image enhancement module to improve water surface images based on environmental conditions adaptively. Additionally, a parameter-limited dilated convolution enhances the representation of vessel features, while a multi-scale dilated residual fusion method integrates multi-scale features for better detection. Experiments show that MEIWVD provides a more rigorous benchmark for object detection algorithms, and the proposed methods significantly improve detector performance, especially in complex multi-environment scenarios.

LGMar 6, 2025
The Impact Analysis of Delays in Asynchronous Federated Learning with Data Heterogeneity for Edge Intelligence

Ziruo Hao, Zhenhua Cui, Tao Yang et al.

Federated learning (FL) has provided a new methodology for coordinating a group of clients to train a machine learning model collaboratively, bringing an efficient paradigm in edge intelligence. Despite its promise, FL faces several critical challenges in practical applications involving edge devices, such as data heterogeneity and delays stemming from communication and computation constraints. This paper examines the impact of unknown causes of delay on training performance in an Asynchronous Federated Learning (AFL) system with data heterogeneity. Initially, an asynchronous error definition is proposed, based on which the solely adverse impact of data heterogeneity is theoretically analyzed within the traditional Synchronous Federated Learning (SFL) framework. Furthermore, Asynchronous Updates with Delayed Gradients (AUDG), a conventional AFL scheme, is discussed. Investigation into AUDG reveals that the negative influence of data heterogeneity is correlated with delays, while a shorter average delay from a specific client does not consistently enhance training performance. In order to compensate for the scenarios where AUDG are not adapted, Pseudo-synchronous Updates by Reusing Delayed Gradients (PSURDG) is proposed, and its theoretical convergence is analyzed. In both AUDG and PSURDG, only a random set of clients successfully transmits their updated results to the central server in each iteration. The critical difference between them lies in whether the delayed information is reused. Finally, both schemes are validated and compared through theoretical analysis and simulations, demonstrating more intuitively that discarding outdated information due to time delays is not always the best approach.

CVOct 12, 2025
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis

Peiyin Chen, Zhuowei Yang, Hui Feng et al.

Audio-driven talking-head generation has advanced rapidly with diffusion-based generative models, yet producing temporally coherent videos with fine-grained motion control remains challenging. We propose DEMO, a flow-matching generative framework for audio-driven talking-portrait video synthesis that delivers disentangled, high-fidelity control of lip motion, head pose, and eye gaze. The core contribution is a motion auto-encoder that builds a structured latent space in which motion factors are independently represented and approximately orthogonalized. On this disentangled motion space, we apply optimal-transport-based flow matching with a transformer predictor to generate temporally smooth motion trajectories conditioned on audio. Extensive experiments across multiple benchmarks show that DEMO outperforms prior methods in video realism, lip-audio synchronization, and motion fidelity. These results demonstrate that combining fine-grained motion disentanglement with flow-based generative modeling provides a powerful new paradigm for controllable talking-head video synthesis.

AIJul 22, 2025
Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design

Dong Ben, Hui Feng, Qian Wang

Large Language Models (LLMs) are increasingly used in circuit design tasks and have typically undergone multiple rounds of training. Both the trained models and their associated training data are considered confidential intellectual property (IP) and must be protected from exposure. Confidential Computing offers a promising solution to protect data and models through Trusted Execution Environments (TEEs). However, existing TEE implementations are not designed to support the resource-intensive nature of LLMs efficiently. In this work, we first present a comprehensive evaluation of the LLMs within a TEE-enabled confidential computing environment, specifically utilizing Intel Trust Domain Extensions (TDX). We constructed experiments on three environments: TEE-based, CPU-only, and CPU-GPU hybrid implementations, and evaluated their performance in terms of tokens per second. Our first observation is that distilled models, i.e., DeepSeek, surpass other models in performance due to their smaller parameters, making them suitable for resource-constrained devices. Also, in the quantized models such as 4-bit quantization (Q4) and 8-bit quantization (Q8), we observed a performance gain of up to 3x compared to FP16 models. Our findings indicate that for fewer parameter sets, such as DeepSeek-r1-1.5B, the TDX implementation outperforms the CPU version in executing computations within a secure environment. We further validate the results using a testbench designed for SoC design tasks. These validations demonstrate the potential of efficiently deploying lightweight LLMs on resource-constrained systems for semiconductor CAD applications.

LGMar 8, 2025
Invariant Federated Learning for Edge Intelligence: Mitigating Heterogeneity and Asynchrony via Exit Strategy and Invariant Penalty

Ziruo Hao, Zhenhua Cui, Tao Yang et al.

This paper provides an invariant federated learning system for resource-constrained edge intelligence. This framework can mitigate the impact of heterogeneity and asynchrony via exit strategy and invariant penalty. We introduce parameter orthogonality into edge intelligence to measure the contribution or impact of heterogeneous and asynchronous clients. It is proved in this paper that the exit of abnormal edge clients can guarantee the effect of the model on most clients. Meanwhile, to ensure the models' performance on exited abnormal clients and those who lack training resources, we propose Federated Learning with Invariant Penalty for Generalization (FedIPG) by constructing the approximate orthogonality of the invariant parameters and the heterogeneous parameters. Theoretical proof shows that FedIPG reduces the Out-Of-Distribution prediction loss without increasing the communication burden. The performance of FedIPG combined with an exit strategy is tested empirically in multiple scales using four datasets. It shows our system can enhance In-Distribution performance and outperform the state-of-the-art algorithm in Out-Of-Distribution generalization while maintaining model convergence. Additionally, the results of the visual experiment prove that FedIPG contains preliminary causality in terms of ignoring confounding features.

AIOct 29, 2024
Building Altruistic and Moral AI Agent with Brain-inspired Emotional Empathy Mechanisms

Feifei Zhao, Hui Feng, Haibo Tong et al.

As AI closely interacts with human society, it is crucial to ensure that its behavior is safe, altruistic, and aligned with human ethical and moral values. However, existing research on embedding ethical considerations into AI remains insufficient, and previous external constraints based on principles and rules are inadequate to provide AI with long-term stability and generalization capabilities. Emotional empathy intrinsically motivates altruistic behaviors aimed at alleviating others' negative emotions through emotional sharing and contagion mechanisms. Motivated by this, we draw inspiration from the neural mechanism of human emotional empathy-driven altruistic decision making, and simulate the shared self-other perception-mirroring-empathy neural circuits, to construct a brain-inspired emotional empathy-driven altruistic decision-making model. Here, empathy directly impacts dopamine release to form intrinsic altruistic motivation. The proposed model exhibits consistent altruistic behaviors across three experimental settings: emotional contagion-integrated two-agent altruistic rescue, multi-agent gaming, and robotic emotional empathy interaction scenarios. In-depth analyses validate the positive correlation between empathy levels and altruistic preferences (consistent with psychological behavioral experiment findings), while also demonstrating how interaction partners' empathy levels influence the agent's behavioral patterns. We further test the proposed model's performance and stability in moral dilemmas involving conflicts between self-interest and others' well-being, partially observable environments, and adversarial defense scenarios. This work provides preliminary exploration of human-like empathy-driven altruistic moral decision making, contributing potential perspectives for developing ethically-aligned AI.

LGFeb 19, 2021
Regularized Recovery by Multi-order Partial Hypergraph Total Variation

Ruyuan Qu, Jiaqi He, Hui Feng et al.

Capturing complex high-order interactions among data is an important task in many scenarios. A common way to model high-order interactions is to use hypergraphs whose topology can be mathematically represented by tensors. Existing methods use a fixed-order tensor to describe the topology of the whole hypergraph, which ignores the divergence of different-order interactions. In this work, we take this divergence into consideration, and propose a multi-order hypergraph Laplacian and the corresponding total variation. Taking this total variation as a regularization term, we can utilize the topology information contained by it to smooth the hypergraph signal. This can help distinguish different-order interactions and represent high-order interactions accurately.

CVDec 31, 2020
SharpGAN: Receptive Field Block Net for Dynamic Scene Deblurring

Hui Feng, Jundong Guo, Sam Shuzhi Ge

When sailing at sea, the smart ship will inevitably produce swaying motion due to the action of wind, wave and current, which makes the image collected by the visual sensor appear motion blur. This will have an adverse effect on the object detection algorithm based on the vision sensor, thereby affect the navigation safety of the smart ship. In order to remove the motion blur in the images during the navigation of the smart ship, we propose SharpGAN, a new image deblurring method based on the generative adversarial network. First of all, the Receptive Field Block Net (RFBNet) is introduced to the deblurring network to strengthen the network's ability to extract the features of blurred image. Secondly, we propose a feature loss that combines different levels of image features to guide the network to perform higher-quality deblurring and improve the feature similarity between the restored images and the sharp image. Finally, we propose to use the lightweight RFB-s module to improve the real-time performance of deblurring network. Compared with the existing deblurring methods on large-scale real sea image datasets and large-scale deblurring datasets, the proposed method not only has better deblurring performance in visual perception and quantitative criteria, but also has higher deblurring efficiency.

CVAug 6, 2019
OD-GCN: Object Detection Boosted by Knowledge GCN

Zheng Liu, Zidong Jiang, Wei Feng et al.

Classical CNN based object detection methods only extract the objects' image features, but do not consider the high-level relationship among objects in context. In this article, the graph convolutional networks (GCN) is integrated into the object detection framework to exploit the benefit of category relationship among objects, which is able to provide extra confidence for any pre-trained object detection model in our framework. In experiments, we test several popular base detection models on COCO dataset. The results show promising improvement on mAP by 1-5pp. In addition, visualized analysis reveals the benchmark improvement is quite reasonable in human's opinion.

NIMay 9, 2019
Toward Packet Routing with Fully-distributed Multi-agent Deep Reinforcement Learning

Xinyu You, Xuanjie Li, Yuedong Xu et al.

Packet routing is one of the fundamental problems in computer networks in which a router determines the next-hop of each packet in the queue to get it as quickly as possible to its destination. Reinforcement learning (RL) has been introduced to design autonomous packet routing policies with local information of stochastic packet arrival and service. However, the curse of dimensionality of RL prohibits the more comprehensive representation of dynamic network states, thus limiting its potential benefit. In this paper, we propose a novel packet routing framework based on \emph{multi-agent} deep reinforcement learning (DRL) in which each router possess an \emph{independent} LSTM recurrent neural network for training and decision making in a \emph{fully distributed} environment. The LSTM recurrent neural network extracts routing features from rich information regarding backlogged packets and past actions, and effectively approximates the value function of Q-learning. We further allow each route to communicate periodically with direct neighbors so that a broader view of network state can be incorporated. Experimental results manifest that our multi-agent DRL policy can strike the delicate balance between congestion-aware and shortest routes, and significantly reduce the packet delivery time in general network topologies compared with its counterparts.

CVAug 12, 2018
Fine-grained visual recognition with salient feature detection

Hui Feng, Shanshan Wang, Shuzhi Sam Ge

Computer vision based fine-grained recognition has received great attention in recent years. Existing works focus on discriminative part localization and feature learning. In this paper, to improve the performance of fine-grained recognition, we try to precisely locate as many salient parts of object as possible at first. Then, we figure out the classification probability that can be obtained by using separate parts for object classification. Finally, through extracting efficient features from each part and combining them, then feeding to a classifier for recognition, an improved accuracy over state-of-art algorithms has been obtained on CUB200-2011 bird dataset.

CVMay 1, 2018
Object Activity Scene Description, Construction and Recognition

Hui Feng, Shanshan Wang, Shuzhi Sam Ge

Action recognition is a critical task for social robots to meaningfully engage with their environment. 3D human skeleton-based action recognition is an attractive research area in recent years. Although, the existing approaches are good at action recognition, it is a great challenge to recognize a group of actions in an activity scene. To tackle this problem, at first, we partition the scene into several primitive actions (PAs) based upon motion attention mechanism. Then, the primitive actions are described by the trajectory vectors of corresponding joints. After that, motivated by text classification based on word embedding, we employ convolution neural network (CNN) to recognize activity scenes by considering motion of joints as "word" of activity. The experimental results on the scenes of human activity dataset show the efficiency of the proposed approach.

ITMar 2, 2017
Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System

Yiling Yuan, Tao Yang, Hui Feng et al.

We consider a D2D-enabled cellular network where user equipments (UEs) owned by rational users are incentivized to form D2D pairs using tokens. They exchange tokens electronically to "buy" and "sell" D2D services. Meanwhile the devices have the ability to choose the transmission mode, i.e. receiving data via cellular links or D2D links. Thus taking the different benefits brought by diverse traffic types as a prior, the UEs can utilize their tokens more efficiently via transmission mode selection. In this paper, the optimal transmission mode selection strategy as well as token collection policy are investigated to maximize the long-term utility in the dynamic network environment. The optimal policy is proved to be a threshold strategy, and the thresholds have a monotonicity property. Numerical simulations verify our observations and the gain from transmission mode selection is observed.