Fusheng Hao

h-index8

4papers

4citations

Novelty59%

AI Score46

Ranked #64,437 of 201,326 authors (top 32%)#23,101 in CV (top 39%)

4 Papers

CVJun 6, 2023Code

Human-imperceptible, Machine-recognizable Images

Fusheng Hao, Fengxiang He, Yikai Wang et al.

Massive human-related data is collected to train neural networks for computer vision tasks. A major conflict is exposed relating to software engineers between better developing AI systems and distancing from the sensitive training data. To reconcile this conflict, this paper proposes an efficient privacy-preserving learning paradigm, where images are first encrypted to become ``human-imperceptible, machine-recognizable'' via one of the two encryption strategies: (1) random shuffling to a set of equally-sized patches and (2) mixing-up sub-patches of the images. Then, minimal adaptations are made to vision transformer to enable it to learn on the encrypted images for vision tasks, including image classification and object detection. Extensive experiments on ImageNet and COCO show that the proposed paradigm achieves comparable accuracy with the competitive methods. Decrypting the encrypted images requires solving an NP-hard jigsaw puzzle or an ill-posed inverse problem, which is empirically shown intractable to be recovered by various attackers, including the powerful vision transformer-based attacker. We thus show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information. The code is available at \url{https://github.com/FushengHao/PrivacyPreservingML.}

CVMay 18

IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning

Chenghao Li, Fusheng Hao, Xikai Zhang et al.

Multimodal large language models via reinforcement learning (RL) have demonstrated remarkable capabilities in complex visual reasoning tasks, yet they remain limited in long-horizon multimodal scenarios, often suffering from visual hallucination and logical error. Current methods typically pre-encode high-dimensional visual scenes into discrete textual proxies to facilitate downstream reasoning. As the reasoning chain unfolds, however, the inherent information asymmetry between text and visual scenes tends to erode visual grounding, resulting in misguided reasoning and erroneous outputs. To address this issue, we introduce IVR-R1 (Iterative Visual-grounded Reasoning), a novel RL training framework that facilitates dynamic visual re-alignment that actively rectifies reasoning trajectories to guide policy optimization. Specifically, by leveraging a reward-driven screening mechanism to identify flawed rollouts, IVR-R1 executes a fine-grained, step-level error attribution within the multimodal context. By iteratively cross-referencing intermediate reasoning states against pristine visual priors, a Re-Reasoning Loop enables automated trajectory rectification, effectively synthesizing expert-level demonstrations that serve as high-fidelity reasoning templates for the policy model. Our experiments across diverse multimodal benchmarks demonstrate that IVR-R1 consistently outperforms existing reinforcement learning methods, establishing a superior paradigm for maintaining logical and visual consistency in complex multimodal reasoning.

LGJun 14, 2025

Gradient-based Fine-Tuning through Pre-trained Model Regularization

Xuanbo Liu, Liu Liu, Fuxiang Wu et al.

Large pre-trained models have demonstrated extensive applications across various fields. However, fine-tuning these models for specific downstream tasks demands significant computational resources and storage. One fine-tuning method, gradient-based parameter selection (GPS), focuses on fine-tuning only the parameters with high gradients in each neuron, thereby reducing the number of training parameters. Nevertheless, this approach increases computational resource requirements and storage demands. In this paper, we propose an efficient gradient-based and regularized fine-tuning method (GRFT) that updates the rows or columns of the weight matrix. We theoretically demonstrate that the rows or columns with the highest sum of squared gradients are optimal for updating. This strategy effectively reduces storage overhead and improves the efficiency of parameter selection. Additionally, we incorporate regularization to enhance knowledge transfer from the pre-trained model. GRFT achieves state-of-the-art performance, surpassing existing methods such as GPS, Adapter Tuning, and LoRA. Notably, GRFT requires updating only 1.22% and 0.30% of the total parameters on FGVC and VTAB datasets, respectively, demonstrating its high efficiency and effectiveness. The source code will be released soon.

CVApr 22, 2018

Anchor-based Nearest Class Mean Loss for Convolutional Neural Networks

Fusheng Hao, Jun Cheng, Lei Wang et al.

Discriminative features are critical for machine learning applications. Most existing deep learning approaches, however, rely on convolutional neural networks (CNNs) for learning features, whose discriminant power is not explicitly enforced. In this paper, we propose a novel approach to train deep CNNs by imposing the intra-class compactness and the inter-class separability, so as to enhance the learned features' discriminant power. To this end, we introduce anchors, which are predefined vectors regarded as the centers for each class and fixed during training. Discriminative features are obtained by constraining the deep CNNs to map training samples to the corresponding anchors as close as possible. We propose two principles to select the anchors, and measure the proximity of two points using the Euclidean and cosine distance metric functions, which results in two novel loss functions. These loss functions require no sample pairs or triplets and can be efficiently optimized by batch stochastic gradient descent. We test the proposed method on three benchmark image classification datasets and demonstrate its promising results.