Jingjing Zheng

CV
h-index18
15papers
162citations
Novelty56%
AI Score50

15 Papers

CVJul 30, 2022
Learning Feature Decomposition for Domain Adaptive Monocular Depth Estimation

Shao-Yuan Lo, Wei Wang, Jim Thomas et al.

Monocular depth estimation (MDE) has attracted intense study due to its low cost and critical functions for robotic tasks such as localization, mapping and obstacle detection. Supervised approaches have led to great success with the advance of deep learning, but they rely on large quantities of ground-truth depth annotations that are expensive to acquire. Unsupervised domain adaptation (UDA) transfers knowledge from labeled source data to unlabeled target data, so as to relax the constraint of supervised learning. However, existing UDA approaches may not completely align the domain gap across different datasets because of the domain shift problem. We believe better domain alignment can be achieved via well-designed feature decomposition. In this paper, we propose a novel UDA method for MDE, referred to as Learning Feature Decomposition for Adaptation (LFDA), which learns to decompose the feature space into content and style components. LFDA only attempts to align the content component since it has a smaller domain gap. Meanwhile, it excludes the style component which is specific to the source domain from training the primary task. Furthermore, LFDA uses separate feature distribution estimations to further bridge the domain gap. Extensive experiments on three domain adaptative MDE scenarios show that the proposed method achieves superior accuracy and lower computational cost compared to the state-of-the-art approaches.

LGNov 30, 2023
Data-Agnostic Model Poisoning against Federated Learning: A Graph Autoencoder Approach

Kai Li, Jingjing Zheng, Xin Yuan et al.

This paper proposes a novel, data-agnostic, model poisoning attack on Federated Learning (FL), by designing a new adversarial graph autoencoder (GAE)-based framework. The attack requires no knowledge of FL training data and achieves both effectiveness and undetectability. By listening to the benign local models and the global model, the attacker extracts the graph structural correlations among the benign local models and the training data features substantiating the models. The attacker then adversarially regenerates the graph structural correlations while maximizing the FL training loss, and subsequently generates malicious local models using the adversarial graph structure and the training data features of the benign ones. A new algorithm is designed to iteratively train the malicious local models using GAE and sub-gradient descent. The convergence of FL under attack is rigorously proved, with a considerably large optimality gap. Experiments show that the FL accuracy drops gradually under the proposed attack and existing defense mechanisms fail to detect it. The attack can give rise to an infection across all benign devices, making it a serious threat to FL.

CVMay 11, 2022
DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive Regularizer Learning

Xiaoqin Zhang, Ziwei Huang, Jingjing Zheng et al.

The task of grasp pattern recognition aims to derive the applicable grasp types of an object according to the visual information. Current state-of-the-art methods ignore category information of objects which is crucial for grasp pattern recognition. This paper presents a novel dual-branch convolutional neural network (DcnnGrasp) to achieve joint learning of object category classification and grasp pattern recognition. DcnnGrasp takes object category classification as an auxiliary task to improve the effectiveness of grasp pattern recognition. Meanwhile, a new loss function called joint cross-entropy with an adaptive regularizer is derived through maximizing a posterior, which significantly improves the model performance. Besides, based on the new loss function, a training strategy is proposed to maximize the collaborative learning of the two tasks. The experiment was performed on five household objects datasets including the RGB-D Object dataset, Hit-GPRec dataset, Amsterdam library of object images (ALOI), Columbia University Image Library (COIL-100), and MeganePro dataset 1. The experimental results demonstrated that the proposed method can achieve competitive performance on grasp pattern recognition with several state-of-the-art methods. Specifically, our method even outperformed the second-best one by nearly 15% in terms of global accuracy for the case of testing a novel object on the RGB-D Object dataset.

MLNov 23, 2023
Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework

Jingjing Zheng, Wanglong Lu, Wenzhe Wang et al.

Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes. It has been commonly observed in real-world scenarios but ignored by the traditional t-SVD-based methods. In this work, we introduce a novel tensor recovery model with a learnable tensor nuclear norm to address such a challenge. We develop a new optimization algorithm named the Alternating Proximal Multiplier Method (APMM) to iteratively solve the proposed tensor completion model. Theoretical analysis demonstrates the convergence of the proposed APMM to the Karush-Kuhn-Tucker (KKT) point of the optimization problem. In addition, we propose a multi-objective tensor recovery framework based on APMM to efficiently explore the correlations of tensor data across its various dimensions, providing a new perspective on extending the t-SVD-based method to higher-order tensor cases. Numerical experiments demonstrated the effectiveness of the proposed method in tensor completion.

CRJun 2, 2024Code
A Novel Defense Against Poisoning Attacks on Federated Learning: LayerCAM Augmented with Autoencoder

Jingjing Zheng, Xin Yuan, Kai Li et al.

Recent attacks on federated learning (FL) can introduce malicious model updates that circumvent widely adopted Euclidean distance-based detection methods. This paper proposes a novel defense strategy, referred to as LayerCAM-AE, designed to counteract model poisoning in federated learning. The LayerCAM-AE puts forth a new Layer Class Activation Mapping (LayerCAM) integrated with an autoencoder (AE), significantly enhancing detection capabilities. Specifically, LayerCAM-AE generates a heat map for each local model update, which is then transformed into a more compact visual format. The autoencoder is designed to process the LayerCAM heat maps from the local model updates, improving their distinctiveness and thereby increasing the accuracy in spotting anomalous maps and malicious local models. To address the risk of misclassifications with LayerCAM-AE, a voting algorithm is developed, where a local model update is flagged as malicious if its heat maps are consistently suspicious over several rounds of communication. Extensive tests of LayerCAM-AE on the SVHN and CIFAR-100 datasets are performed under both Independent and Identically Distributed (IID) and non-IID settings in comparison with existing ResNet-50 and REGNETY-800MF defense models. Experimental results show that LayerCAM-AE increases detection rates (Recall: 1.0, Precision: 1.0, FPR: 0.0, Accuracy: 1.0, F1 score: 1.0, AUC: 1.0) and test accuracy in FL, surpassing the performance of both the ResNet-50 and REGNETY-800MF. Our code is available at: https://github.com/jjzgeeks/LayerCAM-AE

LGDec 11, 2024Code
Adaptive Principal Components Allocation with the $\ell_{2,g}$-regularized Gaussian Graphical Model for Efficient Fine-Tuning Large Models

Jingjing Zheng, Yankai Cao

In this work, we propose a novel Parameter-Efficient Fine-Tuning (PEFT) approach based on Gaussian Graphical Models (GGMs), marking the first application of GGMs to PEFT tasks, to the best of our knowledge. The proposed method utilizes the $\ell_{2,g}$-norm to effectively select critical parameters and capture global dependencies. The resulting non-convex optimization problem is efficiently solved using a Block Coordinate Descent (BCD) algorithm. Experimental results on the GLUE benchmark [24] for fine-tuning RoBERTa-Base [18] demonstrate the effectiveness of the proposed approach, achieving competitive performance with significantly fewer trainable parameters. The code for this work is available at: https://github.com/jzheng20/Course projects.git.

CRApr 23, 2024
Leverage Variational Graph Representation For Model Poisoning on Federated Learning

Kai Li, Xin Yuan, Jingjing Zheng et al.

This paper puts forth a new training data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the benign local models overheard without any access to the training data of FL. Such an advancement leads to the VGAE-MP attack that is not only efficacious but also remains elusive to detection. VGAE-MP attack extracts graph structural correlations among the benign local models and the training data features, adversarially regenerates the graph structure, and generates malicious local models using the adversarial graph structure and benign models' features. Moreover, a new attacking algorithm is presented to train the malicious local models using VGAE and sub-gradient descent, while enabling an optimal selection of the benign local models for training the VGAE. Experiments demonstrate a gradual drop in FL accuracy under the proposed VGAE-MP attack and the ineffectiveness of existing defense mechanisms in detecting the attack, posing a severe threat to FL.

CVApr 15, 2024
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng et al.

Inherent ambiguity in layout annotations poses significant challenges to developing accurate 360° room layout estimation models. To address this issue, we propose a novel Bi-Layout model capable of predicting two distinct layout types. One stops at ambiguous regions, while the other extends to encompass all visible areas. Our model employs two global context embeddings, where each embedding is designed to capture specific contextual information for each layout type. With our novel feature guidance module, the image feature retrieves relevant context from these embeddings, generating layout-aware features for precise bi-layout predictions. A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing the two predictions. To circumvent the need for manual correction of ambiguous annotations during testing, we also introduce a new metric for disambiguating ground truth layouts. Our method demonstrates superior performance on benchmark datasets, notably outperforming leading approaches. Specifically, on the MatterportLayout dataset, it improves 3DIoU from 81.70% to 82.57% across the full test set and notably from 54.80% to 59.97% in subsets with significant ambiguity. Project page: https://liagm.github.io/Bi_Layout/

37.0LGMar 10
On Catastrophic Forgetting in Low-Rank Decomposition-Based Parameter-Efficient Fine-Tuning

Muhammad Ahmad, Jingjing Zheng, Yankai Cao

Parameter-efficient fine-tuning (PEFT) based on low-rank decomposition, such as LoRA, has become a standard for adapting large pretrained models. However, its behavior in sequential learning -- specifically regarding catastrophic forgetting -- remains insufficiently understood. In this work, we present an empirical study showing that forgetting is strongly influenced by the geometry and parameterization of the update subspace. While methods that restrict updates to small, shared matrix subspaces often suffer from task interference, tensor-based decompositions (e.g., LoRETTA) mitigate forgetting by capturing richer structural information within ultra-compact budgets, and structurally aligned parameterizations (e.g., WeGeFT) preserve pretrained representations. Our findings highlight update subspace design as a key factor in continual learning and offer practical guidance for selecting efficient adaptation strategies in sequential settings.

CVMar 6, 2025
TextDoctor: Unified Document Image Inpainting via Patch Pyramid Diffusion Models

Wanglong Lu, Lingming Su, Jingjing Zheng et al.

Digital versions of real-world text documents often suffer from issues like environmental corrosion of the original document, low-quality scanning, or human interference. Existing document restoration and inpainting methods typically struggle with generalizing to unseen document styles and handling high-resolution images. To address these challenges, we introduce TextDoctor, a novel unified document image inpainting method. Inspired by human reading behavior, TextDoctor restores fundamental text elements from patches and then applies diffusion models to entire document images instead of training models on specific document types. To handle varying text sizes and avoid out-of-memory issues, common in high-resolution documents, we propose using structure pyramid prediction and patch pyramid diffusion models. These techniques leverage multiscale inputs and pyramid patches to enhance the quality of inpainting both globally and locally. Extensive qualitative and quantitative experiments on seven public datasets validated that TextDoctor outperforms state-of-the-art methods in restoring various types of high-resolution document images.

AIOct 7, 2025
Joint Communication Scheduling and Velocity Control for Multi-UAV-Assisted Post-Disaster Monitoring: An Attention-Based In-Context Learning Approach

Yousef Emami, Seyedsina Nabavirazavi, Jingjing Zheng et al.

Recently, Unmanned Aerial Vehicles (UAVs) are increasingly being investigated to collect sensory data in post-disaster monitoring scenarios, such as tsunamis, where early actions are critical to limit coastal damage. A major challenge is to design the data collection schedules and flight velocities, as unfavorable schedules and velocities can lead to transmission errors and buffer overflows of the ground sensors, ultimately resulting in significant packet loss. Meanwhile, online Deep Reinforcement Learning (DRL) solutions have a complex training process and a mismatch between simulation and reality that does not meet the urgent requirements of tsunami monitoring. Recent advances in Large Language Models (LLMs) offer a compelling alternative. With their strong reasoning and generalization capabilities, LLMs can adapt to new tasks through In-Context Learning (ICL), which enables task adaptation through natural language prompts and example-based guidance without retraining. However, LLM models have input data limitations and thus require customized approaches. In this paper, a joint optimization of data collection schedules and velocities control for multiple UAVs is proposed to minimize data loss. The battery level of the ground sensors, the length of the queues, and the channel conditions, as well as the trajectories of the UAVs, are taken into account. Attention-Based In-Context Learning for Velocity Control and Data Collection Schedule (AIC-VDS) is proposed as an alternative to DRL in emergencies. The simulation results show that the proposed AIC-VDS outperforms both the Deep-Q-Network (DQN) and maximum channel gain baselines.

CVJul 30, 2025
Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

Sanghun Jung, Jingjing Zheng, Ke Zhang et al.

Unlike closed-vocabulary 3D instance segmentation that is often trained end-to-end, open-vocabulary 3D instance segmentation (OV-3DIS) often leverages vision-language models (VLMs) to generate 3D instance proposals and classify them. While various concepts have been proposed from existing research, we observe that these individual concepts are not mutually exclusive but complementary. In this paper, we propose a new state-of-the-art solution for OV-3DIS by carefully designing a recipe to combine the concepts together and refining them to address key challenges. Our solution follows the two-stage scheme: 3D proposal generation and instance classification. We employ robust 3D tracking-based proposal aggregation to generate 3D proposals and remove overlapped or partial proposals by iterative merging/removal. For the classification stage, we replace the standard CLIP model with Alpha-CLIP, which incorporates object masks as an alpha channel to reduce background noise and obtain object-centric representation. Additionally, we introduce the standardized maximum similarity (SMS) score to normalize text-to-proposal similarity, effectively filtering out false positives and boosting precision. Our framework achieves state-of-the-art performance on ScanNet200 and S3DIS across all AP and AR metrics, even surpassing an end-to-end closed-vocabulary method.

LGMay 19, 2023
A Novel Tensor Factorization-Based Method with Robustness to Inaccurate Rank Estimation

Jingjing Zheng, Wenzhe Wang, Xiaoqin Zhang et al.

This study aims to solve the over-reliance on the rank estimation strategy in the standard tensor factorization-based tensor recovery and the problem of a large computational cost in the standard t-SVD-based tensor recovery. To this end, we proposes a new tensor norm with a dual low-rank constraint, which utilizes the low-rank prior and rank information at the same time. In the proposed tensor norm, a series of surrogate functions of the tensor tubal rank can be used to achieve better performance in harness low-rankness within tensor data. It is proven theoretically that the resulting tensor completion model can effectively avoid performance degradation caused by inaccurate rank estimation. Meanwhile, attributed to the proposed dual low-rank constraint, the t-SVD of a smaller tensor instead of the original big one is computed by using a sample trick. Based on this, the total cost at each iteration of the optimization algorithm is reduced to $\mathcal{O}(n^3\log n +kn^3)$ from $\mathcal{O}(n^4)$ achieved with standard methods, where $k$ is the estimation of the true tensor rank and far less than $n$. Our method was evaluated on synthetic and real-world data, and it demonstrated superior performance and efficiency over several existing state-of-the-art tensor completion methods.

LGFeb 15, 2022
Exploring Deep Reinforcement Learning-Assisted Federated Learning for Online Resource Allocation in Privacy-Persevering EdgeIoT

Jingjing Zheng, Kai Li, Naram Mhaisen et al.

Federated learning (FL) has been increasingly considered to preserve data training privacy from eavesdropping attacks in mobile edge computing-based Internet of Thing (EdgeIoT). On the one hand, the learning accuracy of FL can be improved by selecting the IoT devices with large datasets for training, which gives rise to a higher energy consumption. On the other hand, the energy consumption can be reduced by selecting the IoT devices with small datasets for FL, resulting in a falling learning accuracy. In this paper, we formulate a new resource allocation problem for privacy-persevering EdgeIoT to balance the learning accuracy of FL and the energy consumption of the IoT device. We propose a new federated learning-enabled twin-delayed deep deterministic policy gradient (FL-DLT3) framework to achieve the optimal accuracy and energy balance in a continuous domain. Furthermore, long short term memory (LSTM) is leveraged in FL-DLT3 to predict the time-varying network state while FL-DLT3 is trained to select the IoT devices and allocate the transmit power. Numerical results demonstrate that the proposed FL-DLT3 achieves fast convergence (less than 100 iterations) while the FL accuracy-to-energy consumption ratio is improved by 51.8% compared to existing state-of-the-art benchmark.

CVApr 12, 2018
Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning

Hongyu Xu, Jingjing Zheng, Azadeh Alavi et al.

In real-world visual recognition problems, the assumption that the training data (source domain) and test data (target domain) are sampled from the same distribution is often violated. This is known as the domain adaptation problem. In this work, we propose a novel domain-adaptive dictionary learning framework for cross-domain visual recognition. Our method generates a set of intermediate domains. These intermediate domains form a smooth path and bridge the gap between the source and target domains. Specifically, we not only learn a common dictionary to encode the domain-shared features, but also learn a set of domain-specific dictionaries to model the domain shift. The separation of the common and domain-specific dictionaries enables us to learn more compact and reconstructive dictionaries for domain adaptation. These dictionaries are learned by alternating between domain-adaptive sparse coding and dictionary updating steps. Meanwhile, our approach gradually recovers the feature representations of both source and target data along the domain path. By aligning all the recovered domain data, we derive the final domain-adaptive features for cross-domain visual recognition. Extensive experiments on three public datasets demonstrates that our approach outperforms most state-of-the-art methods.