CVAug 4, 2023Code
Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character RecognitionJiaCheng Deng, Li Dong, Jiahao Chen et al.
Optical Character Recognition (OCR) enables automatic text extraction from scanned or digitized text images, but it also makes it easy to pirate valuable or sensitive text from these images. Previous methods to prevent OCR piracy by distorting characters in text images are impractical in real-world scenarios, as pirates can capture arbitrary portions of the text images, rendering the defenses ineffective. In this work, we propose a novel and effective defense mechanism termed the Universal Defensive Underpainting Patch (UDUP) that modifies the underpainting of text images instead of the characters. UDUP is created through an iterative optimization process to craft a small, fixed-size defensive patch that can generate non-overlapping underpainting for text images of any size. Experimental results show that UDUP effectively defends against unauthorized OCR under the setting of any screenshot range or complex image background. It is agnostic to the content, size, colors, and languages of characters, and is robust to typical image operations such as scaling and compressing. In addition, the transferability of UDUP is demonstrated by evading several off-the-shelf OCRs. The code is available at https://github.com/QRICKDD/UDUP.
CVSep 23, 2024Code
Revisiting Video Quality Assessment from the Perspective of GeneralizationXinli Yue, Jianhui Sun, Liangchao Yao et al.
The increasing popularity of short video platforms such as YouTube Shorts, TikTok, and Kwai has led to a surge in User-Generated Content (UGC), which presents significant challenges for the generalization performance of Video Quality Assessment (VQA) tasks. These challenges not only affect performance on test sets but also impact the ability to generalize across different datasets. While prior research has primarily focused on enhancing feature extractors, sampling methods, and network branches, it has largely overlooked the generalization capabilities of VQA tasks. In this work, we reevaluate the VQA task from a generalization standpoint. We begin by analyzing the weight loss landscape of VQA models, identifying a strong correlation between this landscape and the generalization gaps. We then investigate various techniques to regularize the weight loss landscape. Our results reveal that adversarial weight perturbations can effectively smooth this landscape, significantly improving the generalization performance, with cross-dataset generalization and fine-tuning performance enhanced by up to 1.8% and 3%, respectively. Through extensive experiments across various VQA methods and datasets, we validate the effectiveness of our approach. Furthermore, by leveraging our insights, we achieve state-of-the-art performance in Image Quality Assessment (IQA) tasks. Our code is available at https://github.com/XinliYue/VQA-Generalization.
CVMar 15, 2024Code
Revisiting Adversarial Training under Long-Tailed DistributionsXinli Yue, Ningping Mou, Qian Wang et al.
Deep neural networks are vulnerable to adversarial attacks, often leading to erroneous outputs. Adversarial training has been recognized as one of the most effective methods to counter such attacks. However, existing adversarial training techniques have predominantly been tested on balanced datasets, whereas real-world data often exhibit a long-tailed distribution, casting doubt on the efficacy of these methods in practical scenarios. In this paper, we delve into adversarial training under long-tailed distributions. Through an analysis of the previous work "RoBal", we discover that utilizing Balanced Softmax Loss alone can achieve performance comparable to the complete RoBal approach while significantly reducing training overheads. Additionally, we reveal that, similar to uniform distributions, adversarial training under long-tailed distributions also suffers from robust overfitting. To address this, we explore data augmentation as a solution and unexpectedly discover that, unlike results obtained with balanced data, data augmentation not only effectively alleviates robust overfitting but also significantly improves robustness. We further investigate the reasons behind the improvement of robustness through data augmentation and identify that it is attributable to the increased diversity of examples. Extensive experiments further corroborate that data augmentation alone can significantly improve robustness. Finally, building on these findings, we demonstrate that compared to RoBal, the combination of BSL and data augmentation leads to a +6.66% improvement in model robustness under AutoAttack on CIFAR-10-LT. Our code is available at https://github.com/NISPLab/AT-BSL .
CVSep 23, 2024
Advancing Video Quality Assessment for AIGCXinli Yue, Jianhui Sun, Han Kong et al.
In recent years, AI generative models have made remarkable progress across various domains, including text generation, image generation, and video generation. However, assessing the quality of text-to-video generation is still in its infancy, and existing evaluation frameworks fall short when compared to those for natural videos. Current video quality assessment (VQA) methods primarily focus on evaluating the overall quality of natural videos and fail to adequately account for the substantial quality discrepancies between frames in generated videos. To address this issue, we propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies. Additionally, we introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities. Experimental results demonstrate that our method outperforms existing VQA techniques on the AIGC Video dataset, surpassing the previous state-of-the-art by 3.1% in terms of PLCC.
CLDec 17, 2024
Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQLGeling Liu, Yunzhi Tan, Ruichao Zhong et al.
Recently, large language models (LLMs) have significantly improved the performance of text-to-SQL systems. Nevertheless, many state-of-the-art (SOTA) approaches have overlooked the critical aspect of system robustness. Our experiments reveal that while LLM-driven methods excel on standard datasets, their accuracy is notably compromised when faced with adversarial perturbations. To address this challenge, we propose a robust text-to-SQL solution, called Solid-SQL, designed to integrate with various LLMs. We focus on the pre-processing stage, training a robust schema-linking model enhanced by LLM-based data augmentation. Additionally, we design a two-round, structural similarity-based example retrieval strategy for in-context learning. Our method achieves SOTA SQL execution accuracy levels of 82.1% and 58.9% on the general Spider and Bird benchmarks, respectively. Furthermore, experimental results show that Solid-SQL delivers an average improvement of 11.6% compared to baselines on the perturbed Spider-Syn, Spider-Realistic, and Dr. Spider benchmarks.
CROct 29, 2019
Shielding Collaborative Learning: Mitigating Poisoning Attacks through Client-Side DetectionLingchen Zhao, Shengshan Hu, Qian Wang et al.
Collaborative learning allows multiple clients to train a joint model without sharing their data with each other. Each client performs training locally and then submits the model updates to a central server for aggregation. Since the server has no visibility into the process of generating the updates, collaborative learning is vulnerable to poisoning attacks where a malicious client can generate a poisoned update to introduce backdoor functionality to the joint model. The existing solutions for detecting poisoned updates, however, fail to defend against the recently proposed attacks, especially in the non-IID setting. In this paper, we present a novel defense scheme to detect anomalous updates in both IID and non-IID settings. Our key idea is to realize client-side cross-validation, where each update is evaluated over other clients' local data. The server will adjust the weights of the updates based on the evaluation results when performing aggregation. To adapt to the unbalanced distribution of data in the non-IID setting, a dynamic client allocation mechanism is designed to assign detection tasks to the most suitable clients. During the detection process, we also protect the client-level privacy to prevent malicious clients from stealing the training data of other clients, by integrating differential privacy with our design without degrading the detection performance. Our experimental evaluations on two real-world datasets show that our scheme is significantly robust to two representative poisoning attacks.
CRSep 16, 2019
VeriML: Enabling Integrity Assurances and Fair Payments for Machine Learning as a ServiceLingchen Zhao, Qian Wang, Cong Wang et al.
Machine Learning as a Service (MLaaS) allows clients with limited resources to outsource their expensive ML tasks to powerful servers. Despite the huge benefits, current MLaaS solutions still lack strong assurances on: 1) service correctness (i.e., whether the MLaaS works as expected); 2) trustworthy accounting (i.e., whether the bill for the MLaaS resource consumption is correctly accounted); 3) fair payment (i.e., whether a client gets the entire MLaaS result before making the payment). Without these assurances, unfaithful service providers can return improperly-executed ML task results or partially trained ML models while asking for over-claimed rewards. Moreover, it is hard to argue for wide adoption of MLaaS to both the client and the service provider, especially in the open market without a trusted third party. In this paper, we present VeriML, a novel and efficient framework to bring integrity assurances and fair payments to MLaaS. With VeriML, clients can be assured that ML tasks are correctly executed on an untrusted server and the resource consumption claimed by the service provider equals to the actual workload. We strategically use succinct non-interactive arguments of knowledge (SNARK) on randomly-selected iterations during the ML training phase for efficiency with tunable probabilistic assurance. We also develop multiple ML-specific optimizations to the arithmetic circuit required by SNARK. Our system implements six common algorithms: linear regression, logistic regression, neural network, support vector machine, Kmeans and decision tree. The experimental results have validated the practical performance of VeriML.
CRDec 25, 2018
Privacy-Preserving Collaborative Deep Learning with Unreliable ParticipantsLingchen Zhao, Qian Wang, Qin Zou et al.
With powerful parallel computing GPUs and massive user data, neural-network-based deep learning can well exert its strong power in problem modeling and solving, and has archived great success in many applications such as image classification, speech recognition and machine translation etc. While deep learning has been increasingly popular, the problem of privacy leakage becomes more and more urgent. Given the fact that the training data may contain highly sensitive information, e.g., personal medical records, directly sharing them among the users (i.e., participants) or centrally storing them in one single location may pose a considerable threat to user privacy. In this paper, we present a practical privacy-preserving collaborative deep learning system that allows users to cooperatively build a collective deep learning model with data of all participants, without direct data sharing and central data storage. In our system, each participant trains a local model with their own data and only shares model parameters with the others. To further avoid potential privacy leakage from sharing model parameters, we use functional mechanism to perturb the objective function of the neural network in the training process to achieve $ε$-differential privacy. In particular, for the first time, we consider the existence of~\textit{unreliable participants}, i.e., the participants with low-quality data, and propose a solution to reduce the impact of these participants while protecting their privacy. We evaluate the performance of our system on two well-known real-world datasets for regression and classification tasks. The results demonstrate that the proposed system is robust against unreliable participants, and achieves high accuracy close to the model trained in a traditional centralized manner while ensuring rigorous privacy protection.