Qinglong Wang

LG
h-index: 19
Papers: 21
Citations: 585
Novelty: 56%
AI Score: 52

21 Papers

CV · Jun 20, 2023 · Code
Masked Diffusion Models Are Fast Distribution Learners

Jiachen Lei, Qinglong Wang, Peng Cheng et al.

Diffusion models have emerged as the de facto models for image generation, yet their heavy training overhead hinders broader adoption in the research community. We observe that diffusion models are commonly trained to learn all fine-grained visual information from scratch. This paradigm may incur unnecessary training costs and hence requires in-depth investigation. In this work, we show that it suffices to train a strong diffusion model by first pre-training the model to learn some primer distribution that loosely characterizes the unknown real image distribution. Then the pre-trained model can be fine-tuned for various generation tasks efficiently. In the pre-training stage, we propose to mask a high proportion (e.g., up to 90%) of input images to approximately represent the primer distribution and introduce a masked denoising score matching objective to train a model to denoise visible areas. In the subsequent fine-tuning stage, we efficiently train the diffusion model without masking. Utilizing the two-stage training framework, we achieve significant training acceleration and a new FID score record of 6.27 on CelebA-HQ 256 × 256 for ViT-based diffusion models. The generalizability of a pre-trained model further helps build models that perform better than ones trained from scratch on different downstream datasets. For instance, a diffusion model pre-trained on VGGFace2 attains a 46% quality improvement when fine-tuned on a different dataset that contains only 3000 images. Our code is available at https://github.com/jiachenlei/maskdm.
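The masked pre-training step described in this abstract can be illustrated with a minimal sketch. It assumes the image is already split into patches, and it substitutes a plain squared error on predicted noise for the paper's masked denoising score matching objective. The names `random_visibility` and `masked_denoising_loss` are hypothetical and are not taken from the maskdm repository.

```python
import random


def random_visibility(num_patches, mask_ratio, rng):
    """Pick which patches stay visible; the rest are masked out.

    Keeps round(num_patches * (1 - mask_ratio)) patches (at least one),
    chosen uniformly at random.
    """
    num_keep = max(1, round(num_patches * (1.0 - mask_ratio)))
    keep = set(rng.sample(range(num_patches), num_keep))
    return [i in keep for i in range(num_patches)]


def masked_denoising_loss(noise_true, noise_pred, visible):
    """Mean squared error computed over visible patches only.

    noise_true, noise_pred: one list of floats per patch.
    visible: one bool per patch, True where the patch was kept.
    Assumption: a noise-prediction MSE stands in for the paper's objective.
    """
    total, count = 0.0, 0
    for true_patch, pred_patch, keep in zip(noise_true, noise_pred, visible):
        if not keep:
            continue  # masked patches contribute nothing to the loss
        for a, b in zip(true_patch, pred_patch):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0
```

With a mask ratio of 0.9 and 100 patches, only 10 patches enter the loss, which is where the reduced per-step cost in this sketch comes from.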

CV · Sep 25, 2023
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution

Zhongjie Ba, Jieming Zhong, Jiachen Lei et al.

Advanced text-to-image models such as DALL·E 2 and Midjourney possess the capacity to generate highly realistic images, raising significant concerns regarding the potential proliferation of unsafe content. This includes adult, violent, or deceptive imagery of political figures. Despite claims of rigorous safety mechanisms implemented in these models to restrict the generation of not-safe-for-work (NSFW) content, we successfully devise and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant photorealistic NSFW images. We reveal the fundamental principles of such prompt attacks and suggest strategically substituting high-risk sections within a suspect prompt to evade closed-source safety measures. Our novel framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules to automate attack prompt creation at scale. Evaluation results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts, leading to the generation of counterfeit images depicting political figures in violent scenarios. Both subjective and objective assessments validate that the images generated from our attack prompts present considerable safety hazards.

SY · Feb 18, 2018
When Renewable Energy Meets Building Thermal Mass: A Real-time Load Management Scheme

Yan Shen, Zhonghao Sun, Qinglong Wang

We consider optimal power management in a renewable-driven smart building MicroGrid under noise-corrupted conditions as a stochastic optimization problem. We first propose our user satisfaction and electricity consumption balanced (USECB) profit model as the objective for optimal power management. We then cast the problem under noise-corrupted conditions as an expectation-maximizing stochastic optimization problem with convex constraints. For this task, we design a Bregman projection-based mirror descent algorithm as an approximate solution to our stochastic optimization problem. We also provide a proof of convergence and an upper bound for our algorithm. We then conduct a broad range of simulation experiments to test the validity of our model as well as the effectiveness of our algorithm.
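As a rough illustration of mirror descent with a Bregman projection, the sketch below uses the negative-entropy mirror map on the probability simplex, where the projection reduces to a renormalization. The feasible set, the step size, and the function name `entropic_mirror_descent` are assumptions made for illustration; the paper's USECB objective and its actual constraint set are not reproduced here.

```python
import math


def entropic_mirror_descent(grad_fn, dim, steps, step_size):
    """Minimize a convex function over the probability simplex.

    grad_fn(x) returns a (possibly noisy) gradient estimate at x.
    With the negative-entropy mirror map, each update is a multiplicative
    step followed by a KL (Bregman) projection, which is a renormalization.
    """
    x = [1.0 / dim] * dim  # start from the uniform point
    for _ in range(steps):
        g = grad_fn(x)
        w = [xi * math.exp(-step_size * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        x = [wi / z for wi in w]  # Bregman projection back onto the simplex
    return x
```

For a linear cost, the iterate concentrates on the coordinate with the smallest cost while staying a valid probability vector at every step.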

SD · May 15
Beyond Content: A Comprehensive Speech Toxicity Dataset and Detection Framework Incorporating Paralinguistic Cues

Zhongjie Ba, Liang Yi, Peng Cheng et al.

Toxic speech detection has become a crucial challenge in maintaining safe online communication environments. However, existing approaches to toxic speech detection often neglect the contribution of paralinguistic cues, such as emotion, intonation, and speech rate, which are key to detecting speech toxicity. Moreover, current toxic speech datasets are predominantly text-based, limiting the development of models that can capture paralinguistic cues. To address these challenges, we present ToxiAlert-Bench, a large-scale audio dataset comprising over 30,000 audio clips annotated with seven major toxic categories and twenty fine-grained toxic labels. Uniquely, our dataset annotates toxicity sources -- distinguishing between textual content and paralinguistic origins -- for comprehensive toxic speech analysis. Furthermore, we propose a dual-head neural network with a multi-stage training strategy tailored for toxic speech detection. This architecture features two task-specific classification headers: one for identifying the source of sensitivity (textual or paralinguistic), and the other for categorizing the specific toxic type. The training process involves independent head training followed by joint fine-tuning to reduce task interference. To mitigate data class imbalance, we incorporate class-balanced sampling and weighted loss functions. Our experimental results show that leveraging paralinguistic features significantly improves detection performance. Our method consistently outperforms existing baselines across multiple evaluation metrics, with a 21.1% relative improvement in Macro-F1 score and a 13.0% relative gain in accuracy over the strongest baseline, highlighting its enhanced effectiveness and practical applicability.

AI · Jan 28
Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution

Zhengbo Jiao, Hongyu Xian, Qinglong Wang et al.

Large language models (LLMs) struggle with complex, long-horizon reasoning due to instability caused by their frozen policy assumption. Current test-time scaling methods treat execution feedback merely as an external signal for filtering or rewriting trajectories, without internalizing it to improve the underlying reasoning strategy. Inspired by Popper's epistemology of "conjectures and refutations," we argue that intelligence requires real-time evolution of the model's policy through learning from failed attempts. We introduce Policy of Thoughts (PoT), a framework that recasts reasoning as a within-instance online optimization process. PoT first generates diverse candidate solutions via an efficient exploration mechanism, then uses Group Relative Policy Optimization (GRPO) to update a transient LoRA adapter based on execution feedback. This closed-loop design enables dynamic, instance-specific refinement of the model's reasoning priors. Experiments show that PoT dramatically boosts performance: a 4B model achieves 49.71% accuracy on LiveCodeBench, outperforming GPT-4o and DeepSeek-V3 despite being over 50× smaller.
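The group-relative signal that GRPO derives from execution feedback can be sketched as follows. This shows only the commonly used advantage normalization within one group of candidate solutions; the reward definition, the use of the population standard deviation, and the name `group_relative_advantages` are assumptions, and the LoRA update itself is not shown.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each candidate's reward against its own group.

    rewards: one scalar per candidate solution for the same problem,
    for example 1.0 if the candidate passed execution and 0.0 otherwise.
    Returns (reward - group mean) / (group std + eps) for each candidate.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]
```

When every candidate in a group receives the same reward, all advantages are zero, so a group carries a learning signal only if its candidates disagree.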

CR · Mar 28, 2025 · Code
WMCopier: Forging Invisible Image Watermarks on Arbitrary Images

Ziping Dong, Chao Shuai, Zhongjie Ba et al.

Invisible Image Watermarking is crucial for ensuring content provenance and accountability in generative AI. While Gen-AI providers are increasingly integrating invisible watermarking systems, the robustness of these schemes against forgery attacks remains poorly characterized. This is critical, as forging traceable watermarks onto illicit content leads to false attribution, potentially harming the reputation and legal standing of Gen-AI service providers who are not responsible for the content. In this work, we propose WMCopier, an effective watermark forgery attack that operates without requiring any prior knowledge of or access to the target watermarking algorithm. Our approach first models the target watermark distribution using an unconditional diffusion model, and then seamlessly embeds the target watermark into a non-watermarked image via a shallow inversion process. We also incorporate an iterative optimization procedure that refines the reconstructed image to further trade off the fidelity and forgery efficiency. Experimental results demonstrate that WMCopier effectively deceives both open-source and closed-source watermark systems (e.g., Amazon's system), achieving a significantly higher success rate than existing methods. Additionally, we evaluate the robustness of forged samples and discuss the potential defenses against our attack.

LG · Apr 29
NeuroPlastic: A Plasticity-Modulated Optimizer for Biologically Inspired Learning Dynamics

Douglas Jiang, Yuechen Wang, Jiayi Wang et al.

Optimization algorithms are fundamental to modern deep learning, yet most widely used methods rely on update rules based primarily on local gradient statistics. We introduce NeuroPlastic, a plasticity-modulated optimizer that augments gradient-based updates with an adaptive multi-signal modulation mechanism inspired by multi-factor synaptic plasticity, a concept from neurobiology. NeuroPlastic dynamically scales gradient updates using interacting components that capture gradient, activity-like, and memory-like statistics, forming a lightweight modulation layer compatible with standard deep learning training pipelines. Across image classification benchmarks, NeuroPlastic consistently improves over a controlled gradient-only ablation, with more pronounced gains on the Fashion-MNIST benchmark and in reduced-data regimes. In transfer experiments on CIFAR-10 with ResNet-18, the method remains stable and competitive without retuning. These results suggest that multi-signal plasticity-inspired modulation can provide a useful extension to conventional gradient-driven optimization, particularly when learning signals are limited or noisy, and offer a promising direction for gradient-based methods in deep learning.

CV · Oct 31, 2024
Reverse Attitude Statistics Based Star Map Identification Method

Shunmei Dong, Qinglong Wang, Haiqing Wang et al.

The star tracker is generally affected by atmospheric background light and the aerodynamic environment when working in near space, which results in missing stars or false stars. Moreover, high-speed maneuvering may cause star trailing, which reduces the accuracy of the star position. To address these challenges for star map identification, a reverse attitude statistics based method is proposed to handle position noise, false stars, and missing stars. In contrast to existing methods, which match before solving for attitude, this method introduces attitude solving into the matching process, and obtains the final match and the correct attitude simultaneously by frequency statistics. First, based on stable angular distance features, the initial matching is obtained by utilizing spatial hash indexing. Then, dual-vector attitude determination is introduced to calculate the potential attitude. Finally, the star pairs are accurately matched by applying a frequency statistics filtering method. In addition, Bayesian optimization is employed to find optimal parameters under the impact of noise, which further enhances the algorithm's performance. In this work, the proposed method is validated in simulation, field tests, and on-orbit experiments. Compared with the state of the art, the identification rate is improved by more than 14.3%, and the solving time is reduced by over 28.5%.
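The angular distance feature mentioned in this abstract is stable because it does not change when the sensor rotates. The sketch below computes it for two star direction vectors and quantizes it into a bucket index, as a simple stand-in for a hash key. The bin width and the names `angular_distance` and `angle_bucket` are illustrative assumptions, not the paper's indexing scheme.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def angle_bucket(angle, bin_width):
    """Quantize an angle into an integer bucket usable as a lookup key."""
    return int(angle // bin_width)
```

Rotating both vectors by the same rotation leaves the angle unchanged, so a catalog indexed by such buckets can be queried without knowing the attitude in advance.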

CV · Feb 10, 2025
Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation

Zhongjie Ba, Yitao Zhang, Peng Cheng et al.

Watermarking plays a key role in the provenance and detection of AI-generated content. While existing methods prioritize robustness against real-world distortions (e.g., JPEG compression and noise addition), we reveal a fundamental tradeoff: such robust watermarks inherently improve the redundancy of detectable patterns encoded into images, creating exploitable information leakage. To leverage this, we propose an attack framework that extracts leakage of watermark patterns through multi-channel feature learning using a pre-trained vision model. Unlike prior works requiring massive data or detector access, our method achieves both forgery and detection evasion with a single watermarked image. Extensive experiments demonstrate that our method achieves a 60% success rate gain in detection evasion and 51% improvement in forgery accuracy compared to state-of-the-art methods while maintaining visual fidelity. Our work exposes the robustness-stealthiness paradox: current "robust" watermarks sacrifice security for distortion resistance, providing insights for future watermark design.

LG · Nov 12, 2019
Connecting First and Second Order Recurrent Networks with Deterministic Finite Automata

Qinglong Wang, Kaixuan Zhang, Xue Liu et al.

We propose an approach that connects recurrent networks with different orders of hidden interaction with regular grammars of different levels of complexity. We argue that the correspondence between recurrent networks and formal computational models gives understanding to the analysis of the complicated behaviors of recurrent networks. We introduce an entropy value that categorizes all regular grammars into three classes with different levels of complexity, and show that several existing recurrent networks match grammars from either all or partial classes. As such, the differences between regular grammars reveal the different properties of these models. We also provide a unification of all investigated recurrent networks. Our evaluation shows that the unified recurrent network has improved performance in learning grammars, and demonstrates comparable performance on a real-world dataset with more complicated models.

LG · Oct 15, 2019
Shapley Homology: Topological Analysis of Sample Influence for Neural Networks

Kaixuan Zhang, Qinglong Wang, Xue Liu et al.

Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (iid). Recent research has demonstrated that this assumption can be problematic as it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley Homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. By interpreting the influence as a probability measure, we further define an entropy which reflects the complexity of the data manifold. Our empirical studies show that when using the 0-dimensional homology, on neighboring graphs, samples with higher influence scores have more impact on the accuracy of neural networks for determining the graph connectivity and on several regular grammars whose higher entropy values imply more difficulty in being learned.

IR · Dec 7, 2018
Gated Attentive-Autoencoder for Content-Aware Recommendation

Chen Ma, Peng Kang, Bin Wu et al.

The rapid growth of Internet services and mobile devices provides an excellent opportunity to satisfy the strong demand for the personalized item or product recommendation. However, with the tremendous increase of users and items, personalized recommender systems still face several challenging problems: (1) the hardness of exploiting sparse implicit feedback; (2) the difficulty of combining heterogeneous data. To cope with these challenges, we propose a gated attentive-autoencoder (GATE) model, which is capable of learning fused hidden representations of items' contents and binary ratings, through a neural gating structure. Based on the fused representations, our model exploits neighboring relations between items to help infer users' preferences. In particular, a word-level and a neighbor-level attention module are integrated with the autoencoder. The word-level attention learns the item hidden representations from items' word sequences, while favoring informative words by assigning larger attention weights. The neighbor-level attention learns the hidden representation of an item's neighborhood by considering its neighbors in a weighted manner. We extensively evaluate our model with several state-of-the-art methods and different validation metrics on four real-world datasets. The experimental results not only demonstrate the effectiveness of our model on top-N recommendation but also provide interpretable results attributed to the attention modules.

LG · Nov 14, 2018
Verification of Recurrent Neural Networks Through Rule Extraction

Qinglong Wang, Kaixuan Zhang, Xue Liu et al.

The verification problem for neural networks is verifying whether a neural network will suffer from adversarial samples, or approximating the maximal allowed scale of adversarial perturbation that can be endured. While most prior work contributes to verifying feed-forward networks, little has been explored for verifying recurrent networks. This is due to the existence of a more rigorous constraint on the perturbation space for sequential data, and the lack of a proper metric for measuring the perturbation. In this work, we address these challenges by proposing a metric which measures the distance between strings, and use deterministic finite automata (DFA) to represent a rigorous oracle which examines if the generated adversarial samples violate certain constraints on a perturbation. More specifically, we empirically show that certain recurrent networks allow relatively stable DFA extraction. As such, DFAs extracted from these recurrent networks can serve as a surrogate oracle for when the ground truth DFA is unknown. We apply our verification mechanism to several widely used recurrent networks on a set of the Tomita grammars. The results demonstrate that only a few models remain robust against adversarial samples. In addition, we show that for grammars with different levels of complexity, there is also a difference in the difficulty of robust learning of these grammars.
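The roles of the string distance and the DFA oracle in this abstract can be sketched as follows. Levenshtein edit distance is used here only as a stand-in, since the paper proposes its own string metric, and the oracle is the two-state DFA for the first Tomita grammar (strings of the form 1*). The names `levenshtein`, `dfa_accepts`, and `is_adversarial` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (stand-in for the paper's metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Tomita grammar 1 (strings of the form 1*) as a two-state DFA.
# State 0 is accepting; state 1 is a rejecting sink.
TOMITA_1 = {
    "start": 0,
    "accepting": {0},
    "delta": {(0, "1"): 0, (0, "0"): 1, (1, "0"): 1, (1, "1"): 1},
}


def dfa_accepts(dfa, string):
    """Run the DFA over the string and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in string:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]


def is_adversarial(original, perturbed, model, oracle, max_distance):
    """True if the perturbation is small, keeps the oracle label, and flips the model."""
    if levenshtein(original, perturbed) > max_distance:
        return False
    if dfa_accepts(oracle, original) != dfa_accepts(oracle, perturbed):
        return False  # the true label changed, so this is not a valid attack
    return model(original) != model(perturbed)
```

Here `model` is any callable mapping a string to a boolean label; a string counts as adversarial only when the oracle confirms that its true label is unchanged.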

IR · Sep 27, 2018
Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence

Chen Ma, Yingxue Zhang, Qinglong Wang et al.

The rapid growth of Location-based Social Networks (LBSNs) provides a great opportunity to satisfy the strong demand for personalized Point-of-Interest (POI) recommendation services. However, with the tremendous increase of users and POIs, POI recommender systems still face several challenging problems: (1) the hardness of modeling non-linear user-POI interactions from implicit feedback; (2) the difficulty of incorporating context information such as POIs' geographical coordinates. To cope with these challenges, we propose a novel autoencoder-based model to learn the non-linear user-POI relations, namely SAE-NAD, which consists of a self-attentive encoder (SAE) and a neighbor-aware decoder (NAD). In particular, unlike previous works that treat users' checked-in POIs equally, our self-attentive encoder adaptively differentiates the user preference degrees in multiple aspects, by adopting a multi-dimensional attention mechanism. To incorporate the geographical context information, we propose a neighbor-aware decoder to make users' reachability higher on the similar and nearby neighbors of checked-in POIs, which is achieved by the inner product of POI embeddings together with the radial basis function (RBF) kernel. To evaluate the proposed model, we conduct extensive experiments on three real-world datasets with many state-of-the-art baseline methods and evaluation metrics. The experimental results demonstrate the effectiveness of our model.
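One way to read the neighbor-aware idea is that a candidate POI scores higher when it is both similar in embedding space and geographically close to POIs the user has checked in at. The sketch below combines an inner product with an RBF kernel on planar distance. The exact combination, the `gamma` parameter, and the name `neighbor_aware_score` are assumptions for illustration rather than the SAE-NAD decoder itself.

```python
import math


def rbf_kernel(distance, gamma):
    """Radial basis function kernel: 1 at distance 0, decaying with distance."""
    return math.exp(-gamma * distance * distance)


def neighbor_aware_score(target_emb, target_xy, visited, gamma=1.0):
    """Score a candidate POI against a user's checked-in POIs.

    visited: list of (embedding, (x, y)) pairs for checked-in POIs.
    Each visited POI contributes its embedding similarity to the target,
    weighted by how close the two POIs are on the map.
    """
    score = 0.0
    for emb, xy in visited:
        similarity = sum(a * b for a, b in zip(emb, target_emb))
        score += similarity * rbf_kernel(math.dist(xy, target_xy), gamma)
    return score
```

Two candidates with identical embeddings then differ only through distance, so the nearer one is ranked higher.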

LG · Feb 14, 2018
Energy Spatio-Temporal Pattern Prediction for Electric Vehicle Networks

Qinglong Wang

Information about the spatio-temporal pattern of the electric energy carried by EVs, rather than the EVs themselves, is crucial for EVs to establish more effective and intelligent interactions with the smart grid. In this paper, we propose a framework for predicting the amount of electric energy stored by a large number of EVs aggregated within different city-scale regions, based on the spatio-temporal pattern of the electric energy. The spatial pattern is modeled using a neural network based spatial predictor, while the temporal pattern is captured using a linear-chain conditional random field (CRF) based temporal predictor. The two predictors are fed with spatial and temporal features respectively, which are extracted from real trajectory data recorded in Beijing. Furthermore, we combine both predictors to build the spatio-temporal predictor, by using an optimal combination coefficient which minimizes the normalized mean square error (NMSE) of the predictions. The prediction performance is evaluated based on extensive experiments covering both spatial and temporal predictions, and the improvement achieved by the combined spatio-temporal predictor. The experimental results show that the NMSE of the spatio-temporal predictor is maintained below 0.1 for all investigated regions of Beijing. We further visualize the predictions and discuss the potential benefits that the proposed framework can bring to smart grid scheduling and EV charging.
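The combination step can be sketched in closed form under two assumptions that the abstract does not state: the combined prediction is the convex-style blend a * spatial + (1 - a) * temporal, and NMSE is the squared error divided by the energy of the ground truth. Because the normalizer does not depend on a, minimizing NMSE is an ordinary least-squares problem in one variable. The function names are hypothetical.

```python
def optimal_coefficient(spatial, temporal, truth):
    """Coefficient a minimizing the error of a*spatial + (1-a)*temporal.

    Writing the blend as temporal + a*(spatial - temporal) and setting the
    derivative of the squared error to zero gives
    a = sum((y - t)(s - t)) / sum((s - t)^2).
    """
    num = sum((y - t) * (s - t) for s, t, y in zip(spatial, temporal, truth))
    den = sum((s - t) ** 2 for s, t in zip(spatial, temporal))
    return num / den if den else 0.5  # predictors identical: any a works


def combine(spatial, temporal, a):
    """Blend the two predictors with coefficient a."""
    return [a * s + (1.0 - a) * t for s, t in zip(spatial, temporal)]


def nmse(pred, truth):
    """Squared error normalized by the energy of the ground truth (assumed form)."""
    err = sum((p - y) ** 2 for p, y in zip(pred, truth))
    return err / sum(y * y for y in truth)
```

The coefficient is left unconstrained here; clipping it to [0, 1] would be a further design choice.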

LG · Jan 16, 2018
A Comparative Study of Rule Extraction for Recurrent Neural Networks

Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia et al.

Understanding recurrent networks through rule extraction has a long history. This has taken on new interest due to the need for interpreting or verifying neural networks. One basic form for representing stateful rules is deterministic finite automata (DFA). Previous research shows that extracting DFAs from trained second-order recurrent networks is not only possible but also relatively stable. Recently, several new types of recurrent networks with more complicated architectures have been introduced. These handle challenging learning tasks usually involving sequential data. However, it remains an open problem whether DFAs can be adequately extracted from these models. Specifically, it is not clear how DFA extraction will be affected when applied to different recurrent networks trained on data sets with different levels of complexity. Here, we investigate DFA extraction on several widely adopted recurrent networks that are trained to learn a set of seven regular Tomita grammars. We first formally analyze the complexity of Tomita grammars and categorize these grammars according to that complexity. Then we empirically evaluate different recurrent networks for their performance of DFA extraction on all Tomita grammars. Our experiments show that for most recurrent networks, their extraction performance decreases as the complexity of the underlying grammar increases. On grammars of lower complexity, most recurrent networks obtain desirable extraction performance. On grammars with the highest level of complexity, several complicated models fail, and only certain recurrent networks achieve satisfactory extraction performance.

LG · Sep 29, 2017
An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks

Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia et al.

Rule extraction from black-box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly non-linear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order recurrent neural networks trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases the rules outperform the trained RNNs.

LG · Dec 5, 2016
Learning Adversary-Resistant Deep Neural Networks

Qinglong Wang, Wenbo Guo, Kaixuan Zhang et al.

Deep neural networks (DNNs) have proven to be quite effective in a vast array of machine learning tasks, with recent examples in cyber security and autonomous vehicles. Despite the superior performance of DNNs in these applications, it has been recently shown that these models are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating particular synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data-points in order to "fool" the original DNN model, forcing it to mis-classify previously correctly classified samples with high confidence. Addressing this flaw in the model is essential if DNNs are to be used in critical applications such as those in cyber security. Previous work has provided various learning algorithms to enhance the robustness of DNN models, and they all fall into the tactic of "security through obscurity". This means security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we investigate this issue shared across previous research work and propose a generic approach to escalate a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if we reveal the underlying learning algorithm. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on publicly available datasets. Our results indicate that our approach typically provides superior classification performance and resistance in comparison with state-of-the-art solutions.

LG · Oct 6, 2016
Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks

Qinglong Wang, Wenbo Guo, Alexander G. Ororbia et al.

Deep neural networks have proven to be quite effective in a wide variety of machine learning tasks, ranging from improved speech recognition systems to advancing the development of autonomous vehicles. However, despite their superior performance in many applications, these models have been recently shown to be susceptible to a particular type of attack possible through the generation of particular synthetic examples referred to as adversarial samples. These samples are constructed by manipulating real examples from the training data distribution in order to "fool" the original neural model, resulting in misclassification (with high confidence) of previously correctly classified samples. Addressing this weakness is of utmost importance if deep neural architectures are to be applied to critical applications, such as those in the domain of cybersecurity. In this paper, we present an analysis of this fundamental flaw lurking in all neural architectures to uncover limitations of previously proposed defense mechanisms. More importantly, we present a unifying framework for protecting deep neural models using a non-invertible data transformation--developing two adversary-resilient architectures utilizing both linear and nonlinear dimensionality reduction. Empirical results indicate that our framework provides better robustness compared to state-of-the-art solutions while having negligible degradation in accuracy.

LG · Oct 5, 2016
Adversary Resistant Deep Neural Networks with an Application to Malware Detection

Qinglong Wang, Wenbo Guo, Kaixuan Zhang et al.

Beyond its highly publicized victories in Go, there have been numerous successful applications of deep learning in information retrieval, computer vision and speech recognition. In cybersecurity, an increasing number of companies have become excited about the potential of deep learning, and have started to use it for various security incidents, the most popular being malware detection. These companies assert that deep learning (DL) could help turn the tide in the battle against malware infections. However, deep neural networks (DNNs) are vulnerable to adversarial samples, a flaw that plagues most if not all statistical learning models. Recent research has demonstrated that those with malicious intent can easily circumvent deep learning-powered malware detection by exploiting this flaw. In order to address this problem, previous work has developed various defense mechanisms that either augment training data or increase the model's complexity. However, after a thorough analysis of the fundamental flaw in DNNs, we discover that the effectiveness of current defenses is limited and, more importantly, that they cannot provide theoretical guarantees as to their robustness against adversarial sample-based attacks. As such, we propose a new adversary resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within samples. In this work, we evaluate our proposed technique against a real world dataset with 14,679 malware variants and 17,399 benign programs. We theoretically validate the robustness of our technique, and empirically show that our technique significantly boosts DNN robustness to adversarial samples while maintaining high accuracy in classification. To demonstrate the general applicability of our proposed method, we also conduct experiments using the MNIST and CIFAR-10 datasets, generally used in image recognition research.
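Random feature nullification, as named in this abstract, can be sketched as zeroing each input feature independently with some probability before the sample reaches the network. The per-feature Bernoulli scheme, the `rate` parameter, and the name `nullify_features` are assumptions for illustration; the paper's exact sampling procedure and where it is applied in training and inference are not reproduced here.

```python
import random


def nullify_features(features, rate, rng):
    """Return a copy of the feature vector with random entries set to zero.

    Each feature is dropped independently with probability `rate`.
    Because the mask is drawn fresh per call, an attacker cannot know in
    advance which features will reach the model.
    """
    return [0.0 if rng.random() < rate else x for x in features]
```

A usage example: `nullify_features([0.2, 0.9, 0.4], rate=0.5, rng=random.Random())` returns a vector of the same length in which roughly half the entries are zero on average.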

SY · Jul 22, 2016
Smart Charging for Electric Vehicles: A Survey From the Algorithmic Perspective

Qinglong Wang, Xue Liu, Jian Du et al.

Smart interactions among the smart grid, aggregators and EVs can bring various benefits to all parties involved, e.g., improved reliability and safety for the smart grid, increased profits for the aggregators, as well as enhanced self benefit for EV customers. This survey focuses on viewing these smart interactions from an algorithmic perspective. In particular, important dominating factors for coordinated charging from three different perspectives are studied, in terms of smart grid oriented, aggregator oriented and customer oriented smart charging. Firstly, for smart grid oriented EV charging, we summarize various formulations proposed for load flattening, frequency regulation and voltage regulation, then explore the nature and substantial similarity among them. Secondly, for aggregator oriented EV charging, we categorize the algorithmic approaches proposed by research works sharing this perspective as direct and indirect coordinated control, and investigate these approaches in detail. Thirdly, for customer oriented EV charging, based on a commonly shared objective of reducing charging cost, we generalize different formulations proposed by the studied research works. Moreover, various uncertainty issues, e.g., EV fleet uncertainty, electricity price uncertainty, regulation demand uncertainty, etc., are discussed according to the three perspectives classified. Finally, we discuss challenging issues that are commonly confronted when modeling the smart interactions, and outline some future research topics in this exciting area.