Shuang Liu

LG
h-index117
41papers
4,241citations
Novelty48%
AI Score59

41 Papers

LGJun 2
HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

Shuang Liu, Yuxuan Bo, Qiuyang Zhao et al.

Reward models are central to large language model (LLM) alignment, but they remain vulnerable to reward hacking. To evaluate reward-model robustness, we introduce RewardHackBench containing 13 reward-hacking patterns covering real life high-stakes domains and general settings, and we find severe failures on specific subcategories across eight reward models. To mitigate these failures, we propose HARVE, a training-free reward-head editing method for scalar reward models. Instead of fine-tuning the reward model, HARVE identifies a multi-directional hacking subspace from residual stream directions associated with selected hacking subcategories, and removes the component of the reward-head vector aligned with that subspace. This directly reduces the reward head's sensitivity to hacking-related features using only a small set of contrastive gold-hacked examples, without gradient updates or fine-tuning. Comprehensive experiments across eight reward models indicates that \model improves hacking robustness, outperforms fine-tuning baselines, and preserves reward-models' general capability. Further analyses suggest that reward hacking is better captured as a multidimensional residual-space structure than by isolated surface cues.

CLDec 19, 2022
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

Haoli Bai, Zhiguang Liu, Xiaojun Meng et al.

Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding~(VDU). While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, which can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that our Wukong-Reader has superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also empowers Wukong-Reader with promising localization ability.

SYMay 7
Realization of Precise Perforating Using Dynamic Threshold and Physical Plausibility Algorithm for Self-Locating Perforating in Oil and Gas Wells

Si-Yu Xiao, Guo-Hui Ren, Tian-Hao Mao et al.

Accurate depth measurement is critical for targeting designated perforation intervals to maximize hydrocarbon recovery. While next-generation automated wireless perforating techniques reduce reliance on costly surface infrastructure and personnel, they lack the continuous depth correlation provided by conventional wireline cables. Consequently, correlating real-time casing collar locator (CCL) signals with a pre-recorded casing tally is essential for automatic depth determination. However, implementing this measurement remains challenging: downhole instruments must process CCL signals in real-time to identify collar signatures from complex interference, a task severely restricted by the limited computational resources and power budget of high-temperature downhole electronics. To address these constraints, this work proposes the Dynamic Threshold and Physical Plausibility Depth Measurement and Perforation Control (DTPPMP) system. This integrated solution enables in situ depth calibration by correlating CCL signals with the casing tally using lightweight algorithms for dynamic-threshold-based collar recognition and physical plausibility verification. Field tests demonstrate a collar recognition F1 score of 98.6% at a throughput of 1000 Sa/s. Notably, the algorithm requires only 1.5 μs per sample, confirming its computational efficiency and suitability for deployment on resource-constrained, high-temperature downhole platforms.

LGJun 26, 2022
Improving Policy Optimization with Generalist-Specialist Learning

Zhiwei Jia, Xuanlin Li, Zhan Ling et al.

Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations. We empirically observe that an agent trained on many variations (a generalist) tends to learn faster at the beginning, yet its performance plateaus at a less optimal level for a long time. In contrast, an agent trained only on a few variations (a specialist) can often achieve high returns under a limited computational budget. To have the best of both worlds, we propose a novel generalist-specialist training framework. Specifically, we first train a generalist on all environment variations; when it fails to improve, we launch a large population of specialists with weights cloned from the generalist, each trained to master a selected small subset of variations. We finally resume the training of the generalist with auxiliary rewards induced by demonstrations of all specialists. In particular, we investigate the timing to start specialist training and compare strategies to learn generalists with assistance from specialists. We show that this framework pushes the envelope of policy learning on several challenging and popular benchmarks including Procgen, Meta-World and ManiSkill.

CVMar 17, 2022
Unified Line and Paragraph Detection by Graph Convolutional Networks

Shuang Liu, Renshen Wang, Michalis Raptis et al.

We formulate the task of detecting lines and paragraphs in a document into a unified two-level clustering problem. Given a set of text detection boxes that roughly correspond to words, a text line is a cluster of boxes and a paragraph is a cluster of lines. These clusters form a two-level tree that represents a major part of the layout of a document. We use a graph convolutional network to predict the relations between text detection boxes and then build both levels of clusters from these predictions. Experimentally, we demonstrate that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.

LGOct 30, 2025Code
Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence

Yi Zhang, Che Liu, Xiancong Ren et al.

This report presents Pelican-VL 1.0, a new family of open-source embodied brain models with parameter scales ranging from 7 billion to 72 billion. Our explicit mission is clearly stated as: To embed powerful intelligence into various embodiments. Pelican-VL 1.0 is currently the largest-scale open-source embodied multimodal brain model. Its core advantage lies in the in-depth integration of data power and intelligent adaptive learning mechanisms. Specifically, metaloop distilled a high-quality dataset from a raw dataset containing 4+ billion tokens. Pelican-VL 1.0 is trained on a large-scale cluster of 1000+ A800 GPUs, consuming over 50k+ A800 GPU-hours per checkpoint. This translates to a 20.3% performance uplift from its base model and outperforms 100B-level open-source counterparts by 10.6%, placing it on par with leading proprietary systems on well-known embodied benchmarks. We establish a novel framework, DPPO (Deliberate Practice Policy Optimization), inspired by human metacognition to train Pelican-VL 1.0. We operationalize this as a metaloop that teaches the AI to practice deliberately, which is a RL-Refine-Diagnose-SFT loop.

LGApr 21, 2022
Provably Efficient Kernelized Q-Learning

Shuang Liu, Hao Su

We propose and analyze a kernelized version of Q-learning. Although a kernel space is typically infinite-dimensional, extensive study has shown that generalization is only affected by the effective dimension of the data. We incorporate such ideas into the Q-learning framework and derive regret bounds for arbitrary kernels. In particular, we provide concrete bounds for linear kernels and Gaussian RBF kernels; notably, the latter bound looks almost identical to the former, only that the actual dimension is replaced by a different notion of dimensionality. Finally, we test our algorithm on a suite of classic control tasks; remarkably, under the Gaussian RBF kernel, it achieves reasonably good performance after only 1000 environmental steps, while its neural network counterpart, deep Q-learning, still struggles.

AIMar 21, 2023
A Review on Machine Theory of Mind

Yuanyuan Mao, Shuang Liu, Pengshuai Zhao et al.

Theory of Mind (ToM) is the ability to attribute mental states to others, the basis of human cognition. At present, there has been growing interest in the AI with cognitive abilities, for example in healthcare and the motoring industry. Beliefs, desires, and intentions are the early abilities of infants and the foundation of human cognitive ability, as well as for machine with ToM. In this paper, we review recent progress in machine ToM on beliefs, desires, and intentions. And we shall introduce the experiments, datasets and methods of machine ToM on these three aspects, summarize the development of different tasks and datasets in recent years, and compare well-behaved models in aspects of advantages, limitations and applicable conditions, hoping that this study can guide researchers to quickly keep up with latest trend in this field. Unlike other domains with a specific task and resolution framework, machine ToM lacks a unified instruction and a series of standard evaluation tasks, which make it difficult to formally compare the proposed models. We argue that, one method to address this difficulty is now to present a standard assessment criteria and dataset, better a large-scale dataset covered multiple aspects of ToM.

NAMay 19
A second-order product-type implicit-explicit Runge-Kutta method preserving unit length and energy dissipation structures for gradient flows of vector fields

Jianan Li, Shuang Liu, Tao Tang et al.

Gradient flows of unit vector fields arise in a wide range of physical models such as harmonic map heat flows, nematic liquid crystals, and magnetization dynamics. Designing numerical schemes that simultaneously preserve the unit length constraint and dissipate energy is essential for reliable simulations of such systems. Although projection methods can effectively enforce the unit length constraint, ensuring energy dissipation under projection, especially in high-order schemes, remains challenging. Unlike traditional implicit-explicit Runge-Kutta (IMEX-RK) methods, in this work we propose a general methodology for constructing product-type IMEX-RK schemes that offers greater adaptability to various models with the goal of designing structure-preserving numerical schemes. For gradient flows of unit vector fields with Dirichlet energy, we design a linear and second-order numerical scheme that simultaneously preserves energy dissipation and the unit length constraint by using product-type IMEX-RK methods and projection techniques. Numerical experiments verify the accuracy, stability, and structure-preserving properties of the scheme. According to our best knowledge, this is the first second-order linear scheme that can preserve both the unit length and the original Dirichlet energy for harmonic map heat flows.

LGJan 31, 2024Code
Game-Theoretic Unlearnable Example Generator

Shuang Liu, Yihan Wang, Xiao-Shan Gao

Unlearnable example attacks are data poisoning attacks aiming to degrade the clean test accuracy of deep learning by adding imperceptible perturbations to the training samples, which can be formulated as a bi-level optimization problem. However, directly solving this optimization problem is intractable for deep neural networks. In this paper, we investigate unlearnable example attacks from a game-theoretic perspective, by formulating the attack as a nonzero sum Stackelberg game. First, the existence of game equilibria is proved under the normal setting and the adversarial training setting. It is shown that the game equilibrium gives the most powerful poison attack in that the victim has the lowest test accuracy among all networks within the same hypothesis space, when certain loss functions are used. Second, we propose a novel attack method, called the Game Unlearnable Example (GUE), which has three main gradients. (1) The poisons are obtained by directly solving the equilibrium of the Stackelberg game with a first-order algorithm. (2) We employ an autoencoder-like generative network model as the poison attacker. (3) A novel payoff function is introduced to evaluate the performance of the poison. Comprehensive experiments demonstrate that GUE can effectively poison the model in various scenarios. Furthermore, the GUE still works by using a relatively small percentage of the training data to train the generator, and the poison generator can generalize to unseen data well. Our implementation code can be found at https://github.com/hong-xian/gue.

CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

ROJan 29
InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios

Zeyi Liu, Shuang Liu, Jihai Min et al.

With the rapid development of industrial intelligence and unmanned inspection, reliable perception and safety assessment for AI systems in complex and dynamic industrial sites has become a key bottleneck for deploying predictive maintenance and autonomous inspection. Most public datasets remain limited by simulated data sources, single-modality sensing, or the absence of fine-grained object-level annotations, which prevents robust scene understanding and multimodal safety reasoning for industrial foundation models. To address these limitations, InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment that is collected from routine operations of real inspection robots in real-world environments. InspecSafe-V1 covers five representative industrial scenarios, including tunnels, power facilities, sintering equipment, oil and gas petrochemical plants, and coal conveyor trestles. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites, yielding 5,013 inspection instances. For each instance, pixel-level segmentation annotations are provided for key objects in visible-spectrum images. In addition, a semantic scene description and a corresponding safety level label are provided according to practical inspection tasks. Seven synchronized sensing modalities are further included, including infrared video, audio, depth point clouds, radar point clouds, gas measurements, temperature, and humidity, to support multimodal anomaly recognition, cross-modal fusion, and comprehensive safety assessment in industrial environments.

LGOct 31, 2025
Casing Collar Identification using AlexNet-based Neural Networks for Depth Measurement in Oil and Gas Wells

Siyu Xiao, Xindi Zhao, Tianhao Mao et al.

Accurate downhole depth measurement is essential for oil and gas well operations, directly influencing reservoir contact, production efficiency, and operational safety. Collar correlation using a casing collar locator (CCL) is fundamental for precise depth calibration. While neural network-based CCL signal recognition has achieved significant progress in collar identification, preprocessing methods for such applications remain underdeveloped. Moreover, the limited availability of real well data poses substantial challenges for training neural network models that require extensive datasets. This paper presents a system integrated into downhole tools for CCL signal acquisition to facilitate dataset construction. We propose comprehensive preprocessing methods for data augmentation and evaluate their effectiveness using our AlexNet-based neural network models. Through systematic experimentation across various configuration combinations, we analyze the contribution of each augmentation method. Results demonstrate that standardization, label distribution smoothing (LDS), and random cropping are fundamental requirements for model training, while label smoothing regularization (LSR), time scaling, and multiple sampling significantly enhance model generalization capability. The F1 scores of our two benchmark models trained with the proposed augmentation methods maximumly improve from 0.937 and 0.952 to 1.0 and 1.0, respectively. Performance validation on real CCL waveforms confirms the effectiveness and practical applicability of our approach. This work addresses the gaps in data augmentation methodologies for training casing collar recognition models in CCL data-limited environments.

DBMar 24
Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents

Xinyi Zhang, Tiantian Chen, Zhentao Han et al.

Modern database management systems (DBMSs) expose hundreds of configuration knobs that critically influence performance. Existing automated tuning methods either adopt a data-driven paradigm, which incurs substantial overhead, or rely on manual-driven heuristics extracted from database documentation, which are often limited and overly generic. Motivated by the fact that the control logic of configuration knobs is inherently encoded in the DBMS source code, we argue that promising tuning strategies can be mined directly from the code, uncovering fine-grained insights grounded in system internals. To this end, we propose SysInsight, a code-driven database tuning system that automatically extracts fine-grained tuning knowledge from DBMS source code to accelerate and stabilize the tuning process. SysInsight combines static code analysis with LLM-based reasoning to identify knob-controlled execution paths and extract semantic tuning insights. These insights are then transformed into quantitative and verifiable tuning rules via association rule mining grounded in tuning observations. During online tuning, system diagnosis is applied to identify critical knobs, which are adjusted under the rule guidance. Evaluations demonstrate that compared to the SOTA baseline, SysInsight converges to the best configuration on average 7.11X faster while achieving a 19.9% performance improvement.

LGJul 17, 2022
Achieve Optimal Adversarial Accuracy for Adversarial Deep Learning using Stackelberg Game

Xiao-Shan Gao, Shuang Liu, Lijia Yu

Adversarial deep learning is to train robust DNNs against adversarial attacks, which is one of the major research focuses of deep learning. Game theory has been used to answer some of the basic questions about adversarial deep learning such as the existence of a classifier with optimal robustness and the existence of optimal adversarial samples for a given class of classifiers. In most previous work, adversarial deep learning was formulated as a simultaneous game and the strategy spaces are assumed to be certain probability distributions in order for the Nash equilibrium to exist. But, this assumption is not applicable to the practical situation. In this paper, we give answers to these basic questions for the practical case where the classifiers are DNNs with a given structure, by formulating the adversarial deep learning as sequential games. The existence of Stackelberg equilibria for these games are proved. Furthermore, it is shown that the equilibrium DNN has the largest adversarial accuracy among all DNNs with the same structure, when Carlini-Wagner's margin loss is used. Trade-off between robustness and accuracy in adversarial deep learning is also studied from game theoretical aspect.

AIAug 5, 2025Code
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts

Shuang Liu, Zelong Li, Ruoyun Ma et al.

The potential of large language models (LLMs) in specialized domains such as legal risk analysis remains underexplored. In response to growing interest in locally deploying open-source LLMs for legal tasks while preserving data confidentiality, this paper introduces ContractEval, the first benchmark to thoroughly evaluate whether open-source LLMs could match proprietary LLMs in identifying clause-level legal risks in commercial contracts. Using the Contract Understanding Atticus Dataset (CUAD), we assess 4 proprietary and 15 open-source LLMs. Our results highlight five key findings: (1) Proprietary models outperform open-source models in both correctness and output effectiveness, though some open-source models are competitive in certain specific dimensions. (2) Larger open-source models generally perform better, though the improvement slows down as models get bigger. (3) Reasoning ("thinking") mode improves output effectiveness but reduces correctness, likely due to over-complicating simpler tasks. (4) Open-source models generate "no related clause" responses more frequently even when relevant clauses are present. This suggests "laziness" in thinking or low confidence in extracting relevant content. (5) Model quantization speeds up inference but at the cost of performance drop, showing the tradeoff between efficiency and accuracy. These findings suggest that while most LLMs perform at a level comparable to junior legal assistants, open-source models require targeted fine-tuning to ensure correctness and effectiveness in high-stakes legal settings. ContractEval offers a solid benchmark to guide future development of legal-domain LLMs.

CYJan 8
LLM Agents in Law: Taxonomy, Applications, and Challenges

Shuang Liu, Ruijia Zhang, Ruoyun Ma et al.

Large language models (LLMs) have precipitated a dramatic improvement in the legal domain, yet the deployment of standalone models faces significant limitations regarding hallucination, outdated information, and verifiability. Recently, LLM agents have attracted significant attention as a solution to these challenges, utilizing advanced capabilities such as planning, memory, and tool usage to meet the rigorous standards of legal practice. In this paper, we present a comprehensive survey of LLM agents for legal tasks, analyzing how these architectures bridge the gap between technical capabilities and domain-specific needs. Our major contributions include: (1) systematically analyzing the technical transition from standard legal LLMs to legal agents; (2) presenting a structured taxonomy of current agent applications across distinct legal practice areas; (3) discussing evaluation methodologies specifically for agentic performance in law; and (4) identifying open challenges and outlining future directions for developing robust and autonomous legal assistants.

CVMar 10
BridgeDiff: Bridging Human Observations and Flat-Garment Synthesis for Virtual Try-Off

Shuang Liu, Ao Yu, Linkang Cheng et al.

Virtual try-off (VTOFF) aims to recover canonical flat-garment representations from images of dressed persons for standardized display and downstream virtual try-on. Prior methods often treat VTOFF as direct image translation driven by local masks or text-only prompts, overlooking the gap between on-body appearances and flat layouts. This gap frequently leads to inconsistent completion in unobserved regions and unstable garment structure. We propose BridgeDiff, a diffusion-based framework that explicitly bridges human-centric observations and flat-garment synthesis through two complementary components. First, the Garment Condition Bridge Module (GCBM) builds a garment-cue representation that captures global appearance and semantic identity, enabling robust inference of continuous details under partial visibility. Second, the Flat Structure Constraint Module (FSCM) injects explicit flat-garment structural priors via Flat-Constraint Attention (FC-Attention) at selected denoising stages, improving structural stability beyond text-only conditioning. Extensive experiments on standard VTOFF benchmarks show that BridgeDiff achieves state-of-the-art performance, producing higher-quality flat-garment reconstructions while preserving fine-grained appearance and structural integrity.

LGDec 10, 2020Code
SE-ECGNet: A Multi-scale Deep Residual Network with Squeeze-and-Excitation Module for ECG Signal Classification

Haozhen Zhang, Wei Zhao, Shuang Liu

The classification of electrocardiogram (ECG) signals, which takes much time and suffers from a high rate of misjudgment, is recognized as an extremely challenging task for cardiologists. The major difficulty of the ECG signals classification is caused by the long-term sequence dependencies. Most existing approaches for ECG signal classification use Recurrent Neural Network models, e.g., LSTM and GRU, which are unable to extract accurate features for such long sequences. Other approaches utilize 1-Dimensional Convolutional Neural Network (CNN), such as ResNet or its variant, and they can not make good use of the multi-lead information from ECG signals.Based on the above observations, we develop a multi-scale deep residual network for the ECG signal classification task. We are the first to propose to treat the multi-lead signal as a 2-dimensional matrix and combines multi-scale 2-D convolution blocks with 1-D convolution blocks for feature extraction. Our proposed model achieves 99.2% F1-score in the MIT-BIH dataset and 89.4% F1-score in Alibaba dataset and outperforms the state-of-the-art performance by 2% and 3%, respectively, view related code and data at https://github.com/Amadeuszhao/SE-ECGNet

CVMar 9, 2020Code
Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS

Long Chen, Haizhou Ai, Rui Chen et al.

Estimating 3D poses of multiple humans in real-time is a classic but still challenging task in computer vision. Its major difficulty lies in the ambiguity in cross-view association of 2D poses and the huge state space when there are multiple people in multiple views. In this paper, we present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views. It takes 2D poses in different camera coordinates as inputs and aims for the accurate 3D poses in the global coordinate. Unlike previous methods that associate 2D poses among all pairs of views from scratch at every frame, we exploit the temporal consistency in videos to match the 2D inputs with 3D poses directly in 3-space. More specifically, we propose to retain the 3D pose for each person and update them iteratively via the cross-view multi-human tracking. This novel formulation improves both accuracy and efficiency, as we demonstrated on widely-used public datasets. To further verify the scalability of our method, we propose a new large-scale multi-human dataset with 12 to 28 camera views. Without bells and whistles, our solution achieves 154 FPS on 12 cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale real-world applications. The proposed dataset is released at https://github.com/longcw/crossview_3d_pose_tracking.

SYDec 28, 2025
A Neural Network-Based Real-time Casing Collar Recognition System for Downhole Instruments

Si-Yu Xiao, Xin-Di Zhao, Xiang-Zhan Wang et al.

Casing collar locator (CCL) measurements are widely used as reliable depth markers for positioning downhole instruments in cased-hole operations, enabling accurate depth control for operations such as perforation. However, autonomous collar recognition in downhole environments remains challenging because CCL signals are often corrupted by toolstring- or casing-induced magnetic interference, while stringent size and power budgets limit the use of computationally intensive algorithms and specific operations require real-time, in-situ processing. To address these constraints, we propose Collar Recognition Nets (CRNs), a family of domain-specific lightweight 1-D convolutional neural networks for collar signature recognition from streaming CCL waveforms. With depthwise separable convolutions and input pooling, CRNs optimize efficiency without sacrificing accuracy. Our most compact model achieves an F1-score of 0.972 on field data with only 1,985~parameters and 8,208~MACs, and deployed on an ARM Cortex-M7 based embedded system using TensorFlow Lite for Microcontrollers (TFLM) library, the model demonstrates a throughput of 1,000 inference per second and 343.2 μs latency, confirming the feasibility of robust, autonomous, and real-time collar recognition under stringent downhole constraints.

LGJan 6, 2024
Data-Dependent Stability Analysis of Adversarial Training

Yihan Wang, Shuang Liu, Xiao-Shan Gao

Stability analysis is an essential aspect of studying the generalization ability of deep learning, as it involves deriving generalization bounds for stochastic gradient descent-based training algorithms. Adversarial training is the most widely used defense against adversarial example attacks. However, previous generalization bounds for adversarial training have not included information regarding the data distribution. In this paper, we fill this gap by providing generalization bounds for stochastic gradient descent-based adversarial training that incorporate data distribution information. We utilize the concepts of on-average stability and high-order approximate Lipschitz conditions to examine how changes in data distribution and adversarial budget can affect robust generalization gaps. Our derived generalization bounds for both convex and non-convex losses are at least as good as the uniform stability-based counterparts which do not include data distribution information. Furthermore, our findings demonstrate how distribution shifts from data poisoning attacks can impact robust generalization.

CVMar 4
LISTA-Transformer Model Based on Sparse Coding and Attention Mechanism and Its Application in Fault Diagnosis

Shuang Liu, Lina Zhao, Tian Wang et al.

Driven by the continuous development of models such as Multi-Layer Perceptron, Convolutional Neural Network (CNN), and Transformer, deep learning has made breakthrough progress in fields such as computer vision and natural language processing, and has been successfully applied in practical scenarios such as image classification and industrial fault diagnosis. However, existing models still have certain limitations in local feature modeling and global dependency capture. Specifically, CNN is limited by local receptive fields, while Transformer has shortcomings in effectively modeling local structures, and both face challenges of high model complexity and insufficient interpretability. In response to the above issues, we proposes the following innovative work: A sparse Transformer based on Learnable Iterative Shrinkage Threshold Algorithm (LISTA-Transformer) was designed, which deeply integrates LISTA sparse encoding with visual Transformer to construct a model architecture with adaptive local and global feature collaboration mechanism. This method utilizes continuous wavelet transform to convert vibration signals into time-frequency maps and inputs them into LISTA-Transformer for more effective feature extraction. On the CWRU dataset, the fault recognition rate of our method reached 98.5%, which is 3.3% higher than traditional methods and exhibits certain superiority over existing Transformer-based approaches.

LGMar 6, 2025
Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization

Shuang Liu, Yihan Wang, Yifan Zhu et al.

Wasserstein distributionally robust optimization (WDRO) optimizes against worst-case distributional shifts within a specified uncertainty set, leading to enhanced generalization on unseen adversarial examples, compared to standard adversarial training which focuses on pointwise adversarial perturbations. However, WDRO still suffers fundamentally from the robust overfitting problem, as it does not consider statistical error. We address this gap by proposing a novel robust optimization framework under a new uncertainty set for adversarial noise via Wasserstein distance and statistical error via Kullback-Leibler divergence, called the Statistically Robust WDRO. We establish a robust generalization bound for the new optimization framework, implying that out-of-distribution adversarial performance is at least as good as the statistically robust training loss with high probability. Furthermore, we derive conditions under which Stackelberg and Nash equilibria exist between the learner and the adversary, giving an optimal robust model in certain sense. Finally, through extensive experiments, we demonstrate that our method significantly mitigates robust overfitting and enhances robustness within the framework of WDRO.

LGJun 2, 2024
Generalization Bound and New Algorithm for Clean-Label Backdoor Attack

Lijia Yu, Shuang Liu, Yibo Miao et al.

The generalization bound is a crucial theoretical tool for assessing the generalizability of learning methods and there exist vast literatures on generalizability of normal learning, adversarial learning, and data poisoning. Unlike other data poison attacks, the backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set and the purpose of the attack is two-fold. To our knowledge, the generalization bound for the backdoor attack has not been established. In this paper, we fill this gap by deriving algorithm-independent generalization bounds in the clean-label backdoor attack scenario. Precisely, based on the goals of backdoor attack, we give upper bounds for the clean sample population errors and the poison population errors in terms of the empirical error on the poisoned training dataset. Furthermore, based on the theoretical result, a new clean-label backdoor attack is proposed that computes the poisoning trigger by combining adversarial noise and indiscriminate poison. We show its effectiveness in a variety of settings.

CLMay 21, 2023
A Pilot Study on Dialogue-Level Dependency Parsing for Chinese

Gongyao Jiang, Shuang Liu, Meishan Zhang et al.

Dialogue-level dependency parsing has received insufficient attention, especially for Chinese. To this end, we draw on ideas from syntactic dependency and rhetorical structure theory (RST), developing a high-quality human-annotated corpus, which contains 850 dialogues and 199,803 dependencies. Considering that such tasks suffer from high annotation costs, we investigate zero-shot and few-shot scenarios. Based on an existing syntactic treebank, we adopt a signal-based method to transform seen syntactic dependencies into unseen ones between elementary discourse units (EDUs), where the signals are detected by masked language modeling. Besides, we apply single-view and multi-view data selection to access reliable pseudo-labeled instances. Experimental results show the effectiveness of these baselines. Moreover, we discuss several crucial points about our dataset and approach.

CVDec 17, 2021
Symmetry-aware Neural Architecture for Embodied Visual Navigation

Shuang Liu, Takayuki Okatani

Visual exploration is a task that seeks to visit all the navigable areas of an environment as quickly as possible. The existing methods employ deep reinforcement learning (RL) as the standard tool for the task. However, they tend to be vulnerable to statistical shifts between the training and test data, resulting in poor generalization over novel environments that are out-of-distribution (OOD) from the training data. In this paper, we attempt to improve the generalization ability by utilizing the inductive biases available for the task. Employing the active neural SLAM (ANS) that learns exploration policies with the advantage actor-critic (A2C) method as the base framework, we first point out that the mappings represented by the actor and the critic should satisfy specific symmetries. We then propose a network design for the actor and the critic to inherently attain these symmetries. Specifically, we use $G$-convolution instead of the standard convolution and insert the semi-global polar pooling (SGPP) layer, which we newly design in this study, in the last section of the critic network. Experimental results show that our method increases area coverage by $8.1 m^2$ when trained on the Gibson dataset and tested on the MP3D dataset, establishing the new state-of-the-art.

CROct 29, 2021
A Comprehensive Study on Learning-Based PE Malware Family Classification Methods

Yixuan Ma, Shuang Liu, Jiajun Jiang et al.

Driven by the high profit, Portable Executable (PE) malware has been consistently evolving in terms of both volume and sophistication. PE malware family classification has gained great attention and a large number of approaches have been proposed. With the rapid development of machine learning techniques and the exciting results they achieved on various tasks, machine learning algorithms have also gained popularity in the PE malware family classification task. Three mainstream approaches that use learning based algorithms, as categorized by the input format the methods take, are image-based, binary-based and disassembly-based approaches. Although a large number of approaches are published, there is no consistent comparisons on those approaches, especially from the practical industry adoption perspective. Moreover, there is no comparison in the scenario of concept drift, which is a fact for the malware classification task due to the fast evolving nature of malware. In this work, we conduct a thorough empirical study on learning-based PE malware classification approaches on 4 different datasets and consistent experiment settings. Based on the experiment results and an interview with our industry partners, we find that (1) there is no individual class of methods that significantly outperforms the others; (2) All classes of methods show performance degradation on concept drift (by an average F1-score of 32.23%); and (3) the prediction time and high memory consumption hinder existing approaches from being adopted for industry usage.

CVSep 13, 2021
Task Guided Compositional Representation Learning for ZDA

Shuang Liu, Mete Ozay

Zero-shot domain adaptation (ZDA) methods aim to transfer knowledge about a task learned in a source domain to a target domain, while data from target domain are not available. In this work, we address learning feature representations which are invariant to and shared among different domains considering task characteristics for ZDA. To this end, we propose a method for task-guided ZDA (TG-ZDA) which employs multi-branch deep neural networks to learn feature representations exploiting their domain invariance and shareability properties. The proposed TG-ZDA models can be trained end-to-end without requiring synthetic tasks and data generated from estimated representations of target domains. The proposed TG-ZDA has been examined using benchmark ZDA tasks on image classification datasets. Experimental results show that our proposed TG-ZDA outperforms state-of-the-art ZDA methods for different domains and tasks.

IRAug 13, 2021
GQE-PRF: Generative Query Expansion with Pseudo-Relevance Feedback

Minghui Huang, Dong Wang, Shuang Liu et al.

Query expansion with pseudo-relevance feedback (PRF) is a powerful approach to enhance the effectiveness in information retrieval. Recently, with the rapid advance of deep learning techniques, neural text generation has achieved promising success in many natural language tasks. To leverage the strength of text generation for information retrieval, in this article, we propose a novel approach which effectively integrates text generation models into PRF-based query expansion. In particular, our approach generates augmented query terms via neural text generation models conditioned on both the initial query and pseudo-relevance feedback. Moreover, in order to train the generative model, we adopt the conditional generative adversarial nets (CGANs) and propose the PRF-CGAN method in which both the generator and the discriminator are conditioned on the pseudo-relevance feedback. We evaluate the performance of our approach on information retrieval tasks using two benchmark datasets. The experimental results show that our approach achieves comparable performance or outperforms traditional query expansion methods on both the retrieval and reranking tasks.

HCNov 3, 2020
Visualization of Technical and Tactical Characteristics in Fencing

Mingdong Zhang, Li Chen, Xiaoru Yuan et al.

Fencing is a sport that relies heavily on the use of tactics. However, most existing methods for analyzing fencing data are based on statistical models in which hidden patterns are difficult to discover. Unlike sequential games, such as tennis and table tennis, fencing is a type of simultaneous game. Thus, the existing methods on the sports visualization do not operate well for fencing matches. In this study, we cooperated with experts to analyze the technical and tactical characteristics of fencing competitions. To meet the requirements of the fencing experts, we designed and implemented FencingVis, an interactive visualization system for fencing competition data.The action sequences in the bout are first visualized by modified bar charts to reveal the actions of footworks and bladeworks of both fencers. Then an interactive technique is provided for exploring the patterns of behavior of fencers. The different combinations of tactical behavior patterns are further mapped to the graph model and visualized by a tactical flow graph. This graph can reveal the different strategies adopted by both fencers and their mutual influence in one bout. We also provided a number of well-coordinated views to supplement the tactical flow graph and display the information of the fencing competition from different perspectives. The well-coordinated views are meant to organically integrate with the tactical flow graph through consistent visual style and view coordination. We demonstrated the usability and effectiveness of the proposed system with three case studies. On the basis of expert feedback, FencingVis can help analysts find not only the tactical patterns hidden in fencing bouts, but also the technical and tactical characteristics of the contestant.

SEMar 18, 2020
Automated synthesis of local time requirement for service composition

Étienne André, Tian Huat Tan, Manman Chen et al.

Service composition aims at achieving a business goal by composing existing service-based applications or components. The response time of a service is crucial especially in time critical business environments, which is often stated as a clause in service level agreements between service providers and service users. To meet the guaranteed response time requirement of a composite service, it is important to select a feasible set of component services such that their response time will collectively satisfy the response time requirement of the composite service. In this work, we use the BPEL modeling language, that aims at specifying Web services. We extend it with timing parameters, and equip it with a formal semantics. Then, we propose a fully automated approach to synthesize the response time requirement of component services modeled using BPEL, in the form of a constraint on the local response times. The synthesized requirement will guarantee the satisfaction of the global response time requirement, statically or dynamically. We implemented our work into a tool, Selamat, and performed several experiments to evaluate the validity of our approach.

LGFeb 12, 2020
Regret Bounds for Discounted MDPs

Shuang Liu, Hao Su

Reinforcement learning (RL) has traditionally been understood from an episodic perspective; the concept of non-episodic RL, where there is no restart and therefore no reliable recovery, remains elusive. A fundamental question in non-episodic RL is how to measure the performance of a learner and derive algorithms to maximize such performance. Conventional wisdom is to maximize the difference between the average reward received by the learner and the maximal long-term average reward. In this paper, we argue that if the total time budget is relatively limited compared to the complexity of the environment, such comparison may fail to reflect the finite-time optimality of the learner. We propose a family of measures, called $γ$-regret, which we believe to better capture the finite-time optimality. We give motivations and derive lower and upper bounds for such measures. Note: A follow-up work (arXiv:2010.00587) has improved both our lower and upper bound, the gap is now closed at $\tildeΘ\left(\frac{\sqrt{SAT}}{(1 - γ)^{\frac{1}{2}}}\right)$.

LGNov 14, 2019
There is Limited Correlation between Coverage and Robustness for Deep Neural Networks

Yizhen Dong, Peixin Zhang, Jingyi Wang et al.

Deep neural networks (DNN) are increasingly applied in safety-critical systems, e.g., for face recognition, autonomous car control and malware detection. It is also shown that DNNs are subject to attacks such as adversarial perturbation and thus must be properly tested. Many coverage criteria for DNN since have been proposed, inspired by the success of code coverage criteria for software programs. The expectation is that if a DNN is a well tested (and retrained) according to such coverage criteria, it is more likely to be robust. In this work, we conduct an empirical study to evaluate the relationship between coverage, robustness and attack/defense metrics for DNN. Our study is the largest to date and systematically done based on 100 DNN models and 25 metrics. One of our findings is that there is limited correlation between coverage and robustness, i.e., improving coverage does not help improve the robustness. Our dataset and implementation have been made available to serve as a benchmark for future studies on testing DNN.

LGSep 25, 2019
Multi-task Batch Reinforcement Learning with Metric Learning

Jiachen Li, Quan Vuong, Shuang Liu et al.

We tackle the Multi-task Batch Reinforcement Learning problem. Given multiple datasets collected from different tasks, we train a multi-task policy to perform well in unseen tasks sampled from the same distribution. The task identities of the unseen tasks are not provided. To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards. Because the different datasets may have state-action distributions with large divergence, the task inference module can learn to ignore the rewards and spuriously correlate $\textit{only}$ state-action pairs to the task identity, leading to poor test time performance. To robustify task inference, we propose a novel application of the triplet loss. To mine hard negative examples, we relabel the transitions from the training tasks by approximating their reward functions. When we allow further training on the unseen tasks, using the trained policy as an initialization leads to significantly faster convergence compared to randomly initialized policies (up to $80\%$ improvement and across 5 different Mujoco task distributions). We name our method $\textbf{MBML}$ ($\textbf{M}\text{ulti-task}$ $\textbf{B}\text{atch}$ RL with $\textbf{M}\text{etric}$ $\textbf{L}\text{earning}$).

LGSep 12, 2018
The Inductive Bias of Restricted f-GANs

Shuang Liu, Kamalika Chaudhuri

Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood. In this work, we attempt to analyze generative adversarial learning -- that is, statistical inference as the result of a game between a generator and a discriminator -- with the view of understanding how it differs from classical statistical inference solutions such as maximum likelihood inference and the method of moments. Specifically, we provide a theoretical characterization of the distribution inferred by a simple form of generative adversarial learning called restricted f-GANs -- where the discriminator is a function in a given function class, the distribution induced by the generator is restricted to lie in a pre-specified distribution class and the objective is similar to a variational form of the f-divergence. A consequence of our result is that for linear KL-GANs -- that is, when the discriminator is a linear function over some feature space and f corresponds to the KL-divergence -- the distribution induced by the optimal generator is neither the maximum likelihood nor the method of moments solution, but an interesting combination of both.

SEMay 1, 2018
Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

Takumi Akazaki, Shuang Liu, Yoriyuki Yamagata et al.

With the rapid development of software and distributed computing, Cyber-Physical Systems (CPS) are widely adopted in many application areas, e.g., smart grid, autonomous automobile. It is difficult to detect defects in CPS models due to the complexities involved in the software and physical systems. To find defects in CPS models efficiently, robustness guided falsification of CPS is introduced. Existing methods use several optimization techniques to generate counterexamples, which falsify the given properties of a CPS. However those methods may require a large number of simulation runs to find the counterexample and is far from practical. In this work, we explore state-of-the-art Deep Reinforcement Learning (DRL) techniques to reduce the number of simulation runs required to find such counterexamples. We report our method and the preliminary evaluation results.

CVDec 12, 2017
A vision based system for underwater docking

Shuang Liu, Mete Ozay, Takayuki Okatani et al.

Autonomous underwater vehicles (AUVs) have been deployed for underwater exploration. However, its potential is confined by its limited on-board battery energy and data storage capacity. This problem has been addressed using docking systems by underwater recharging and data transfer for AUVs. In this work, we propose a vision based framework for underwater docking following these systems. The proposed framework comprises two modules; (i) a detection module which provides location information on underwater docking stations in 2D images captured by an on-board camera, and (ii) a pose estimation module which recovers the relative 3D position and orientation between docking stations and AUVs from the 2D images. For robust and credible detection of docking stations, we propose a convolutional neural network called Docking Neural Network (DoNN). For accurate pose estimation, a perspective-n-point algorithm is integrated into our framework. In order to examine our framework in underwater docking tasks, we collected a dataset of 2D images, named Underwater Docking Images Dataset (UDID), in an experimental water pool. To the best of our knowledge, UDID is the first publicly available underwater docking dataset. In the experiments, we first evaluate performance of the proposed detection module on UDID and its deformed variations. Next, we assess the accuracy of the pose estimation module by ground experiments, since it is not feasible to obtain true relative position and orientation between docking stations and AUVs under water. Then, we examine the pose estimation module by underwater experiments in our experimental water pool. Experimental results show that the proposed framework can be used to detect docking stations and estimate their relative pose efficiently and successfully, compared to the state-of-the-art baseline systems.

LGMay 24, 2017
Approximation and Convergence Properties of Generative Adversarial Learning

Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

Generative adversarial networks (GAN) approximate a target data distribution by jointly optimizing an objective function through a "two-player game" between a generator and a discriminator. Despite their empirical success, however, two very basic questions on how well they can approximate the target distribution remain unanswered. First, it is not known how restricting the discriminator family affects the approximation quality. Second, while a number of different objective functions have been proposed, we do not understand when convergence to the global minima of the objective function leads to convergence to the target distribution under various notions of distributional convergence. In this paper, we address these questions in a broad and unified setting by defining a notion of adversarial divergences that includes a number of recently proposed objective functions. We show that if the objective function is an adversarial divergence with some additional conditions, then using a restricted discriminator family has a moment-matching effect. Additionally, we show that for objective functions that are strict adversarial divergences, convergence in the objective function implies weak convergence, thus generalizing previous results.

MLOct 26, 2015
A Parallel algorithm for $\mathcal{X}$-Armed bandits

Cheng Chen, Shuang Liu, Zhihua Zhang et al.

The target of $\mathcal{X}$-armed bandit problem is to find the global maximum of an unknown stochastic function $f$, given a finite budget of $n$ evaluations. Recently, $\mathcal{X}$-armed bandits have been widely used in many situations. Many of these applications need to deal with large-scale data sets. To deal with these large-scale data sets, we study a distributed setting of $\mathcal{X}$-armed bandits, where $m$ players collaborate to find the maximum of the unknown function. We develop a novel anytime distributed $\mathcal{X}$-armed bandit algorithm. Compared with prior work on $\mathcal{X}$-armed bandits, our algorithm uses a quite different searching strategy so as to fit distributed learning scenarios. Our theoretical analysis shows that our distributed algorithm is $m$ times faster than the classical single-player algorithm. Moreover, the number of communication rounds of our algorithm is only logarithmic in $mn$. The numerical results show that our method can make effective use of every players to minimize the loss. Thus, our distributed approach is attractive and useful.

LGApr 14, 2015
Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

Shuang Liu, Cheng Chen, Zhihua Zhang

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at specific points in time. We study the relationship between regret and communication. When the time horizon is known, we propose the Over-Exploration strategy, which only requires one-round communication and whose regret does not scale with the number of players. When the time horizon is unknown, we measure the frequency of communication through a new notion called the density of the communication set, and give an exact characterization of the interplay between regret and communication. Specifically, a lower bound is established and stable strategies that match the lower bound are developed. The results and analyses in this paper are specific but can be translated into more general settings.