Jianping Zhang

CV
Semantic Scholar Profile
h-index: 23
43 papers
1,104 citations
Novelty: 46%
AI Score: 49

43 Papers

CV · Jun 14, 2023 · Code
On the Robustness of Latent Diffusion Models

Jianping Zhang, Zhuoer Xu, Shiwen Cui et al.

Latent diffusion models achieve state-of-the-art performance on a variety of generative tasks, such as image synthesis and image editing. However, the robustness of latent diffusion models is not well studied. Previous works only focus on the adversarial attacks against the encoder or the output image under white-box settings, regardless of the denoising process. Therefore, in this paper, we aim to analyze the robustness of latent diffusion models more thoroughly. We first study the influence of the components inside latent diffusion models on their white-box robustness. In addition to white-box scenarios, we evaluate the black-box robustness of latent diffusion models via transfer attacks, where we consider both prompt-transfer and model-transfer settings and possible defense mechanisms. However, all these explorations need a comprehensive benchmark dataset, which is missing in the literature. Therefore, to facilitate the research of the robustness of latent diffusion models, we propose two automatic dataset construction pipelines for two kinds of image editing models and release the whole dataset. Our code and dataset are available at https://github.com/jpzhang1810/LDM-Robustness.

CV · Sep 26, 2023 · Code
Structure Invariant Transformation for better Adversarial Transferability

Xiaosen Wang, Zeliang Zhang, Jianping Zhang

Given the severe vulnerability of Deep Neural Networks (DNNs) against adversarial examples, there is an urgent need for an effective adversarial attack to identify the deficiencies of DNNs in security-sensitive applications. As one of the prevalent black-box adversarial attacks, the existing transfer-based attacks still cannot achieve comparable performance with the white-box attacks. Among these, input transformation based attacks have shown remarkable effectiveness in boosting transferability. In this work, we find that the existing input transformation based attacks transform the input image globally, resulting in limited diversity of the transformed images. We postulate that more diverse transformed images result in better transferability. Thus, we investigate how to locally apply various transformations onto the input image to improve such diversity while preserving the structure of the image. To this end, we propose a novel input transformation based attack, called Structure Invariant Attack (SIA), which applies a random image transformation onto each image block to craft a set of diverse images for gradient calculation. Extensive experiments on the standard ImageNet dataset demonstrate that SIA exhibits much better transferability than the existing SOTA input transformation based attacks on CNN-based and transformer-based models, showing its generality and superiority in boosting transferability. Code is available at https://github.com/xiaosen-wang/SIT.
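The block-wise idea in this abstract (one randomly chosen transformation per image block) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `block_wise_transform`, the four toy transformations, and the even grid split are all assumptions made for illustration, and the gradient computation that SIA performs on the transformed images is omitted.

```python
import random

# Toy shape-preserving transformations (assumed for illustration only).
def identity(block):
    return [row[:] for row in block]

def flip_horizontal(block):
    return [row[::-1] for row in block]

def flip_vertical(block):
    return block[::-1]

def scale(block, factor=0.5):
    return [[value * factor for value in row] for row in block]

TRANSFORMS = [identity, flip_horizontal, flip_vertical, scale]

def block_wise_transform(image, splits, rng, transforms=TRANSFORMS):
    """Split a 2D image into splits x splits blocks and apply an
    independently drawn transformation to each block."""
    height, width = len(image), len(image[0])
    row_edges = [height * i // splits for i in range(splits + 1)]
    col_edges = [width * j // splits for j in range(splits + 1)]
    out = [row[:] for row in image]  # copy so the input stays untouched
    for r0, r1 in zip(row_edges, row_edges[1:]):
        for c0, c1 in zip(col_edges, col_edges[1:]):
            block = [row[c0:c1] for row in image[r0:r1]]
            new_block = rng.choice(transforms)(block)
            # Write the transformed block back in place, so the block
            # layout (the global structure) is unchanged.
            for i, new_row in enumerate(new_block):
                out[r0 + i][c0:c1] = new_row
    return out

# Example: several diverse copies of one image, as would be fed to a
# gradient computation in a transfer attack.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
rng = random.Random(0)
copies = [block_wise_transform(image, 2, rng) for _ in range(3)]
```

Because each block is transformed in place, every copy keeps the original block layout while differing locally, which is the diversity the abstract argues for.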

CV · Mar 28, 2023
Improving the Transferability of Adversarial Samples by Path-Augmented Method

Jianping Zhang, Jen-tse Huang, Wenxuan Wang et al. · pku, tencent-ai

Deep neural networks have achieved unprecedented success on diverse vision tasks. However, they are vulnerable to adversarial noise that is imperceptible to humans. This phenomenon negatively affects their deployment in real-world scenarios, especially security-related ones. To evaluate the robustness of a target model in practice, transfer-based attacks craft adversarial samples with a local model and have attracted increasing attention from researchers due to their high efficiency. The state-of-the-art transfer-based attacks are generally based on data augmentation, which typically augments multiple training images from a linear path when learning adversarial samples. However, such methods select the image augmentation path heuristically and may augment images that are semantics-inconsistent with the target images, which harms the transferability of the generated adversarial samples. To overcome this pitfall, we propose the Path-Augmented Method (PAM). Specifically, PAM first constructs a candidate augmentation path pool. It then settles the employed augmentation paths during adversarial sample generation with greedy search. Furthermore, to avoid augmenting semantics-inconsistent images, we train a Semantics Predictor (SP) to constrain the length of the augmentation path. Extensive experiments confirm that PAM can achieve an improvement of over 4.8% on average compared with the state-of-the-art baselines in terms of the attack success rates.

CL · Feb 11, 2023
MTTM: Metamorphic Testing for Textual Content Moderation Software

Wenxuan Wang, Jen-tse Huang, Weibin Wu et al. · pku, tencent-ai

The exponential growth of social media platforms such as Twitter and Facebook has revolutionized textual communication and textual content publication in human society. However, they have been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisement, and pornography, which can lead to highly negative impacts (e.g., harmful effects on teen mental health). Researchers and practitioners have been enthusiastically developing and extensively deploying textual content moderation software to address this problem. However, we find that malicious users can evade moderation by changing only a few words in the toxic content. Moreover, modern content moderation software performance against malicious inputs remains underexplored. To this end, we propose MTTM, a Metamorphic Testing framework for Textual content Moderation software. Specifically, we conduct a pilot study on 2,000 text messages collected from real users and summarize eleven metamorphic relations across three perturbation levels: character, word, and sentence. MTTM employs these metamorphic relations on toxic textual contents to generate test cases, which are still toxic yet likely to evade moderation. In our evaluation, we employ MTTM to test three commercial textual content moderation software and two state-of-the-art moderation algorithms against three kinds of toxic content. The results show that MTTM achieves up to 83.9%, 51%, and 82.5% error finding rates (EFR) when testing commercial moderation software provided by Google, Baidu, and Huawei, respectively, and it obtains up to 91.2% EFR when testing the state-of-the-art algorithms from the academy. In addition, we leverage the test cases generated by MTTM to retrain the model we explored, which largely improves model robustness (0% to 5.9% EFR) while maintaining the accuracy on the original test set.

SE · May 13, 2022
AEON: A Method for Automatic Evaluation of NLP Test Cases

Jen-tse Huang, Jianping Zhang, Wenxuan Wang et al. · pku, tencent-ai

Due to the labor-intensive nature of manual test oracle construction, various automated testing techniques have been proposed to enhance the reliability of Natural Language Processing (NLP) software. In theory, these techniques mutate an existing test case (e.g., a sentence with its label) and assume the generated one preserves an equivalent or similar semantic meaning and thus, the same label. However, in practice, many of the generated test cases fail to preserve similar semantic meaning and are unnatural (e.g., grammar errors), which leads to a high false alarm rate and unnatural test cases. Our evaluation study finds that 44% of the test cases generated by the state-of-the-art (SOTA) approaches are false alarms. These test cases require extensive manual checking effort, and instead of improving NLP software, they can even degrade NLP software when utilized in model training. To address this problem, we propose AEON for Automatic Evaluation Of NLP test cases. For each generated test case, it outputs scores based on semantic similarity and language naturalness. We employ AEON to evaluate test cases generated by four popular testing techniques on five datasets across three typical NLP tasks. The results show that AEON aligns the best with human judgment. In particular, AEON achieves the best average precision in detecting semantic inconsistent test cases, outperforming the best baseline metric by 10%. In addition, AEON also has the highest average precision of finding unnatural test cases, surpassing the baselines by more than 15%. Moreover, model training with test cases prioritized by AEON leads to models that are more accurate and robust, demonstrating AEON's potential in improving NLP software.

IV · Aug 6, 2023 · Code
Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction

Xiaohong Fan, Yin Yang, Ke Chen et al.

Proximal gradient-based optimization is one of the most common strategies for solving image inverse problems, and it is easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on the second Nesterov proximal gradient optimization. The proposed end-to-end network not only has a powerful learning ability for high-/low-frequency image features, but can also theoretically guarantee that geometric texture details will be reconstructed from the preliminary linear reconstruction. Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules: general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration, and post-processing. In the image restoration step, a cascade geometric incremental learning module is designed to compensate for missing texture information from different geometric spectral decomposition domains. Inspired by the overlap-tile strategy, we also develop a post-processing module to remove the block effect in patch-wise-based natural image reconstruction. All parameters in the proposed model are learnable, and an adaptive initialization technique for the physical parameters is also employed to make the model flexible and ensure smooth convergence. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.

CV · Oct 27, 2022 · Code
FAS-UNet: A Novel FAS-driven Unet to Learn Variational Image Segmentation

Hui Zhu, Shi Shu, Jianping Zhang

Solving variational image segmentation problems with hidden physics is often expensive and requires different algorithms and manual tuning of model parameters. Deep learning methods based on the U-Net structure have obtained outstanding performance in many different medical image segmentation tasks, but designing such networks requires a lot of parameters and training data, which are not always available for practical problems. In this paper, inspired by the traditional multi-phase convexity Mumford-Shah variational model and the full approximation scheme (FAS) for solving nonlinear systems, we propose a novel variational-model-informed network (denoted as FAS-Unet) that exploits the model and algorithm priors to extract multi-scale features. The proposed model-informed network integrates image data and mathematical models, and implements them through learning a few convolution kernels. Based on the variational theory and the FAS algorithm, we first design a feature extraction sub-network (FAS-Solution module) to solve the model-driven nonlinear systems, where a skip-connection is employed to fuse the multi-scale features. Secondly, we further design a convolution block to fuse the extracted features from the previous stage, resulting in the final segmentation possibility. Experimental results on three different medical image segmentation tasks show that the proposed FAS-Unet is very competitive with other state-of-the-art methods in qualitative, quantitative and model complexity evaluations. Moreover, it may also be possible to train specialized network architectures that automatically satisfy some of the mathematical and physical laws in other image problems for better accuracy, faster training and improved generalization. The code is available at https://github.com/zhuhui100/FASUNet.

CV · Mar 28, 2023
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization

Jianping Zhang, Yizhan Huang, Weibin Wu et al.

Vision transformers (ViTs) have been successfully deployed in a variety of computer vision tasks, but they are still vulnerable to adversarial samples. Transfer-based attacks use a local model to generate adversarial samples and directly transfer them to attack a target black-box model. The high efficiency of transfer-based attacks makes it a severe security threat to ViT-based applications. Therefore, it is vital to design effective transfer-based attacks to identify the deficiencies of ViTs beforehand in security-sensitive scenarios. Existing efforts generally focus on regularizing the input gradients to stabilize the updated direction of adversarial samples. However, the variance of the back-propagated gradients in intermediate blocks of ViTs may still be large, which may make the generated adversarial samples focus on some model-specific features and get stuck in poor local optima. To overcome the shortcomings of existing approaches, we propose the Token Gradient Regularization (TGR) method. According to the structural characteristics of ViTs, TGR reduces the variance of the back-propagated gradient in each internal block of ViTs in a token-wise manner and utilizes the regularized gradient to generate adversarial samples. Extensive experiments on attacking both ViTs and CNNs confirm the superiority of our approach. Notably, compared to the state-of-the-art transfer-based attacks, our TGR offers a performance improvement of 8.8% on average.

CV · Sep 8, 2023 · Code
PRISTA-Net: Deep Iterative Shrinkage Thresholding Network for Coded Diffraction Patterns Phase Retrieval

Aoxu Liu, Xiaohong Fan, Yin Yang et al.

The problem of phase retrieval (PR) involves recovering an unknown image from limited amplitude measurement data and is a challenging nonlinear inverse problem in computational imaging and image processing. However, many of the PR methods are based on black-box network models that lack interpretability and plug-and-play (PnP) frameworks that are computationally complex and require careful parameter tuning. To address this, we have developed PRISTA-Net, a deep unfolding network (DUN) based on the first-order iterative shrinkage thresholding algorithm (ISTA). This network utilizes a learnable nonlinear transformation to address the proximal-point mapping sub-problem associated with the sparse priors, and an attention mechanism to focus on phase information containing image edges, textures, and structures. Additionally, the fast Fourier transform (FFT) is used to learn global features to enhance local information, and the designed logarithmic-based loss function leads to significant improvements when the noise level is low. All parameters in the proposed PRISTA-Net framework, including the nonlinear transformation, threshold parameters, and step size, are learned end-to-end instead of being manually set. This method combines the interpretability of traditional methods with the fast inference ability of deep learning and is able to handle noise at each iteration during the unfolding stage, thus improving recovery quality. Experiments on Coded Diffraction Patterns (CDPs) measurements demonstrate that our approach outperforms the existing state-of-the-art methods in terms of qualitative and quantitative evaluations. Our source codes are available at https://github.com/liuaxou/PRISTA-Net.

CV · Aug 4, 2023
A Bi-variant Variational Model for Diffeomorphic Image Registration with Relaxed Jacobian Determinant Constraints

Yanyan Li, Ke Chen, Chong Chen et al.

Diffeomorphic registration is a widely used technique for finding a smooth and invertible transformation between two coordinate systems, which are measured using template and reference images. The point-wise volume-preserving constraint $\det(\nabla\bm{\varphi}(\bm{x})) =1$ is effective in some cases, but may be too restrictive in others, especially when local deformations are relatively large. This can result in poor matching when enforcing large local deformations. In this paper, we propose a new bi-variant diffeomorphic image registration model that introduces a soft constraint on the Jacobian equation $\det(\nabla\bm{\varphi}(\bm{x})) = f(\bm{x}) > 0$. This allows local deformations to shrink and grow within a flexible range $0<\kappa_{m}<\det(\nabla\bm{\varphi}(\bm{x}))<\kappa_{M}$. The Jacobian determinant of transformation is explicitly controlled by optimizing the relaxation function $f(\bm{x})$. To prevent deformation folding and improve the smoothness of the transformation, a positive constraint is imposed on the optimization of the relaxation function $f(\bm{x})$, and a regularizer is used to ensure the smoothness of $f(\bm{x})$. Furthermore, the positivity constraint ensures that $f(\bm{x})$ is as close to one as possible, which helps to achieve a volume-preserving transformation on average. We also analyze the existence of the minimizer for the variational model and propose a penalty-splitting algorithm with a multilevel strategy to solve this model. Numerical experiments demonstrate the convergence of the proposed algorithm and show that the positivity constraint can effectively control the range of relative volume without compromising the accuracy of the registration. Moreover, the proposed model generates diffeomorphic maps for large local deformations and outperforms several existing registration models in terms of performance.
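To make the Jacobian-determinant range condition concrete, the following sketch estimates the determinant of a 2D map by central differences and checks whether it stays inside a range (kappa_min, kappa_max). This only illustrates the quantity being constrained, not the paper's variational model or its penalty-splitting solver; the function names, the step size `h`, and the sample maps are assumptions.

```python
def jacobian_determinant(phi, x, y, h=1e-5):
    """Central-difference estimate of det(grad phi) at (x, y),
    where phi maps (x, y) to a pair (phi1, phi2)."""
    px_plus, px_minus = phi(x + h, y), phi(x - h, y)
    py_plus, py_minus = phi(x, y + h), phi(x, y - h)
    d1_dx = (px_plus[0] - px_minus[0]) / (2 * h)
    d2_dx = (px_plus[1] - px_minus[1]) / (2 * h)
    d1_dy = (py_plus[0] - py_minus[0]) / (2 * h)
    d2_dy = (py_plus[1] - py_minus[1]) / (2 * h)
    return d1_dx * d2_dy - d1_dy * d2_dx

def within_relaxed_range(phi, points, kappa_min, kappa_max):
    """True if kappa_min < det(grad phi) < kappa_max at every sample point."""
    return all(
        kappa_min < jacobian_determinant(phi, x, y) < kappa_max
        for x, y in points
    )

# Hypothetical sample maps.
def stretch(x, y):
    return (1.2 * x, y)   # local growth, determinant about 1.2

def reflect(x, y):
    return (-x, y)        # folding, determinant about -1

grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
print(within_relaxed_range(stretch, grid, 0.5, 2.0))
print(within_relaxed_range(reflect, grid, 0.5, 2.0))
```

A determinant of exactly one corresponds to the point-wise volume-preserving constraint, while a non-positive determinant indicates folding, which the positivity requirement on $f(\bm{x})$ is meant to rule out.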

LG · Jul 20, 2023
Identifying Performance Issues in Cloud Service Systems Based on Relational-Temporal Features

Wenwei Gu, Jinyang Liu, Zhuangbin Chen et al.

Cloud systems are susceptible to performance issues, which may cause service-level agreement violations and financial losses. In current practice, crucial metrics are monitored periodically to provide insight into the operational status of components. Identifying performance issues is often formulated as an anomaly detection problem, which is tackled by analyzing each metric independently. However, this approach overlooks the complex dependencies existing among cloud components. Some graph neural network-based methods take both temporal and relational information into account; however, the correlation violations in the metrics that serve as indicators of underlying performance issues are difficult for them to identify. Furthermore, a large volume of components in a cloud system results in a vast array of noisy metrics. This complexity renders it impractical for engineers to fully comprehend the correlations, making it challenging to identify performance issues accurately. To address these limitations, we propose Identifying Performance Issues based on Relational-Temporal Features (ISOLATE), a learning-based approach that leverages both the relational and temporal features of metrics to identify performance issues. In particular, it adopts a graph neural network with attention to characterize the relations among metrics and extracts long-term and multi-scale temporal patterns using a GRU and a convolution network, respectively. The learned graph attention weights can be further used to localize the correlation-violated metrics. Moreover, to relieve the impact of noisy data, ISOLATE utilizes a positive unlabeled learning strategy that tags pseudo-labels based on a small portion of confirmed negative examples. Extensive evaluation on both public and industrial datasets shows that ISOLATE outperforms all baseline models with 0.945 F1-score and 0.920 Hit rate@3.

CV · Aug 30, 2023
Physics-Informed DeepMRI: Bridging the Gap from Heat Diffusion to k-Space Interpolation

Zhuo-Xu Cui, Congcong Liu, Xiaohong Fan et al.

In the field of parallel imaging (PI), alongside image-domain regularization methods, substantial research has been dedicated to exploring $k$-space interpolation. However, the interpretability of these methods remains an unresolved issue. Furthermore, these approaches currently face acceleration limitations that are comparable to those experienced by image-domain methods. In order to enhance interpretability and overcome the acceleration limitations, this paper introduces an interpretable framework that unifies both $k$-space interpolation techniques and image-domain methods, grounded in the physical principles of heat diffusion equations. Building upon this foundational framework, a novel $k$-space interpolation method is proposed. Specifically, we model the process of high-frequency information attenuation in $k$-space as a heat diffusion equation, while the effort to reconstruct high-frequency information from low-frequency regions can be conceptualized as a reverse heat equation. However, solving the reverse heat equation poses a challenging inverse problem. To tackle this challenge, we modify the heat equation to align with the principles of magnetic resonance PI physics and employ the score-based generative method to precisely execute the modified reverse heat diffusion. Finally, experimental validation conducted on publicly available datasets demonstrates the superiority of the proposed approach over traditional $k$-space interpolation methods, deep learning-based $k$-space interpolation methods, and conventional diffusion models in terms of reconstruction accuracy, particularly in high-frequency regions.

SE · Sep 17, 2024
Grounded GUI Understanding for Vision-Based Spatial Intelligent Agent: Exemplified by Extended Reality Apps

Shuqing Li, Binchang Li, Yepang Liu et al.

In recent years, spatial computing a.k.a. Extended Reality (XR) has emerged as a transformative technology, offering users immersive and interactive experiences across diversified virtual environments. Users can interact with XR apps through interactable GUI elements (IGEs) on the stereoscopic three-dimensional (3D) graphical user interface (GUI). The accurate recognition of these IGEs is instrumental, serving as the foundation of many software engineering tasks, including automated testing and effective GUI search. The most recent IGE detection approaches for 2D mobile apps typically train a supervised object detection model based on a large-scale manually-labeled GUI dataset, usually with a pre-defined set of clickable GUI element categories like buttons and spinners. Such approaches can hardly be applied to IGE detection in XR apps, due to a multitude of challenges including complexities posed by open-vocabulary and heterogeneous IGE categories, intricacies of context-sensitive interactability, and the necessities of precise spatial perception and visual-semantic alignment for accurate IGE detection results. Thus, it is necessary to embark on the IGE research tailored to XR apps. In this paper, we propose the first zero-shot cOntext-sensitive inteRactable GUI ElemeNT dEtection framework for virtual Reality apps, named Orienter. By imitating human behaviors, Orienter observes and understands the semantic contexts of XR app scenes first, before performing the detection. The detection process is iterated within a feedback-directed validation and reflection loop. Specifically, Orienter contains three components, including (1) Semantic context comprehension, (2) Reflection-directed IGE candidate detection, and (3) Context-sensitive interactability classification. Extensive experiments demonstrate that Orienter is more effective than the state-of-the-art GUI element detection approaches.

IV · May 14, 2022
An Interpretable MRI Reconstruction Network with Two-grid-cycle Correction and Geometric Prior Distillation

Xiaohong Fan, Yin Yang, Ke Chen et al.

Although existing deep learning compressed-sensing-based Magnetic Resonance Imaging (CS-MRI) methods have achieved considerably impressive performance, explainability and generalizability continue to be challenging for such methods, since the transition from mathematical analysis to network design is not always natural, and most of them are not flexible enough to handle multi-sampling-ratio reconstruction assignments. In this work, to tackle explainability and generalizability, we propose a unifying deep unfolding multi-sampling-ratio interpretable CS-MRI framework. The combined approach offers more generalizability than previous works, whereas deep learning gains explainability through a geometric prior module. Inspired by the multigrid algorithm, we first embed the CS-MRI-based optimization algorithm into a correction-distillation scheme that consists of three ingredients: a pre-relaxation module, a correction module and a geometric prior distillation module. Furthermore, we employ a condition module to adaptively learn the step length and noise level, which enables the proposed framework to jointly train multi-ratio tasks through a single model. The proposed model not only compensates for the lost contextual information of the reconstructed image, which is refined from low frequency error in geometric characteristic k-space, but also integrates the theoretical guarantee of model-based methods and the superior reconstruction performance of deep learning-based methods. Therefore, it can give us a novel perspective to design biomedical imaging networks. Numerical experiments show that our framework outperforms state-of-the-art methods in terms of qualitative and quantitative evaluations. Our method achieves a 3.18 dB improvement at a low CS ratio of 10% and an average 1.42 dB improvement over other comparison methods on a brain dataset using a Cartesian sampling mask.

CV · Aug 15, 2023
Backpropagation Path Search On Adversarial Transferability

Zhuoer Xu, Zhangxuan Gu, Jianping Zhang et al.

Deep neural networks are vulnerable to adversarial examples, dictating the imperativeness to test the model's robustness before deployment. Transfer-based attackers craft adversarial examples against surrogate models and transfer them to victim models deployed in the black-box situation. To enhance the adversarial transferability, structure-based attackers adjust the backpropagation path to avoid the attack from overfitting the surrogate model. However, existing structure-based attackers fail to explore the convolution module in CNNs and modify the backpropagation graph heuristically, leading to limited effectiveness. In this paper, we propose backPropagation pAth Search (PAS), solving the aforementioned two problems. We first propose SkipConv to adjust the backpropagation path of convolution by structural reparameterization. To overcome the drawback of heuristically designed backpropagation paths, we further construct a DAG-based search space, utilize one-step approximation for path evaluation and employ Bayesian Optimization to search for the optimal path. We conduct comprehensive experiments in a wide range of transfer settings, showing that PAS improves the attack success rate by a huge margin for both normally trained and defense models.

SE · Jun 2, 2023
DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities

Tiehua Zhang, Rui Xu, Jianping Zhang et al.

Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry. Traditionally, software security is safeguarded by designated rule-based detectors that heavily rely on empirical expertise, requiring tremendous effort from software experts to generate rule repositories for large code corpus. Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities. However, prior learning-based works only break programs down into a sequence of word tokens for extracting contextual features of codes, or apply GNN largely on homogeneous graph representation (e.g., AST) without discerning complex types of underlying program entities (e.g., methods, variables). In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph and adapt a well-known heterogeneous graph network with a dual-supervisor structure for the corresponding graph learning task. Using the prototype built, we have conducted extensive experiments on both synthetic datasets and real-world projects. Compared with the state-of-the-art baselines, the results demonstrate promising effectiveness in this research direction in terms of vulnerability detection performance (average F1 improvements over 10% in real-world projects) and transferability from C/C++ to other programming languages (average F1 improvements over 11%).

CL · Sep 3, 2024
Leveraging Large Language Models for Solving Rare MIP Challenges

Teng Wang, Wing-Yin Yu, Ruifeng She et al.

Mixed Integer Programming (MIP) has been extensively applied in areas requiring mathematical solvers to address complex instances within tight time constraints. However, as the problem scale increases, the complexity of model formulation and finding feasible solutions escalates significantly. In contrast, the model-building cost for end-to-end models, such as large language models (LLMs), remains largely unaffected by problem scale due to their pattern recognition capabilities. While LLMs, like GPT-4, without fine-tuning, can handle some traditional medium-scale MIP problems, they struggle with uncommon or highly specialized MIP scenarios. Fine-tuning LLMs can yield some feasible solutions for medium-scale MIP instances, but these models typically fail to explore diverse solutions when constrained by a low and constant temperature, limiting their performance. In this paper, we propose and evaluate a recursively dynamic temperature method integrated with a chain-of-thought approach. Our findings show that starting with a high temperature and gradually lowering it leads to better feasible solutions compared to other dynamic temperature strategies. Additionally, by comparing results generated by the LLM with those from Gurobi, we demonstrate that the LLM can produce solutions that complement traditional solvers by accelerating the pruning process and improving overall efficiency.
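The reported finding, that starting with a high sampling temperature and gradually lowering it works better than a low constant one, can be sketched as a simple retry loop. The abstract does not specify the schedule, so the geometric decay, the floor value, and the names `decaying_temperatures` and `first_feasible` are assumptions; `sample` and `is_feasible` stand in for an LLM call and a MIP feasibility check, neither of which is implemented here.

```python
def decaying_temperatures(start, floor, decay, rounds):
    """Geometric schedule: start high, multiply by `decay` each round,
    and never drop below `floor`."""
    temperatures = []
    current = start
    for _ in range(rounds):
        temperatures.append(max(current, floor))
        current *= decay
    return temperatures

def first_feasible(sample, is_feasible, temperatures):
    """Try one candidate per temperature, from high to low, and return the
    first feasible one (or None if every attempt fails)."""
    for temperature in temperatures:
        candidate = sample(temperature)
        if is_feasible(candidate):
            return candidate
    return None

# Toy stand-ins: the "model" simply echoes the temperature it was given.
schedule = decaying_temperatures(start=1.0, floor=0.2, decay=0.5, rounds=4)
print(schedule)  # [1.0, 0.5, 0.25, 0.2]
print(first_feasible(lambda t: t, lambda c: c < 0.3, schedule))  # 0.25
```

Early high-temperature rounds favour diverse candidates, while later low-temperature rounds favour conservative ones. A real setup would replace the toy callables with a model query and a constraint check against the MIP instance.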

AI · Feb 9
OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Teng Wang, Rong Shan, Jianghao Lin et al.

Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To this end, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on heuristic trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR via atomic retrieval selection and composition as a two-stage mixed-integer programming problem, mathematically deriving optimal trajectories that maximize ground-truth coverage for training samples via rigorous boolean set operations. These trajectories are then stored in a golden library to serve as in-context demonstrations for online steering of VLM planner at online inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms SOTA baselines. Notably, it achieves superior performance using only 10% of training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.

CVSep 14, 2023
A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing

Yujie Feng, Yin Yang, Xiaohong Fan et al.

Remote sensing images are essential for many applications in the earth sciences, but their quality is often degraded by limitations in sensor technology and complex imaging environments. To address this, various remote sensing image deblurring methods have been developed to restore sharp and high-quality images from degraded observational data. However, most traditional model-based deblurring methods require predefined hand-crafted prior assumptions, which are difficult to handle in complex applications. On the other hand, deep learning-based deblurring methods are often considered as black boxes, lacking transparency and interpretability. In this work, we propose a new blind deblurring learning framework that utilizes alternating iterations of shrinkage thresholds. This framework involves updating blurring kernels and images, with a theoretical foundation in network design. Additionally, we propose a learnable blur kernel proximal mapping module to improve the accuracy of the blur kernel reconstruction. Furthermore, we propose a deep proximal mapping module in the image domain, which combines a generalized shrinkage threshold with a multi-scale prior feature extraction block. This module also incorporates an attention mechanism to adaptively learn the importance of prior information, improving the flexibility and robustness of prior terms, and avoiding limitations similar to hand-crafted image prior terms. Consequently, we design a novel multi-scale generalized shrinkage threshold network (MGSTNet) that focuses specifically on learning deep geometric prior features to enhance image restoration. Experimental results on real and synthetic remote sensing image datasets demonstrate the superiority of our MGSTNet framework compared to existing deblurring methods.

CVMar 10, 2025Code
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models

Jen-tse Huang, Jiantong Qin, Jianping Zhang et al. · pku, tencent-ai

This research investigates both explicit and implicit social biases exhibited by Vision-Language Models (VLMs). The key distinction between these bias types lies in the level of awareness: explicit bias refers to conscious, intentional biases, while implicit bias operates subconsciously. To analyze explicit bias, we directly pose questions to VLMs related to gender and racial differences: (1) Multiple-choice questions based on a given image (e.g., "What is the education level of the person in the image?") (2) Yes-No comparisons using two images (e.g., "Is the person in the first image more educated than the person in the second image?") For implicit bias, we design tasks where VLMs assist users but reveal biases through their responses: (1) Image description tasks: Models are asked to describe individuals in images, and we analyze disparities in textual cues across demographic groups. (2) Form completion tasks: Models draft a personal information collection form with 20 attributes, and we examine correlations among selected attributes for potential biases. We evaluate Gemini-1.5, GPT-4V, GPT-4o, LLaMA-3.2-Vision and LLaVA-v1.6. Our code and data are publicly available at https://github.com/uscnlp-lime/VisBias.

IVJul 11, 2021Code
Deep Geometric Distillation Network for Compressive Sensing MRI

Xiaohong Fan, Yin Yang, Jianping Zhang

Compressed sensing (CS) is an efficient method to reconstruct MR images from a small amount of sampled data in $k$-space and to accelerate MRI acquisition. In this work, we propose a novel deep geometric distillation network that combines the merits of model-based and deep learning-based CS-MRI methods and can be theoretically guaranteed to improve geometric texture details of a linear reconstruction. Firstly, we unfold the model-based CS-MRI optimization problem into two sub-problems that consist of image linear approximation and image geometric compensation. Secondly, the geometric compensation sub-problem, which distills texture details lost in the approximation stage, can be expanded by Taylor expansion to design a geometric distillation module that fuses features of different geometric characteristic domains. Additionally, we use a learnable version with adaptive initialization of the step-length parameter, which gives the model more flexibility and can lead to smooth convergence. Numerical experiments verify its superiority over other state-of-the-art CS-MRI reconstruction approaches. The source code will be available at https://github.com/fanxiaohong/Deep-Geometric-Distillation-Network-for-CS-MRI.

LGFeb 24, 2025
Generative Models in Decision Making: A Survey

Yinchuan Li, Xinyu Shao, Jianping Zhang et al.

In recent years, the exceptional performance of generative models in generative tasks has sparked significant interest in their integration into decision-making processes. Due to their ability to handle complex data distributions and their strong model capacity, generative models can be effectively incorporated into decision-making systems by generating trajectories that guide agents toward high-reward state-action regions or intermediate sub-goals. This paper presents a comprehensive review of the application of generative models in decision-making tasks. We classify seven fundamental types of generative models: energy-based models, generative adversarial networks, variational autoencoders, normalizing flows, diffusion models, generative flow networks, and autoregressive models. Regarding their applications, we categorize their functions into three main roles: controllers, modelers and optimizers, and discuss how each role contributes to decision-making. Furthermore, we examine the deployment of these models across five critical real-world decision-making scenarios. Finally, we summarize the strengths and limitations of current approaches and propose three key directions for advancing next-generation generative directive models: high-performance algorithms, large-scale generalized decision-making models, and self-evolving and adaptive models.

CLMar 16, 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models

Teng Wang, Zhangyi Jiang, Zhenqi He et al.

Recent studies show that Large Language Models (LLMs) achieve strong reasoning capabilities through supervised fine-tuning or reinforcement learning. However, a key approach, the Process Reward Model (PRM), suffers from reward hacking, making it unreliable in identifying the best intermediate step. In addition, the cost of annotating reasoning processes for reward modeling is high, making large-scale collection of high-quality data challenging. To address this, we propose a novel reward model approach called the Hierarchical Reward Model (HRM), which evaluates both individual and consecutive reasoning steps at both fine-grained and coarse-grained levels. HRM excels at assessing multi-step reasoning coherence, especially when flawed steps are later corrected through self-reflection. To further reduce the cost of generating training data, we introduce a lightweight and effective data augmentation strategy called Hierarchical Node Compression (HNC), which merges two consecutive reasoning steps into one within the tree structure. By applying HNC to MCTS-generated reasoning trajectories, we enhance the diversity and robustness of HRM training data while introducing controlled noise with minimal computational overhead. Empirical results on the PRM800K dataset show that HRM, together with HNC, provides more stable and reliable evaluations than PRM. Furthermore, cross-domain evaluations on the MATH500 and GSM8K datasets demonstrate HRM's strong generalization and robustness across a variety of reasoning tasks.

CRAug 22, 2025
Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs

Yu Yan, Sheng Sun, Zhe Wang et al.

With the development of Large Language Models (LLMs), numerous efforts have revealed their vulnerabilities to jailbreak attacks. Although these studies have driven the progress in LLMs' safety alignment, it remains unclear whether LLMs have internalized authentic knowledge to deal with real-world crimes, or are merely forced to simulate toxic language patterns. This ambiguity raises concerns that jailbreak success is often attributable to a hallucination loop between jailbroken LLM and judger LLM. By decoupling the use of jailbreak techniques, we construct knowledge-intensive Q&A to investigate the misuse threats of LLMs in terms of dangerous knowledge possession, harmful task planning utility, and harmfulness judgment robustness. Experiments reveal a mismatch between jailbreak success rates and harmful knowledge possession in LLMs, and existing LLM-as-a-judge frameworks tend to anchor harmfulness judgments on toxic language patterns. Our study reveals a gap between existing LLM safety assessments and real-world threat potential.

AIFeb 28, 2024
TroubleLLM: Align to Red Team Expert

Zhuoer Xu, Jianping Zhang, Shiwen Cui et al.

Large Language Models (LLMs) have become the state-of-the-art solutions for a variety of natural language tasks and are integrated into real-world applications. However, LLMs can be potentially harmful in manifesting undesirable safety issues like social biases and toxic content. It is imperative to assess their safety issues before deployment. However, the quality and diversity of test prompts generated by existing methods are still far from satisfactory. Not only are these methods labor-intensive and costly, but they also lack controllability of test prompt generation for the specific testing domain of LLM applications. With the idea of LLM for LLM testing, we propose the first LLM, called TroubleLLM, to generate controllable test prompts on LLM safety issues. Extensive experiments and human evaluation illustrate the superiority of TroubleLLM on generation quality and generation controllability.

CLJul 8, 2025
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs

Yizhan Huang, Zhe Yang, Meifang Chen et al.

Large Language Models (LLMs) are known to memorize portions of their training data, sometimes reproducing content verbatim when prompted appropriately. In this work, we investigate a fundamental yet under-explored question in the domain of memorization: How to characterize memorization difficulty of training data in LLMs? Through empirical experiments on OLMo, a family of open models, we present the Entropy-Memorization Law. It suggests that data entropy is linearly correlated with memorization score. Moreover, in a case study of memorizing highly randomized strings, or "gibberish", we observe that such sequences, despite their apparent randomness, exhibit unexpectedly low empirical entropy compared to the broader training corpus. Adopting the same strategy to discover Entropy-Memorization Law, we derive a simple yet effective approach to distinguish training and testing data, enabling Dataset Inference (DI).

LGFeb 9, 2025
Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks

Bowei He, Lihao Yin, Hui-Ling Zhen et al.

The widespread deployment of pre-trained language models (PLMs) has exposed them to textual backdoor attacks, particularly those planted during the pre-training stage. These attacks pose significant risks to high-reliability applications, as they can stealthily affect multiple downstream tasks. While certifying robustness against such threats is crucial, existing defenses struggle with the high-dimensional, interdependent nature of textual data and the lack of access to original poisoned pre-training data. To address these challenges, we introduce Fuzzed Randomized Smoothing (FRS), a novel approach for efficiently certifying language model robustness against backdoor attacks. FRS integrates software robustness certification techniques with biphased model parameter smoothing, employing Monte Carlo tree search for proactive fuzzing to identify vulnerable textual segments within the Damerau-Levenshtein space. This allows for targeted and efficient text randomization, while eliminating the need for access to poisoned training data during model smoothing. Our theoretical analysis demonstrates that FRS achieves a broader certified robustness radius compared to existing methods. Extensive experiments across various datasets, model configurations, and attack strategies validate FRS's superiority in terms of defense efficiency, accuracy, and robustness.

CVDec 10, 2024
A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing

Yujie Feng, Yin Yang, Xiaohong Fan et al.

Recently, deep learning methods have achieved remarkable results in the field of image restoration for remote sensing (RS). However, most existing RS image restoration methods focus mainly on conventional first-order degradation models, which may not effectively capture the imaging mechanisms of remote sensing images. Furthermore, many RS image restoration approaches that use deep learning are often criticized for their lack of architecture transparency and model interpretability. To address these problems, we propose a novel progressive restoration network for high-order degradation imaging (HDI-PRNet), which progressively restores different image degradations. HDI-PRNet is developed based on the theoretical framework of degradation imaging, together with the Markov properties of the high-order degradation process and maximum a posteriori (MAP) estimation, offering the benefit of mathematical interpretability within the unfolding network. The framework is composed of three main components: a module for image denoising that relies on proximal mapping prior learning, a module for image deblurring that integrates Neumann series expansion with dual-domain degradation learning, and a module for super-resolution. Extensive experiments demonstrate that our method achieves superior performance on both synthetic and real remote sensing images.

CVMar 5, 2025
DA-STGCN: 4D Trajectory Prediction Based on Spatiotemporal Feature Extraction

Yuheng Kuang, Zhengning Wang, Jianping Zhang et al.

The importance of four-dimensional (4D) trajectory prediction within air traffic management systems is on the rise. Key operations such as conflict detection and resolution, aircraft anomaly monitoring, and the management of congested flight paths are increasingly reliant on this foundational technology, underscoring the urgent demand for intelligent solutions. The dynamics in airport terminal zones and crowded airspaces are intricate and ever-changing; however, current methodologies do not sufficiently account for the interactions among aircraft. To tackle these challenges, we propose DA-STGCN, an innovative spatiotemporal graph convolutional network that integrates a dual attention mechanism. Our model reconstructs the adjacency matrix through a self-attention approach, enhancing the capture of node correlations, and employs graph attention to distill spatiotemporal characteristics, thereby generating a probabilistic distribution of predicted trajectories. This novel adjacency matrix, reconstructed with the self-attention mechanism, is dynamically optimized throughout the network's training process, offering a more nuanced reflection of the inter-node relationships compared to traditional algorithms. The performance of the model is validated on two ADS-B datasets, one near the airport terminal area and the other in dense airspace. Experimental results demonstrate a notable improvement over current 4D trajectory prediction methods, achieving a 20% and 30% reduction in the Average Displacement Error (ADE) and Final Displacement Error (FDE), respectively. The incorporation of a Dual-Attention module has been shown to significantly enhance the extraction of node correlations, as verified by ablation experiments.
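
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how ADE and FDE are commonly computed for a single predicted trajectory. It assumes each trajectory is a list of equally sampled points (tuples of coordinates); the function name `ade_fde` is illustrative and this is not the authors' evaluation code.

```python
import math


def ade_fde(pred, truth):
    """Return (ADE, FDE) for one trajectory.

    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the final time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]
```

For example, a prediction that matches the first point exactly and misses the last by 5 units gives an ADE of 2.5 and an FDE of 5.0 over two steps.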

IVNov 5, 2024
A Symmetric Dynamic Learning Framework for Diffeomorphic Medical Image Registration

Jinqiu Deng, Ke Chen, Mingke Li et al.

Diffeomorphic image registration is crucial for various medical imaging applications because it can preserve the topology of the transformation. This study introduces DCCNN-LSTM-Reg, a learning framework that evolves dynamically and learns a symmetrical registration path by satisfying a specified control increment system. This framework aims to obtain symmetric diffeomorphic deformations between moving and fixed images. To achieve this, we combine deep learning networks with diffeomorphic mathematical mechanisms to create a continuous and dynamic registration architecture, which consists of multiple Symmetric Registration (SR) modules cascaded on five different scales. Specifically, our method first uses two U-nets with shared parameters to extract multiscale feature pyramids from the images. We then develop an SR-module comprising a sequential CNN-LSTM architecture to progressively correct the forward and reverse multiscale deformation fields using control increment learning and the homotopy continuation technique. Through extensive experiments on three 3D registration tasks, we demonstrate that our method outperforms existing approaches in both quantitative and qualitative evaluations.

SEJun 13, 2024
Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in Virtual Reality Apps

Shuqing Li, Cuiyun Gao, Jianping Zhang et al.

The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of the user's brain, leading to user discomfort and even adverse health effects. Such issues commonly exist but remain underexplored. We conduct an empirical analysis on 282 SVI bug reports from 15 VR platforms, summarizing 15 types of manifestations. The empirical analysis reveals that automatically detecting SVI issues is challenging, mainly because: (1) training data is lacking; (2) the manifestations of SVI issues are diverse, complicated, and often application-specific; (3) most accessible VR apps are closed-source commercial software. Existing pattern-based supervised classification approaches may be inapplicable or ineffective in detecting the SVI issues. To counter these challenges, we propose an unsupervised black-box testing framework named StereoID to identify the stereoscopic visual inconsistencies, based only on the rendered GUI states. StereoID generates a synthetic right-eye image based on the actual left-eye image and computes distances between the synthetic right-eye image and the actual right-eye image to detect SVI issues. We propose a depth-aware conditional stereo image translator to power the image generation process, which captures the expected perspective shifts between left-eye and right-eye images. We build a large-scale unlabeled VR stereo screenshot dataset of more than 171K images from 288 real-world VR apps for experiments. After substantial experiments, StereoID demonstrates superior performance for detecting SVI issues in both user reports and wild VR apps.

SEMay 23, 2023
Validating Multimedia Content Moderation Software via Semantic Fusion

Wenxuan Wang, Jingyuan Huang, Chang Chen et al.

The exponential growth of social media platforms, such as Facebook and TikTok, has revolutionized communication and content publication in human society. Users on these platforms can publish multimedia content that delivers information via the combination of text, audio, images, and video. Meanwhile, the multimedia content release facility has been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisements, and pornography. To this end, content moderation software has been widely deployed on these platforms to detect and block toxic content. However, due to the complexity of content moderation models and the difficulty of understanding information across multiple modalities, existing content moderation software can fail to detect toxic content, which often leads to extremely negative impacts. We introduce Semantic Fusion, a general, effective methodology for validating multimedia content moderation software. Our key idea is to fuse two or more existing single-modal inputs (e.g., a textual sentence and an image) into a new input that combines the semantics of its ancestors in a novel manner and has toxic nature by construction. This fused input is then used for validating multimedia content moderation software. We realized Semantic Fusion as DUO, a practical content moderation software testing tool. In our evaluation, we employ DUO to test five commercial content moderation software products and two state-of-the-art models against three kinds of toxic content. The results show that DUO achieves up to 100% error finding rate (EFR) when testing moderation software. In addition, we leverage the test cases generated by DUO to retrain the two models we explored, which largely improves model robustness while maintaining the accuracy on the original test set.

LGMar 31, 2022
Improving Adversarial Transferability via Neuron Attribution-Based Attacks

Jianping Zhang, Weibin Wu, Jen-tse Huang et al.

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs beforehand in security-sensitive applications. To efficiently tackle the black-box setting where the target model's particulars are unknown, feature-level transfer-based attacks propose to contaminate the intermediate feature outputs of local models, and then directly employ the crafted adversarial samples to attack the target model. Due to the transferability of features, feature-level attacks have shown promise in synthesizing more transferable adversarial samples. However, existing feature-level attacks generally employ inaccurate neuron importance estimations, which deteriorates their transferability. To overcome such pitfalls, in this paper, we propose the Neuron Attribution-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations. Specifically, we first completely attribute a model's output to each neuron in a middle layer. We then derive an approximation scheme of neuron attribution to tremendously reduce the computation overhead. Finally, we weight neurons based on their attribution results and launch feature-level attacks. Extensive experiments confirm the superiority of our approach to the state-of-the-art benchmarks.

CGOct 20, 2021
A unifying framework for $n$-dimensional quasi-conformal mappings

Daoping Zhang, Gary P. T. Choi, Jianping Zhang et al.

With the advancement of computer technology, there is a surge of interest in effective mapping methods for objects in higher-dimensional spaces. To establish a one-to-one correspondence between objects, higher-dimensional quasi-conformal theory can be utilized for ensuring the bijectivity of the mappings. In addition, it is often desirable for the mappings to satisfy certain prescribed geometric constraints and possess low distortion in conformality or volume. In this work, we develop a unifying framework for computing $n$-dimensional quasi-conformal mappings. More specifically, we propose a variational model that integrates quasi-conformal distortion, volumetric distortion, landmark correspondence, intensity mismatch and volume prior information to handle a large variety of deformation problems. We further prove the existence of a minimizer for the proposed model and devise efficient numerical methods to solve the optimization problem. We demonstrate the effectiveness of the proposed framework using various experiments in two- and three-dimensions, with applications to medical image registration, adaptive remeshing and shape modeling.

IRFeb 9, 2021
CNN Application in Detection of Privileged Documents in Legal Document Review

Rishi Chhatwal, Robert Keeling, Peter Gronvall et al.

Protecting privileged communications and data from disclosure is paramount for legal teams. Legal advice, such as attorney-client communications or litigation strategy, is typically exempt from disclosure in litigations or regulatory events and is vital to the attorney-client relationship. To protect this information from disclosure, companies and outside counsel often review vast amounts of documents to determine those that contain privileged material. This process is extremely costly and time consuming. As data volumes increase, legal counsel normally employs methods to reduce the number of documents requiring review while balancing the need to ensure the protection of privileged information. Keyword searching is relied upon as a method to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but often return overly inclusive results, most of which do not contain privileged information. To overcome the weaknesses of keyword searching, legal teams increasingly are using machine learning techniques to target privileged information. In these studies, classic text classification techniques are applied to build classification models to identify privileged documents. In this paper, the authors propose a different method by applying machine learning / convolutional neural network (CNN) techniques to identify privileged documents. Our proposed method combines keyword searching with CNN. For each keyword term, a CNN model is created using the context of the occurrences of the keyword. In addition, a method was proposed to select reliable privileged (positive) training keyword occurrences from labeled positive training documents. Extensive experiments were conducted, and the results show that the proposed methods can significantly reduce false positives while still capturing most of the true positives.

IRDec 19, 2019
A Framework for Explainable Text Classification in Legal Document Review

Christian J. Mahoney, Jianping Zhang, Nathaniel Huber-Fliflet et al.

Companies regularly spend millions of dollars producing electronically-stored documents in legal matters. Recently, parties on both sides of the 'legal aisle' are accepting the use of machine learning techniques like text classification to cull massive volumes of data and to identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs in legal matters, it also faces a peculiar perception challenge: amongst lawyers, this technology is sometimes looked upon as a "black box", with little information provided for attorneys to understand why documents are classified as responsive. In recent years, a group of AI and ML researchers have been actively researching Explainable AI, in which actions or decisions are human understandable. In legal document review scenarios, a document can be identified as responsive if one or more of its text snippets are deemed responsive. In these scenarios, if text classification can be used to locate these snippets, then attorneys could easily evaluate the model's classification decision. When deployed with defined and explainable results, text classification can drastically enhance overall quality and speed of the review process by reducing the review time. Moreover, explainable predictive coding provides lawyers with greater confidence in the results of that supervised learning task. This paper describes a framework for explainable text classification as a valuable tool in legal services: for enhancing the quality and efficiency of legal document review and for assisting in locating responsive snippets within responsive documents. This framework has been implemented in our legal analytics product, which has been used in hundreds of legal matters. We also report our experimental results using the data from an actual legal matter that used this type of document review.

IRDec 19, 2019
Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review

Robert Keeling, Rishi Chhatwal, Nathaniel Huber-Fliflet et al.

Research has shown that Convolutional Neural Networks (CNN) can be effectively applied to text classification as part of a predictive coding protocol. That said, most research to date has been conducted on data sets with short documents that do not reflect the variety of documents in real world document reviews. Using data from four actual reviews with documents of varying lengths, we compared CNN with other popular machine learning algorithms for text classification, including Logistic Regression, Support Vector Machine, and Random Forest. For each data set, classification models were trained with different training sample sizes using different learning algorithms. These models were then evaluated using a large randomly sampled test set of documents, and the results were compared using precision and recall curves. Our study demonstrates that CNN performed well, but that there was no single algorithm that performed the best across the combination of data sets and training sample sizes. These results will help advance research into the legal profession's use of machine learning algorithms that maximize performance.

IRJun 11, 2019
Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding

Christian J. Mahoney, Nathaniel Huber-Fliflet, Haozhen Zhao et al.

Active learning is a popular methodology in text classification - known in the legal domain as "predictive coding" or "Technology Assisted Review" or "TAR" - due to its potential to minimize the required review effort to build effective classifiers. In this study, we use extensive experimentation to examine the impact of popular seed set selection strategies in active learning, within a predictive coding exercise, and evaluate different active learning strategies against well-researched continuous active learning strategies for the purpose of determining efficient training methods for classifying large populations quickly and precisely. We study how random sampling, keyword models and clustering based seed set selection strategies combined with top-ranked, uncertain, random, recall inspired, and hybrid active learning document selection strategies affect the performance of active learning for predictive coding. We use the percentage of documents requiring review to reach 75% recall as the "benchmark" metric to evaluate and compare our approaches. In most cases we find that seed set selection methods have a minor impact, though they do show significant impact in lower richness data sets or when choosing a top-ranked active learning selection strategy. Our results also show that active learning selection strategies implementing uncertainty, random, or 75% recall selection strategies have the potential to reach the optimum active learning round much earlier than the popular continuous active learning approach (top-ranked selection). The results of our research shed light on the impact of active learning seed set selection strategies and also the effectiveness of the selection strategies for the following learning rounds. Legal practitioners can use the results of this study to enhance the efficiency, precision, and simplicity of their predictive coding process.
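
The benchmark metric named in the abstract, the share of documents that must be reviewed to reach 75% recall, can be sketched as follows. This is an illustrative reading of the metric, not the authors' code: it assumes documents are reviewed in ranked order and that `ranked_labels` (a hypothetical name) holds 1 for a responsive document and 0 otherwise.

```python
def review_fraction_at_recall(ranked_labels, target_recall=0.75):
    """Fraction of the ranked list that must be reviewed to reach target recall."""
    total_positive = sum(ranked_labels)
    if total_positive == 0:
        raise ValueError("no responsive documents in the population")
    found = 0
    for reviewed, label in enumerate(ranked_labels, start=1):
        found += label
        if found / total_positive >= target_recall:
            return reviewed / len(ranked_labels)
    return 1.0  # only reached if target_recall exceeds 1.0
```

A lower value means less review effort: with four responsive documents among eight, reaching three of them after reviewing the top four documents gives 0.5.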

IRApr 3, 2019
An Empirical Study of the Application of Machine Learning and Keyword Terms Methodologies to Privilege-Document Review Projects in Legal Matters

Peter Gronvall, Nathaniel Huber-Fliflet, Jianping Zhang et al.

Protecting privileged communications and data from disclosure is paramount for legal teams. Unrestricted legal advice, such as attorney-client communications or litigation strategy, is vital to the legal process and is exempt from disclosure in litigation or regulatory events. To protect this information from being disclosed, companies and outside counsel must review vast amounts of documents to determine those that contain privileged material. This process is extremely costly and time consuming. As data volumes increase, legal counsel employ methods to reduce the number of documents requiring review while balancing the need to ensure the protection of privileged information. Keyword searching is relied upon as a method to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but return overinclusive results, most of which do not contain privileged information, and without detailed knowledge of the data, keyword lists cannot be crafted to find all privileged material. Overly inclusive keyword searching can also be problematic because, even while it drives up costs, it can cast 'too far of a net' and thus produce unreliable results. To overcome these weaknesses of keyword searching, legal teams are using a new method to target privileged information called predictive modeling. Predictive modeling can successfully identify privileged material, but little research has been published to confirm its effectiveness when compared to keyword searching. This paper summarizes a study of the effectiveness of keyword searching and predictive modeling when applied to real-world data. With this study, this group of collaborators wanted to examine and understand the benefits and weaknesses of both approaches for legal teams identifying privileged material in document populations.

IR Apr 3, 2019
Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding

Rishi Chhatwal, Peter Gronvall, Nathaniel Huber-Fliflet et al.

In today's legal environment, lawsuits and regulatory investigations require companies to embark upon increasingly intensive data-focused engagements to identify, collect and analyze large quantities of data. When documents are staged for review, the process can require companies to dedicate an extraordinary level of resources, both in human effort and in the use of technology-based techniques to intelligently sift through data. For several years, attorneys have been using a variety of tools to conduct this exercise, and most recently, they are accepting the use of machine learning techniques like text classification to efficiently cull massive volumes of data to identify responsive documents for use in these matters. In recent years, a group of AI and Machine Learning researchers have been actively researching Explainable AI. In an explainable AI system, actions or decisions are human-understandable. In typical legal 'document review' scenarios, a document can be identified as responsive as long as one or more of the text snippets in the document are deemed responsive. In these scenarios, if predictive coding can be used to locate these responsive snippets, then attorneys could easily evaluate the model's document classification decision. When deployed with defined and explainable results, predictive coding can drastically enhance the overall quality and speed of the document review process by reducing the time it takes to review documents. The authors of this paper propose the concept of explainable predictive coding and simple explainable predictive coding methods to locate responsive snippets within responsive documents. We also report our preliminary experimental results using the data from an actual legal matter that entailed this type of document review.

IR Apr 3, 2019
Empirical Evaluations of Active Learning Strategies in Legal Document Review

Rishi Chhatwal, Nathaniel Huber-Fliflet, Robert Keeling et al.

One type of machine learning, text classification, is now regularly applied in legal matters involving voluminous document populations because it can reduce the time and expense associated with the review of those documents. One form of machine learning - Active Learning - has drawn attention from the legal community because it offers the potential to make the machine learning process even more effective. Active Learning, applied to legal documents, is considered a new technology in the legal domain and is continuously applied to all documents in a legal matter until an insignificant number of relevant documents are left for review. This implementation is slightly different from traditional implementations of Active Learning, where the process stops once acceptable model performance is achieved. The purpose of this paper is twofold: (i) to question whether Active Learning actually is a superior learning methodology and (ii) to highlight the ways that Active Learning can be most effectively applied to real legal industry data. Unlike other studies, our experiments were performed against large data sets taken from recent, real-world legal matters covering a variety of areas. We conclude that, although these experiments show the Active Learning strategy popularly used in legal document review can quickly identify informative training documents, it becomes less effective over time. In particular, our findings suggest this most popular form of Active Learning in the legal arena, where the highest-scoring documents are selected as training examples, is in fact not the most efficient approach in most instances. Ultimately, a different Active Learning strategy may be best suited to initiate the predictive modeling process but not to continue through the entire document review.
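The selection rule this abstract describes, where the highest-scoring documents are chosen as the next training examples, can be contrasted with an alternative in a short sketch. Uncertainty sampling is used here only as an illustrative alternative and is an assumption, since the abstract does not name the other strategies it tested; the function names, the 0.5 decision boundary, and the toy scores are all hypothetical.

```python
def select_top_ranked(scores, batch_size):
    """Pick the indices of the highest-scoring unreviewed documents
    (the 'top-ranked' rule described in the abstract)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:batch_size]


def select_uncertain(scores, batch_size, boundary=0.5):
    """Pick the indices of the documents whose scores lie closest to the
    decision boundary (uncertainty sampling, assumed here for contrast)."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i] - boundary))
    return order[:batch_size]


# Hypothetical model scores for five unreviewed documents.
scores = [0.95, 0.55, 0.10, 0.48, 0.80]

top_batch = select_top_ranked(scores, batch_size=2)       # documents 0 and 4
uncertain_batch = select_uncertain(scores, batch_size=2)  # documents 3 and 1
```

The two rules choose disjoint batches on this toy input: top-ranked selection favors documents the model is already confident about, while uncertainty sampling favors documents near the boundary.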

IR Apr 3, 2019
Empirical Evaluations of Preprocessing Parameters' Impact on Predictive Coding's Effectiveness

Rishi Chhatwal, Nathaniel Huber-Fliflet, Robert Keeling et al.

Predictive coding, once used in only a small fraction of legal and business matters, is now widely deployed to quickly cull through increasingly vast amounts of data and reduce the need for costly and inefficient human document review. Previously, the sole front-end input used to create a predictive model was the exemplar documents (training data) chosen by subject-matter experts. Many predictive coding tools require users to rely on static preprocessing parameters and a single machine learning algorithm to develop the predictive model. Little research has been published discussing the impact preprocessing parameters and learning algorithms have on the effectiveness of the technology. A deeper dive into the generation of a predictive model shows that the settings and algorithm can have a strong effect on the accuracy and efficacy of a predictive coding tool. Understanding how these input parameters affect the output will empower legal teams with the information they need to implement predictive coding as efficiently and effectively as possible. This paper outlines different preprocessing parameters and algorithms as applied to multiple real-world data sets to understand the influence of various approaches.

CV Sep 6, 2015
A Total Fractional-Order Variation Model for Image Restoration with Non-homogeneous Boundary Conditions and its Numerical Solution

Jianping Zhang, Ke Chen

To overcome the weakness of a total variation based model for image restoration, various high order (typically second order) regularization models have been proposed and studied recently. In this paper we analyze and test a fractional-order derivative based total $α$-order variation model, which can outperform the currently popular high order regularization models. There exist several previous works using total $α$-order variations for image restoration; however, first, no analysis has been done yet, and second, all tested formulations, differing from each other, utilize the zero Dirichlet boundary conditions, which are not realistic (while non-zero boundary conditions violate definitions of fractional-order derivatives). This paper first reviews some results of fractional-order derivatives and then rigorously analyzes the theoretical properties of the proposed total $α$-order variational model. It then develops four algorithms for solving the variational problem, one based on the variational Split-Bregman idea and three based on direct solution of the discretise-optimization problem. Numerical experiments show that, in terms of restoration quality and solution efficiency, the proposed model can produce results for smooth images that are highly competitive with two established high order models: the mean curvature and the total generalized variation.