Xuan Zhao

CV
h-index: 33
48 papers
502 citations
Novelty: 46%
AI Score: 55

48 Papers

94.1 · CV · May 29
SteerFace: Debiasing Synthetic Face Generation via Adaptive Residue Perturbation

Yuxi Mi, Qiuyang Yuan, Jianqing Xu et al.

The shortage of legally compliant data for face recognition training has sparked growing interest in using synthetic data as an alternative. While recent diffusion-based methods enable the generation of photorealistic face images with strong identity adherence and data diversity, their downstream recognition performance still exhibits a significant synthetic-real gap. This paper identifies visual tendency as a previously underexplored limitation, whereby synthetic data exhibit an unrealistic prevalence of visual attributes and thus deviate from the real-data distribution. Visual tendency can be attributed to the generator's conditioning on identity embeddings, through which co-occurring residual visual cues are unintentionally absorbed into learned identity semantics. To discourage the generator from exploiting such visual cues, this paper proposes SteerFace, a simple and efficient training framework that perturbs identity embeddings by steering them toward random orthogonal directions on the embedding hypersphere. The perturbation serves as an identity-preserving regularizer that penalizes the generator's reliance on non-identity components, as supported by theoretical analysis. This paper further introduces an adaptive strategy that learns perturbation strengths with both sample-wise preference and favorable overall statistics. Extensive experiments show that SteerFace effectively mitigates visual tendency, outperforms prior methods in downstream face recognition, and generalizes well across different training datasets and generation pipelines.
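The perturbation described in the abstract, steering an identity embedding toward a random orthogonal direction on the hypersphere, can be illustrated with a minimal sketch. The function names, the fixed `angle` parameter, and the use of Gram-Schmidt to obtain the orthogonal direction are assumptions made for illustration; the abstract does not give the authors' implementation, and the adaptive learning of perturbation strengths is not modeled here.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def steer(embedding, angle, rng):
    """Rotate a unit embedding by `angle` radians toward a random orthogonal direction.

    Hypothetical helper: `angle` stands in for a perturbation strength.
    """
    e = normalize(embedding)
    # Draw a Gaussian vector and remove its component along e (Gram-Schmidt),
    # which leaves a random unit direction orthogonal to e.
    g = [rng.gauss(0.0, 1.0) for _ in e]
    dot = sum(a * b for a, b in zip(g, e))
    u = normalize([a - dot * b for a, b in zip(g, e)])
    # Move along the great circle from e toward u; the result keeps unit norm.
    return [math.cos(angle) * a + math.sin(angle) * b for a, b in zip(e, u)]


rng = random.Random(0)
e = normalize([rng.gauss(0.0, 1.0) for _ in range(8)])
p = steer(e, 0.3, rng)
cosine = sum(a * b for a, b in zip(e, p))
norm = math.sqrt(sum(x * x for x in p))
```

Because the steering direction is orthogonal to the embedding, the cosine similarity between the original and perturbed embeddings equals `cos(angle)`, so in this sketch the angle directly controls how far the perturbed embedding moves from the original.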

NA · Apr 3, 2017
Adaptive Finite Element Method for fractional differential equations using Hierarchical Matrices

Xuan Zhao, Xiaozhe Hu, Wei Cai et al.

A robust and fast solver for fractional differential equations (FDEs) involving the Riesz fractional derivative is developed using an adaptive finite element method on non-uniform meshes. It is based on the utilization of hierarchical matrices ($\mathcal{H}$-Matrices) for the representation of the stiffness matrix resulting from the finite element discretization of the FDEs. We employ a geometric multigrid method for the solution of the algebraic system of equations. We combine it with an adaptive algorithm based on a posteriori error estimation to deal with general-type singularities arising in the solution of the FDEs. Through various test examples we demonstrate the efficiency of the method and the high accuracy of the numerical solution even in the presence of singularities. The proposed technique has been verified on fundamental examples including Riesz and left/right Riemann-Liouville fractional derivatives and, furthermore, it can be readily extended to more general fractional differential equations with different boundary conditions and lower-order terms. To the best of our knowledge, there are currently no other methods for FDEs that resolve singularities as accurately at linear complexity as the one we propose here.

LG · Nov 14, 2023 · Code
Counterfactual Explanation for Regression via Disentanglement in Latent Space

Xuan Zhao, Klaus Broelemann, Gjergji Kasneci

Counterfactual Explanations (CEs) help address the question: How can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Thus, they bear the potential to guide the user's interaction with AI systems since they represent easy-to-understand explanations. To be applicable, CEs need to be realistic and actionable. In the literature, various methods have been proposed to generate CEs. However, the majority of research on CEs focuses on classification problems where questions like "What should I do to get my rejected loan approved?" are raised. In practice, questions like "What should I do to increase my salary?" are of a more regressive nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions and the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments, we demonstrate that the proposed method is competitive based on different quality measures on image and tabular datasets in regression problem settings. It efficiently returns results closer to the original data manifold compared to three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.

NA · Sep 29, 2017
Superconvergence Points For The Spectral Interpolation Of Riesz Fractional Derivatives

Beichuan Deng, Zhimin Zhang, Xuan Zhao

In this paper, superconvergence points are located for the approximation of the Riesz derivative of order $α$ using classical Lobatto-type polynomials when $α\in (0,1)$ and generalized Jacobi functions (GJF) for arbitrary $α> 0$, respectively. For the former, superconvergence points are zeros of the Riesz fractional derivative of the leading term in the truncated Legendre-Lobatto expansion. It is observed that the convergence rate for different $α$ at the superconvergence points is at least $O(N^{-2})$ better than the optimal global convergence rate. Furthermore, the interpolation is generalized to the Riesz derivative of order $α> 1$ with the help of GJF, which deal well with the singularities. The well-posedness, convergence and superconvergence properties are theoretically analyzed. The gain of the convergence rate at the superconvergence points is analyzed to be $O(N^{-(α+3)/2})$ for $α\in (0,1)$ and $O(N^{-2})$ for $α> 1$. Finally, we apply our findings in solving model FDEs and observe that the convergence rates are indeed much better at the predicted superconvergence points.

LG · Nov 21, 2023
Adversarial Reweighting Guided by Wasserstein Distance for Bias Mitigation

Xuan Zhao, Simone Fabbrizzi, Paula Reyero Lobo et al.

The unequal representation of different groups in a sample population can lead to discrimination against minority groups when machine learning models make automated decisions. To address these issues, fairness-aware machine learning jointly optimizes two (or more) metrics aiming at predictive effectiveness and low unfairness. However, the inherent under-representation of minorities in the data makes the disparate treatment of subpopulations less noticeable and difficult to deal with during learning. In this paper, we propose a novel adversarial reweighting method to address such representation bias. To balance the data distribution between the majority and the minority groups, our approach deemphasizes samples from the majority group. To minimize empirical risk, our method prefers samples from the majority group that are close to the minority group as evaluated by the Wasserstein distance. Our theoretical analysis shows the effectiveness of our adversarial reweighting approach. Experiments demonstrate that our approach mitigates bias without sacrificing classification accuracy, outperforming related state-of-the-art methods on image and tabular benchmark datasets.

LG · Aug 26, 2024
Enhancing Fairness through Reweighting: A Path to Attain the Sufficiency Rule

Xuan Zhao, Klaus Broelemann, Salvatore Ruggieri et al.

We introduce an approach that improves fairness in empirical risk minimization (ERM) during model training through a refined reweighting scheme for the training data. This scheme aims to uphold the sufficiency rule in fairness by ensuring that optimal predictors maintain consistency across diverse sub-groups. We employ a bilevel formulation to address this challenge, wherein we explore sample reweighting strategies. Unlike conventional methods that hinge on model size, our formulation bases generalization complexity on the space of sample weights. We discretize the weights to improve training speed. Empirical validation of our method showcases its effectiveness and robustness, revealing a consistent improvement in the balance between prediction performance and fairness metrics across various experiments.

LG · Nov 17, 2023
Causal Fairness-Guided Dataset Reweighting using Neural Networks

Xuan Zhao, Klaus Broelemann, Salvatore Ruggieri et al.

The importance of achieving fairness in machine learning models cannot be overstated. Recent research has pointed out that fairness should be examined from a causal perspective, and several fairness notions based on Pearl's causal framework have been proposed. In this paper, we construct a reweighting scheme of datasets to address causal fairness. Our approach aims at mitigating bias by considering the causal relationships among variables and incorporating them into the reweighting process. The proposed method adopts two neural networks, whose structures are intentionally designed to reflect the structures of a causal graph and of an interventional graph. The two neural networks can approximate the causal model of the data and the causal model of interventions. Furthermore, reweighting guided by a discriminator is applied to achieve various fairness notions. Experiments on real-world datasets show that our method can achieve causal fairness on the data while remaining close to the original data for downstream tasks.

LG · Jul 25, 2023
Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space

Xuan Zhao, Klaus Broelemann, Gjergji Kasneci

Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions: 1. What are the crucial factors that led to an automated prediction/decision? 2. How can these factors be changed to achieve a more favorable outcome from a user's perspective? Thus, guiding the user's interaction with AI systems by proposing easy-to-understand explanations and easy-to-attain feasible changes is essential for the trustworthy adoption and long-term acceptance of AI systems. In the literature, various methods have been proposed to generate CEs, and different quality measures have been suggested to evaluate these methods. However, the generation of CEs is usually computationally expensive, and the resulting suggestions are unrealistic and thus non-actionable. In this paper, we introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions. CEs are then generated in latent space by linear interpolation between the query sample and the centroid of the target class. We show that our method maintains the characteristics of the input sample during the counterfactual search. In various experiments, we show that the proposed method is competitive based on different quality measures on image and tabular datasets: it efficiently returns results that are closer to the original data manifold compared to three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications.
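The latent-space search summarized in this abstract, linear interpolation from the query sample toward the centroid of the target class, can be sketched as follows. Everything here is a stand-in under stated assumptions: `toy_classify` is a hypothetical classifier acting directly on a 2-D latent vector, the step count is arbitrary, and the autoencoder (encoding the query and decoding the result) is omitted entirely.

```python
def interpolate(z, centroid, t):
    """Point at fraction t on the segment from z to centroid."""
    return [(1 - t) * a + t * b for a, b in zip(z, centroid)]


def find_counterfactual(z_query, centroid, classify, target, steps=20):
    """Return the first interpolated latent point classified as `target`, and its t.

    Walks from the query toward the target-class centroid in equal steps,
    so the returned point is the one closest to the query on this grid.
    """
    for i in range(1, steps + 1):
        t = i / steps
        z = interpolate(z_query, centroid, t)
        if classify(z) == target:
            return z, t
    return None, None


def toy_classify(z):
    """Hypothetical binary classifier: class 1 iff the first coordinate is positive."""
    return 1 if z[0] > 0 else 0


z_cf, t_cf = find_counterfactual([-1.0, 0.5], [3.0, 0.5], toy_classify, target=1)
```

Stopping at the first point that flips the prediction keeps the counterfactual as close to the query as the step size allows, which is one simple way to read the abstract's claim that the characteristics of the input sample are maintained.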

LG · Dec 11, 2025
Classifier Reconstruction Through Counterfactual-Aware Wasserstein Prototypes

Xuan Zhao, Zhuo Cao, Arya Bangun et al.

Counterfactual explanations provide actionable insights by identifying minimal input changes required to achieve a desired model prediction. Beyond their interpretability benefits, counterfactuals can also be leveraged for model reconstruction, where a surrogate model is trained to replicate the behavior of a target model. In this work, we demonstrate that model reconstruction can be significantly improved by recognizing that counterfactuals, which typically lie close to the decision boundary, can serve as informative though less representative samples for both classes. This is particularly beneficial in settings with limited access to labeled data. We propose a method that integrates original data samples with counterfactuals to approximate class prototypes using the Wasserstein barycenter, thereby preserving the underlying distributional structure of each class. This approach enhances the quality of the surrogate model and mitigates the issue of decision boundary shift, which commonly arises when counterfactuals are naively treated as ordinary training instances. Empirical results across multiple datasets show that our method improves fidelity between the surrogate and target models, validating its effectiveness.
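To give a feel for the Wasserstein barycenter mentioned above, here is a minimal sketch for the simplest case: two one-dimensional empirical distributions with the same number of samples, where the 2-Wasserstein barycenter is obtained by averaging sorted samples (that is, averaging quantiles). The function name, the `weight_a` parameter, and the sample values are illustrative assumptions; the paper's actual prototype construction on higher-dimensional data is not specified in the abstract and is not reproduced here.

```python
def barycenter_1d(samples_a, samples_b, weight_a=0.5):
    """Weighted 2-Wasserstein barycenter of two equal-size 1-D empirical distributions.

    In one dimension the optimal transport plan matches sorted samples,
    so the barycenter is the weighted average of matched order statistics.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equal sample counts")
    a, b = sorted(samples_a), sorted(samples_b)
    return [weight_a * x + (1 - weight_a) * y for x, y in zip(a, b)]


# Hypothetical data: original samples of one class and counterfactuals near the boundary.
original = [4.0, 0.0, 2.0]
counterfactuals = [12.0, 14.0, 10.0]
prototype = barycenter_1d(original, counterfactuals, weight_a=0.5)
```

Unlike simply pooling the two sample sets, the barycenter keeps the shape of the distributions (here, the spacing between samples) while shifting their location, which is the sense in which distributional structure is preserved.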

IV · Dec 7, 2025
Physics-Guided Diffusion Priors for Multi-Slice Reconstruction in Scientific Imaging

Laurentius Valdy, Richard D. Paul, Alessio Quercia et al.

Accurate multi-slice reconstruction from limited measurement data is crucial to speed up the acquisition process in medical and scientific imaging. However, it remains challenging due to the ill-posed nature of the problem and the high computational and memory demands. We propose a framework that addresses these challenges by integrating partitioned diffusion priors with physics-based constraints. By doing so, we substantially reduce memory usage per GPU while preserving high reconstruction quality, outperforming both physics-only and full multi-slice reconstruction baselines for different modalities, namely Magnetic Resonance Imaging (MRI) and four-dimensional Scanning Transmission Electron Microscopy (4D-STEM). Additionally, we show that the proposed method improves in-distribution accuracy and generalizes strongly to out-of-distribution datasets.

LG · Nov 10, 2025
FlowTIE: Flow-based Transport of Intensity Equation for Phase Gradient Estimation from 4D-STEM Data

Arya Bangun, Maximilian Töllner, Xuan Zhao et al.

We introduce FlowTIE, a neural-network-based framework for phase reconstruction from 4D-Scanning Transmission Electron Microscopy (STEM) data, which integrates the Transport of Intensity Equation (TIE) with a flow-based representation of the phase gradient. This formulation allows the model to bridge data-driven learning with physics-based priors, improving robustness under dynamical scattering conditions for thick specimens. Validation on simulated datasets of crystalline materials and benchmarking against classical TIE and gradient-based optimization methods are presented. The results demonstrate that FlowTIE improves phase reconstruction accuracy, is fast, and can be integrated with a thick-specimen model, namely the multislice method.

IV · May 5, 2022
Invariant Content Synergistic Learning for Domain Generalization of Medical Image Segmentation

Yuxin Kang, Hansheng Li, Xuan Zhao et al.

While achieving remarkable success in medical image segmentation, deep convolutional neural networks (DCNNs) often fail to maintain their robustness when confronting test data with a novel distribution. To address this drawback, the inductive bias of DCNNs has recently received considerable attention. Specifically, DCNNs exhibit an inductive bias towards image style (e.g., superficial texture) rather than invariant content (e.g., object shapes). In this paper, we propose a method, named Invariant Content Synergistic Learning (ICSL), to improve the generalization ability of DCNNs on unseen datasets by controlling the inductive bias. First, ICSL mixes the style of training instances to perturb the training distribution. That is to say, more diverse domains or styles are made available for training DCNNs. Based on the perturbed distribution, we carefully design a dual-branch invariant content synergistic learning strategy to prevent style-biased predictions and focus more on the invariant content. Extensive experimental results on two typical medical image segmentation tasks show that our approach performs better than state-of-the-art domain generalization methods.

CV · Nov 29, 2023
Discovering Galaxy Features via Dataset Distillation

Haowen Guan, Xuan Zhao, Zishi Wang et al.

In many applications, Neural Nets (NNs) have classification performance on par with or even exceeding human capacity. Moreover, it is likely that NNs leverage underlying features that might differ from those humans perceive to classify. Can we "reverse-engineer" pertinent features to enhance our scientific understanding? Here, we apply this idea to the notoriously difficult task of galaxy classification: NNs have reached high performance for this task, but what does a neural net (NN) "see" when it classifies galaxies? Are there morphological features that the human eye might overlook that could help with the task and provide new insights? Can we visualize tracers of early evolution, or additionally incorporated spectral data? We present a novel way to summarize and visualize galaxy morphology through the lens of neural networks, leveraging Dataset Distillation, a recent deep-learning methodology with the primary objective to distill knowledge from a large dataset and condense it into a compact synthetic dataset, such that a model trained on this synthetic dataset achieves performance comparable to a model trained on the full dataset. We curate a class-balanced, medium-size high-confidence version of the Galaxy Zoo 2 dataset, and proceed with dataset distillation from our accurate NN-classifier to create synthesized prototypical images of galaxy morphological features, demonstrating its effectiveness. Of independent interest, we introduce a self-adaptive version of the state-of-the-art Matching Trajectory algorithm to automate the distillation process, and show enhanced performance on computer vision benchmarks.

RO · Feb 17, 2025 · Code
X-IL: Exploring the Design Space of Imitation Learning Policies

Xiaogang Jia, Atalay Donat, Xi Huang et al.

Designing modern imitation learning (IL) policies requires making numerous decisions, including the selection of feature encoding, architecture, policy representation, and more. As the field rapidly advances, the range of available options continues to grow, creating a vast and largely unexplored design space for IL policies. In this work, we present X-IL, an accessible open-source framework designed to systematically explore this design space. The framework's modular design enables seamless swapping of policy components, such as backbones (e.g., Transformer, Mamba, xLSTM) and policy optimization techniques (e.g., Score-matching, Flow-matching). This flexibility facilitates comprehensive experimentation and has led to the discovery of novel policy configurations that outperform existing methods on recent robot learning benchmarks. Our experiments not only demonstrate significant performance gains but also provide valuable insights into the strengths and weaknesses of various design choices. This study serves as both a practical reference for practitioners and a foundation for guiding future research in imitation learning.

69.2 · RO · Apr 23
How VLAs (Really) Work In Open-World Environments

Amir Rasouli, Yangzheng Wu, Zhiyuan Li et al.

Vision-language-action models (VLAs) have been extensively used in robotics applications, achieving great success in various manipulation problems. More recently, VLAs have been used in long-horizon tasks and evaluated on benchmarks, such as BEHAVIOR1K (B1K), for solving complex household chores. The common metric for measuring progress in such benchmarks is success rate or partial score based on satisfaction of progress-agnostic criteria, meaning only the final states of the objects are considered, regardless of the events that lead to such states. In this paper, we argue that using such evaluation protocols says little about safety aspects of operation and can potentially exaggerate reported performance, undermining core challenges for future real-world deployment. To this end, we conduct a thorough analysis of state-of-the-art models on the B1K Challenge and evaluate policies in terms of robustness via reproducibility and consistency of performance, safety aspects of the policies' operations, task awareness, and key elements leading to the incompletion of tasks. We then propose evaluation protocols to capture safety violations to better measure the true performance of the policies in more complex and interactive scenarios. Finally, we discuss the limitations of the existing VLAs and motivate future research.

LG · Mar 5, 2019 · Code
PROPS: Probabilistic personalization of black-box sequence models

Michael Thomas Wojnowicz, Xuan Zhao

We present PROPS, a lightweight transfer learning mechanism for sequential data. PROPS learns probabilistic perturbations around the predictions of one or more arbitrarily complex, pre-trained black box models (such as recurrent neural networks). The technique pins the black-box prediction functions to "source nodes" of a hidden Markov model (HMM), and uses the remaining nodes as "perturbation nodes" for learning customized perturbations around those predictions. In this paper, we describe the PROPS model, provide an algorithm for online learning of its parameters, and demonstrate the consistency of this estimation. We also explore the utility of PROPS in the context of personalized language modeling. In particular, we construct a baseline language model by training an LSTM on the entire Wikipedia corpus of 2.5 million articles (around 6.6 billion words), and then use PROPS to provide lightweight customization into a personalized language model of President Donald J. Trump's tweeting. We achieve good customization after only 2,000 additional words, and find that the PROPS model, being fully probabilistic, provides insight into when President Trump's speech departs from generic patterns in the Wikipedia corpus. Python code (for both the PROPS training algorithm and experiment reproducibility) is available at https://github.com/cylance/perturbed-sequence-model.

HC · May 22, 2024
"I Like Sunnie More Than I Expected!": Exploring User Expectation and Perception of an Anthropomorphic LLM-based Conversational Agent for Well-Being Support

Siyi Wu, Julie Y. A. Cachia, Feixue Han et al.

The human-computer interaction (HCI) research community has a longstanding interest in exploring the mismatch between users' actual experiences and their expectations of new technologies, for instance, large language models (LLMs). In this study, we compared users' (N = 38) initial expectations against their post-interaction perceptions of two LLM-powered mental well-being intervention activity recommendation systems. Both systems have a built-in LLM to recommend a personalized well-being intervention activity, but one system (Sunnie) has an anthropomorphic conversational interaction design via elements such as appearance, persona, and natural conversation. Results showed that user engagement was high with both systems, and both systems exceeded users' expectations along the utility dimension, highlighting AI's potential to offer useful intervention activity recommendations. In addition, Sunnie further outperformed the non-anthropomorphic baseline system in relational warmth. These findings suggest that anthropomorphic conversational interaction design may be particularly effective in fostering warmth in mental health support contexts.

CV · Apr 1, 2025
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

Identity-preserving face synthesis aims to generate synthetic face images of virtual subjects that can substitute real-world data for training face recognition models. While prior arts strive to create images with consistent identities and diverse styles, they face a trade-off between them. Identifying their limitation of treating style variation as subject-agnostic and observing that real-world persons actually have distinct, subject-specific styles, this paper introduces MorphFace, a diffusion-based face generator. The generator learns fine-grained facial styles, e.g., shape, pose and expression, from the renderings of a 3D morphable model (3DMM). It also learns identities from an off-the-shelf recognition model. To create virtual faces, the generator is conditioned on novel identities of unlabeled synthetic faces, and novel styles that are statistically sampled from a real-world prior distribution. The sampling especially accounts for both intra-subject variation and subject distinctiveness. A context blending strategy is employed to enhance the generator's responsiveness to identity and style conditions. Extensive experiments show that MorphFace outperforms the best prior arts in face recognition efficacy.

CV · Mar 15, 2025
CHOrD: Generation of Collision-Free, House-Scale, and Organized Digital Twins for 3D Indoor Scenes with Controllable Floor Plans and Optimal Layouts

Chong Su, Yingbin Fu, Zheyuan Hu et al. · cambridge

We introduce CHOrD, a novel framework for scalable synthesis of 3D indoor scenes, designed to create house-scale, collision-free, and hierarchically structured indoor digital twins. In contrast to existing methods that directly synthesize the scene layout as a scene graph or object list, CHOrD incorporates a 2D image-based intermediate layout representation, enabling effective prevention of collision artifacts by successfully capturing them as out-of-distribution (OOD) scenarios during generation. Furthermore, unlike existing methods, CHOrD is capable of generating scene layouts that adhere to complex floor plans with multi-modal controls, enabling the creation of coherent, house-wide layouts robust to both geometric and semantic variations in room structures. Additionally, we propose a novel dataset with expanded coverage of household items and room configurations, as well as significantly improved data quality. CHOrD demonstrates state-of-the-art performance on both the 3D-FRONT and our proposed datasets, delivering photorealistic, spatially coherent indoor scene synthesis adaptable to arbitrary floor plan variations.

CL · Jan 19
Pardon? Evaluating Conversational Repair in Large Audio-Language Models

Shuanghong Huang, Jinlei Xu, Youchao Zhou et al.

Large Audio-Language Models (LALMs) have demonstrated strong performance in spoken question answering (QA), with existing evaluations primarily focusing on answer accuracy and robustness to acoustic perturbations. However, such evaluations implicitly assume that spoken inputs remain semantically answerable, an assumption that often fails in real-world interaction when essential information is missing. In this work, we introduce a repair-aware evaluation setting that explicitly distinguishes between answerable and unanswerable audio inputs. We define answerability as a property of the input itself and construct paired evaluation conditions using a semantic-acoustic masking protocol. Based on this setting, we propose the Evaluability Awareness and Repair (EAR) score, a non-compensatory metric that jointly evaluates task competence under answerable conditions and repair behavior under unanswerable conditions. Experiments on two spoken QA benchmarks across diverse LALMs reveal a consistent gap between answer accuracy and conversational reliability: while many models perform well when inputs are answerable, most fail to recognize semantic unanswerability and initiate appropriate conversational repair. These findings expose a limitation of prevailing accuracy-centric evaluation practices and motivate reliability assessments that treat unanswerable inputs as cues for repair and continued interaction.

CV · Nov 18, 2025
GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation

Xuan Zhao, Zhongyu Zhang, Yuge Huang et al.

Existing state-of-the-art image tokenization methods leverage diverse semantic features from pre-trained vision models for additional supervision, to expand the distribution of latent representations and thereby improve the quality of image reconstruction and generation. These methods employ a locally supervised approach for semantic supervision, which limits the uniformity of semantic distribution. However, VA-VAE proves that a more uniform feature distribution yields better generation performance. In this work, we introduce a Global Perspective Tokenizer (GloTok), which utilizes global relational information to model a more uniform semantic distribution of tokenized features. Specifically, a codebook-wise histogram relation learning method is proposed to transfer the semantics, which are modeled by pre-trained models on the entire dataset, to the semantic codebook. Then, we design a residual learning module that recovers the fine-grained details to minimize the reconstruction error caused by quantization. Through the above design, GloTok delivers more uniformly distributed semantic latent representations, which facilitates the training of autoregressive (AR) models for generating high-quality images without requiring direct access to pre-trained models during the training process. Experiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.

RO · Nov 27, 2025
Distracted Robot: How Visual Clutter Undermine Robotic Manipulation

Amir Rasouli, Montgomery Alban, Sajjad Pakdamansavoji et al.

In this work, we propose an evaluation protocol for examining the performance of robotic manipulation policies in cluttered scenes. Contrary to prior works, we approach evaluation from a psychophysical perspective and therefore use a unified measure of clutter that accounts for environmental factors as well as the distractors' quantity, characteristics, and arrangement. Using this measure, we systematically construct evaluation scenarios in both hyper-realistic simulation and the real world and conduct extensive experimentation on manipulation policies, in particular vision-language-action (VLA) models. Our experiments highlight the significant impact of scene clutter, which lowers the performance of the policies by as much as 34%, and show that despite achieving similar average performance across the tasks, different VLA policies have unique vulnerabilities and a relatively low agreement on success scenarios. We further show that our clutter measure is an effective indicator of performance degradation and analyze the impact of distractors in terms of their quantity and occluding influence. Finally, we show that finetuning on enhanced data, although effective, does not equally remedy all negative impacts of clutter on performance.

RO · Nov 27, 2025
CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance

Rui Heng Yang, Xuan Zhao, Leo Maxime Brunswic et al.

In robotics, diffusion models can capture multi-modal trajectories from demonstrations, making them a transformative approach in imitation learning. However, achieving optimal performance following this regimen requires a large-scale dataset, which is costly to obtain, especially for challenging tasks, such as collision avoidance. In those tasks, generalization at test time demands coverage of many obstacle types and their spatial configurations, which is impractical to acquire purely via data. To remedy this problem, we propose Context-Aware diffusion policy via Proximal mode Expansion (CAPE), a framework that expands trajectory distribution modes with context-aware prior and guidance at inference via a novel prior-seeded iterative guided refinement procedure. The framework generates an initial trajectory plan and executes a short prefix trajectory, and then the remaining trajectory segment is perturbed to an intermediate noise level, forming a trajectory prior. Such a prior is context-aware and preserves task intent. Repeating the process with context-aware guided denoising iteratively expands mode support to allow finding smoother, less collision-prone trajectories. For collision avoidance, CAPE expands trajectory distribution modes with collision-aware context, enabling the sampling of collision-free trajectories in previously unseen environments while maintaining goal consistency. We evaluate CAPE on diverse manipulation tasks in cluttered unseen simulated and real-world settings and show up to 26% and 80% higher success rates respectively compared to SOTA methods, demonstrating better generalization to unseen environments.
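The prior-seeding step described in this abstract (execute a short prefix, then perturb the remaining trajectory segment to an intermediate noise level) can be sketched as follows. This is only an illustration under assumptions: it uses the standard DDPM-style forward noising rule, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise, which the abstract does not confirm, and the function name, `prefix_len`, `alpha_bar`, and the toy plan are hypothetical. The guided denoising that would follow is omitted.

```python
import math
import random


def make_trajectory_prior(plan, prefix_len, alpha_bar, rng):
    """Split a plan into an executed prefix and a partially noised remainder.

    alpha_bar in [0, 1] sets the intermediate noise level (assumed DDPM-style):
    1.0 leaves the remainder unchanged, 0.0 replaces it with pure Gaussian noise.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    prefix = plan[:prefix_len]
    remainder = plan[prefix_len:]
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    # Each waypoint keeps a scaled copy of the plan plus scaled Gaussian noise.
    prior = [
        [keep * x + noise * rng.gauss(0.0, 1.0) for x in waypoint]
        for waypoint in remainder
    ]
    return prefix, prior


rng = random.Random(0)
plan = [[0.1 * i, 0.0] for i in range(6)]  # hypothetical 2-D waypoints
prefix, prior = make_trajectory_prior(plan, prefix_len=2, alpha_bar=0.5, rng=rng)
```

Choosing an intermediate rather than full noise level means the prior still carries information from the original plan, which matches the abstract's statement that the prior preserves task intent while leaving room for the denoiser to find alternative modes.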

ROOct 24, 2025
Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising

Mateo Clemente, Leo Brunswic, Rui Heng Yang et al.

Diffusion models, such as diffusion policy, have achieved state-of-the-art results in robotic manipulation by imitating expert demonstrations. While diffusion models were originally developed for vision tasks like image and video generation, many of their inference strategies have been directly transferred to control domains without adaptation. In this work, we show that by tailoring the denoising process to the specific characteristics of embodied AI tasks -- particularly the structured, low-dimensional nature of action distributions -- diffusion policies can operate effectively with as few as 5 neural function evaluations (NFE). Building on this insight, we propose a population-based sampling strategy, genetic denoising, which enhances both performance and stability by selecting denoising trajectories with low out-of-distribution risk. Our method solves challenging tasks with only 2 NFE while improving or matching performance. We evaluate our approach across 14 robotic manipulation tasks from D4RL and Robomimic, spanning multiple action horizons and inference budgets. In over 2 million evaluations, our method consistently outperforms standard diffusion-based policies, achieving up to 20% performance gains with significantly fewer inference steps.

LGOct 16, 2025
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Zhuo Cao, Xuan Zhao, Lena Krieger et al.

The growing integration of machine learning (ML) and artificial intelligence (AI) models into high-stakes domains such as healthcare and scientific research calls for models that are not only accurate but also interpretable. Among the existing explainable methods, counterfactual explanations offer interpretability by identifying minimal changes to inputs that would alter a model's prediction, thus providing deeper insights. However, current counterfactual generation methods suffer from critical limitations, including gradient vanishing, discontinuous latent spaces, and an overreliance on the alignment between learned and true decision boundaries. To overcome these limitations, we propose LeapFactual, a novel counterfactual explanation algorithm based on conditional flow matching. LeapFactual generates reliable and informative counterfactuals, even when true and learned decision boundaries diverge. Following a model-agnostic approach, LeapFactual is not limited to models with differentiable loss functions. It can even handle human-in-the-loop systems, expanding the scope of counterfactual explanations to domains that require the participation of human annotators, such as citizen science. We provide extensive experiments on benchmark and real-world datasets showing that LeapFactual generates accurate and in-distribution counterfactual explanations that offer actionable insights. We observe, for instance, that our reliable counterfactual samples with labels aligning to ground truth can be beneficially used as new training data to enhance the model. The proposed method is broadly applicable and enhances both scientific knowledge discovery and non-expert interpretability.

CVOct 11, 2025
ImmerIris: A Large-Scale Dataset and Benchmark for Immersive Iris Recognition in Open Scenes

Yuxi Mi, Qiuyang Yuan, Zhizhou Zhong et al.

In egocentric applications such as augmented and virtual reality, immersive iris recognition is emerging as an accurate and seamless way to identify persons. While classic systems acquire iris images on-axis, i.e., via dedicated frontal sensors in controlled settings, the immersive setup primarily captures off-axis irises through tilt-placed headset cameras, with only mild control in open scenes. This yields unique challenges, including perspective distortion, intensified quality degradations, and intra-class variations in iris texture. Datasets capturing these challenges remain scarce. To fill this gap, this paper introduces ImmerIris, a large-scale dataset collected via VR headsets, containing 499,791 ocular images from 564 subjects. It is, to the best of current knowledge, the largest public dataset and among the first dedicated to off-axis acquisition. Based on ImmerIris, evaluation protocols are constructed to benchmark recognition methods under different challenging factors. Current methods, primarily designed for classic on-axis imagery, perform unsatisfactorily on the immersive setup, mainly due to reliance on fallible normalization. To this end, this paper further proposes a normalization-free paradigm that directly learns from ocular images with minimal adjustment. Despite its simplicity, this approach consistently outperforms normalization-based counterparts, pointing to a promising direction for robust immersive recognition.

AIAug 9, 2025
Natural Language-Driven Viewpoint Navigation for Volume Exploration via Semantic Block Representation

Xuan Zhao, Jun Tao

Exploring volumetric data is crucial for interpreting scientific datasets. However, selecting optimal viewpoints for effective navigation can be challenging, particularly for users without extensive domain expertise or familiarity with 3D navigation. In this paper, we propose a novel framework that leverages natural language interaction to enhance volumetric data exploration. Our approach encodes volumetric blocks to capture and differentiate underlying structures. It further incorporates a CLIP Score mechanism, which provides semantic information to the blocks to guide navigation. The navigation is empowered by a reinforcement learning framework that leverages these semantic cues to efficiently search for and identify desired viewpoints that align with the user's intent. The selected viewpoints are evaluated using CLIP Score to ensure that they best reflect the user queries. By automating viewpoint selection, our method improves the efficiency of volumetric data navigation and enhances the interpretability of complex scientific phenomena.

SYJan 29, 2025
Differentiable Projection-based Learn to Optimize in Wireless Network-Part I: Convex Constrained (Non-)Convex Programming

Xiucheng Wang, Xuan Zhao, Nan Cheng

This paper addresses a class of (non-)convex optimization problems subject to general convex constraints, which pose significant challenges for traditional methods due to their inherent non-convexity and diversity. Conventional convex optimization-based solvers often struggle to efficiently handle these problems in their most general form. While neural network (NN)-based approaches offer a promising alternative, ensuring the feasibility of NN-generated solutions and effectively training the NN remain key hurdles, largely because finite-capacity networks can produce infeasible outputs. To overcome these issues, we propose a projection-based method that projects any infeasible NN output onto the feasible domain, thus guaranteeing strict adherence to the constraints without compromising the NN's optimization capability. Furthermore, we derive the objective function values for both the raw NN outputs and their projected counterparts, along with the gradients of these values with respect to the NN parameters. This derivation enables label-free (unsupervised) training, reducing reliance on labeled data and improving scalability. Experimental results demonstrate that the proposed projection-based method consistently ensures feasibility.
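
The projection step described in this abstract can be illustrated with a very small example. The sketch below is not the authors' method: it assumes, purely for illustration, that the feasible set is a Euclidean ball of a given radius (for instance a total power budget), and the names `project_onto_ball` and `raw_output` are hypothetical.

```python
import math


def project_onto_ball(x, radius):
    """Euclidean projection of x onto the set {z : ||z||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        # Already feasible: the projection leaves the point unchanged.
        return list(x)
    # Infeasible: rescale onto the boundary along the same direction.
    scale = radius / norm
    return [v * scale for v in x]


# Hypothetical infeasible network output with norm 5.
raw_output = [3.0, 4.0]
feasible = project_onto_ball(raw_output, radius=1.0)
```

Because a feasible point is returned unchanged and an infeasible one is mapped to the nearest point of the set, the output always satisfies the constraint. General convex constraint sets need a different projection routine than this closed-form one.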

IVMay 2, 2023
High-Fidelity Image Synthesis from Pulmonary Nodule Lesion Maps using Semantic Diffusion Model

Xuan Zhao, Benjamin Hou

Lung cancer has been one of the leading causes of cancer-related deaths worldwide for years. With the emergence of deep learning, computer-assisted diagnosis (CAD) models based on learning algorithms can accelerate the nodule screening process, providing valuable assistance to radiologists in their daily clinical workflows. However, developing such robust and accurate models often requires large-scale and diverse medical datasets with high-quality annotations. Generating synthetic data provides a pathway for augmenting datasets at a larger scale. Therefore, in this paper, we explore the use of Semantic Diffusion Models (SDM) to generate high-fidelity pulmonary CT images from segmentation maps. We utilize annotation information from the LUNA16 dataset to create paired CT images and masks, and assess the quality of the generated images using the Frechet Inception Distance (FID), as well as on two common clinical downstream tasks: nodule detection and nodule localization. Achieving improvements of 3.96% for detection accuracy and 8.50% for AP50 in the nodule localization task, respectively, demonstrates the feasibility of the approach.

HCFeb 12, 2022
"I Don't Want People to Look At Me Differently": Designing User-Defined Above-the-Neck Gestures for People with Upper Body Motor Impairments

Xuan Zhao, Mingming Fan, Teng Han

Recent research proposed eyelid gestures for people with upper-body motor impairments (UMI) to interact with smartphones without finger touch. However, such eyelid gestures were designed by researchers. It remains unknown what eyelid gestures people with UMI would want and be able to perform. Moreover, other above-the-neck body parts (e.g., mouth, head) could be used to form more gestures. We conducted a user study in which 17 people with UMI designed above-the-neck gestures for 26 common commands on smartphones. We collected a total of 442 user-defined gestures involving the eyes, the mouth, and the head. Participants were more likely to make gestures with their eyes and preferred gestures that were simple, easy-to-remember, and less likely to draw attention from others. We further conducted a survey (N=24) to validate the usability and acceptance of these user-defined gestures. Results show that user-defined gestures were acceptable to both people with and without motor impairments.

CVNov 25, 2021
Homogeneous Low-Resolution Face Recognition Method based Correlation Features

Xuan Zhao

Face recognition technology has been widely adopted in many mission-critical scenarios such as human identification, controlled admission, and mobile device access. Security surveillance is a typical scenario for face recognition technology. Because the low resolution of surveillance video and images makes it difficult to extract effective feature information, algorithms designed for high-resolution face recognition are difficult to migrate directly to low-resolution situations. As face recognition in security surveillance becomes more important in the era of dense urbanization, it is essential to develop algorithms that are able to provide satisfactory performance in processing the video frames generated by low-resolution surveillance cameras. This paper studies the Correlation Features-based Face Recognition (CoFFaR) method for homogeneous low-resolution surveillance videos, and elaborates on the theory, experimental details, and experimental results. The experimental results validate the effectiveness of the correlation features method, which improves the accuracy of homogeneous face recognition in surveillance security scenarios.

AIJul 21, 2021
Reinforcement Learning Agent Training with Goals for Real World Tasks

Xuan Zhao, Marcos Campos

Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks. However, designing reward functions for complex tasks (e.g., with multiple objectives and safety constraints) can be challenging for most users and usually requires multiple expensive trials (reward function hacking). In this paper we propose a specification language (Inkling Goal Specification) for complex control and optimization tasks, which is very close to natural language and allows a practitioner to focus on problem specification instead of reward function hacking. The core elements of our framework are: (i) mapping the high level language to a predicate temporal logic tailored to control and optimization tasks, (ii) a novel automaton-guided dense reward generation that can be used to drive RL algorithms, and (iii) a set of performance metrics to assess the behavior of the system. We include a set of experiments showing that the proposed method provides great ease of use to specify a wide range of real world tasks; and that the reward generated is able to drive the policy training to achieve the specified goal.

HCFeb 21, 2021
EvoK: Connecting loved ones through Heart Rate sharing

Esha Shandilya, Yiwen Wang, Xuan Zhao et al.

In this work, we present EvoK, a new way of sharing one's heart rate with feedback from their close contacts to alleviate social isolation and loneliness. EvoK consists of a pair of wearable prototype devices (i.e., sender and receiver). The sender is designed as a headband enabling continuous sensing of heart rate with aesthetic designs to maximize social acceptance. The receiver is designed as a wristwatch enabling unobtrusive receiving of the loved one's continuous heart rate with multi-modal notification systems.

ROAug 20, 2020
Autonomous Social Distancing in Urban Environments using a Quadruped Robot

Tingxiang Fan, Zhiming Chen, Xuan Zhao et al.

The COVID-19 pandemic has become a global challenge faced by people all over the world. Social distancing has been proven to be an effective practice to reduce the spread of COVID-19. Against this backdrop, we propose that surveillance robots can not only monitor but also promote social distancing. Robots can be flexibly deployed and they can take precautionary actions to remind people to practice social distancing. In this paper, we introduce a fully autonomous surveillance robot based on a quadruped platform that can promote social distancing in complex urban environments. Specifically, to achieve autonomy, we mount multiple cameras and a 3D LiDAR on the legged robot. The robot then uses an onboard real-time social distancing detection system to track nearby pedestrian groups. Next, the robot uses a crowd-aware navigation algorithm to move freely in highly dynamic scenarios. The robot finally uses a crowd-aware routing algorithm to effectively promote social distancing by using human-friendly verbal cues to send suggestions to over-crowded pedestrians. We demonstrate and validate that our robot can be operated autonomously by conducting several experiments in various urban scenarios.

CVMay 7, 2020
NTIRE 2020 Challenge on NonHomogeneous Dehazing

Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu et al.

This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy images). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. 168 participants registered in the challenge and 27 teams competed in the final testing phase. The proposed solutions gauge the state-of-the-art in image dehazing.

CVMay 6, 2020
NTIRE 2020 Challenge on Image Demoireing: Methods and Results

Shanxin Yuan, Radu Timofte, Ales Leonardis et al.

This paper reviews the Challenge on Image Demoireing that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2020. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. The challenge was divided into two tracks. Track 1 targeted the single image demoireing problem, which seeks to remove moire patterns from a single image. Track 2 focused on the burst demoireing problem, where a set of degraded moire images of the same scene were provided as input, with the goal of producing a single demoired image as output. The methods were ranked in terms of their fidelity, measured using the peak signal-to-noise ratio (PSNR) between the ground truth clean images and the restored images produced by the participants' methods. The tracks had 142 and 99 registered participants, respectively, with a total of 14 and 6 submissions in the final testing stage. The entries span the current state-of-the-art in image and burst image demoireing problems.
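
The ranking metric named in this abstract, PSNR, follows the standard definition 10 * log10(MAX^2 / MSE). The sketch below assumes images are given as flat sequences of 8-bit pixel values; the function name `psnr` and the flat-list representation are illustrative choices, not the challenge's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_value].
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [10, 20, 30, 40]
close = [11, 19, 30, 42]   # small restoration error
far = [50, 60, 70, 80]     # large restoration error
```

A higher PSNR means the restored image is closer to the ground truth, so `psnr(clean, close)` exceeds `psnr(clean, far)`.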

SYOct 10, 2019
A Gradual Takeover Strategy of the Active Safety System

Rui Liu, Xichan Zhu, Xuan Zhao et al.

A gradual takeover strategy is proposed that realizes real-time dynamic assignment of the driving privilege and its gradual handover. Firstly, the driving privilege assignment based on the risk level is achieved. Naturalistic driving data are applied to study driver behavior during danger. TTC (time to collision) is defined as the obvious risk measure, whereas the time margin (TM), i.e. the time before the host vehicle has to brake assuming that the target vehicle is braking, is defined as the potential risk measure. A risk assessment algorithm is proposed based on the obvious risk and potential risk. Secondly, the driving privilege gradual handover is realized. Non-cooperative MPC (model predictive control) is employed to resolve the conflicts between the driver and the active safety system. The naturalistic driving data are applied to verify the effectiveness of the risk assessment algorithm, which performs better than TTC in terms of the ROC (receiver operating characteristic). It is identified that the Nash equilibrium of the non-cooperative MPC can be achieved by using a non-iterative method. The driving privilege gradual handover is realized by updating the confidence matrices. The simulation verification shows that the gradual takeover strategy can achieve the driving privilege gradual handover between the driver and the active safety system.
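
For readers unfamiliar with TTC, the sketch below uses the common constant-speed definition: the current gap divided by the closing speed. This is an assumption about the definition, since the abstract does not spell out a formula, and the time margin (TM) is deliberately not reproduced because its exact form is not given here. The function and parameter names are hypothetical.

```python
import math


def time_to_collision(gap_m, host_speed_mps, target_speed_mps):
    """Constant-speed time to collision in seconds.

    gap_m is the distance from the host vehicle to the target ahead.
    Returns infinity when the host is not closing in on the target.
    """
    closing_speed = host_speed_mps - target_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


# Host at 20 m/s, target at 10 m/s, 30 m apart: closing at 10 m/s.
ttc = time_to_collision(30.0, 20.0, 10.0)
```

A smaller TTC indicates a more imminent collision, which is why it serves as a measure of obvious risk.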

ROJul 3, 2019
Statistical Characteristics of Driver Acceleration Behavior and Its Probability Model

Rui Liu, Xuan Zhao, Xichan Zhu et al.

Naturalistic driving data were applied to study driver acceleration behaviour, and a probability model of the driver was proposed. First, the question of whether the database is large enough is resolved using kernel density estimation and Kullback-Leibler divergence. Next, the convergence database is utilised to achieve the bivariate acceleration distribution pattern. Subsequently, two probability models are proposed to explain the pattern. Finally, the statistical characteristics of the acceleration behaviours are studied to verify the probability models. The longitudinal and lateral acceleration behaviours always approximate a similar Pareto distribution. The braking, accelerating, and steering manoeuvres become more intense at first and then less intense as the velocity increases. These behavioural characteristics reveal the mechanism of the quadrangle bivariate acceleration distribution pattern. The bivariate acceleration behaviour of the driver will never reach a circle-shaped pattern. The bivariate Pareto distribution model can be applied to describe the bivariate acceleration behaviour of the driver.
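
One way to read the database-size check is: estimate the density from a subset and from a larger set, then see whether the Kullback-Leibler divergence between the two has become small. The sketch below illustrates that idea in one dimension with a Gaussian kernel on a fixed grid. The bandwidth, grid, sample values, and function names are all assumptions made for illustration, not the paper's procedure.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def normalise(density):
    """Turn grid densities into a discrete distribution that sums to one."""
    total = sum(density)
    return [d / total for d in density]


def kl_divergence(p, q):
    """Discrete KL divergence D(p || q); q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


grid = [-3.0 + 0.25 * i for i in range(25)]   # -3.0 ... 3.0
subset = [-1.0, 0.0, 1.0]                     # hypothetical accelerations
doubled = subset * 2                          # same shape, more samples
shifted = [2.0, 2.5, 3.0]                     # a clearly different sample

p = normalise(gaussian_kde(subset, grid, bandwidth=0.5))
same = kl_divergence(p, normalise(gaussian_kde(doubled, grid, bandwidth=0.5)))
diff = kl_divergence(p, normalise(gaussian_kde(shifted, grid, bandwidth=0.5)))
```

When adding data no longer changes the estimated density, the divergence stays near zero (`same`), whereas a sample with a different distribution gives a clearly positive value (`diff`).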

MLJan 3, 2019
Projecting "better than randomly": How to reduce the dimensionality of very large datasets in a way that outperforms random projections

Michael Wojnowicz, Di Zhang, Glenn Chisholm et al.

For very large datasets, random projections (RP) have become the tool of choice for dimensionality reduction. This is due to the computational complexity of principal component analysis. However, the recent development of randomized principal component analysis (RPCA) has opened up the possibility of obtaining approximate principal components on very large datasets. In this paper, we compare the performance of RPCA and RP in dimensionality reduction for supervised learning. In Experiment 1, we study a malware classification task on a dataset with over 10 million samples, almost 100,000 features, and over 25 billion non-zero values, with the goal of reducing the dimensionality to a compressed representation of 5,000 features. In order to apply RPCA to this dataset, we develop a new algorithm called large sample RPCA (LS-RPCA), which extends the RPCA algorithm to work on datasets with arbitrarily many samples. We find that classification performance is much higher when using LS-RPCA for dimensionality reduction than when using random projections. In particular, across a range of target dimensionalities, we find that using LS-RPCA reduces classification error by between 37% and 54%. Experiment 2 generalizes the phenomenon to multiple datasets, feature representations, and classifiers. These findings have implications for a large number of research projects in which random projections were used as a preprocessing step for dimensionality reduction. As long as accuracy is at a premium and the target dimensionality is sufficiently less than the numeric rank of the dataset, randomized PCA may be a superior choice. Moreover, if the dataset has a large number of samples, then LS-RPCA will provide a method for obtaining the approximate principal components.

ROSep 26, 2018
Robust Shape Estimation for 3D Deformable Object Manipulation

Tao Han, Xuan Zhao, Peigen Sun et al.

Existing shape estimation methods for deformable object manipulation suffer from the drawbacks of being off-line, model dependent, noise-sensitive or occlusion-sensitive, and thus are not appropriate for manipulation tasks requiring high precision. In this paper, we present a real-time shape estimation approach for autonomous robotic manipulation of 3D deformable objects. Our method fulfills all the requirements necessary for the high-quality deformable object manipulation in terms of being real-time, model-free and robust to noise and occlusion. These advantages are accomplished using a joint tracking and reconstruction framework, in which we track the object deformation by aligning a reference shape model with the stream input from the RGB-D camera, and simultaneously upgrade the reference shape model according to the newly captured RGB-D data. We have evaluated the quality and robustness of our real-time shape estimation pipeline on a set of deformable manipulation tasks implemented on physical robots. Videos are available at https://lifeisfantastic.github.io/DeformShapeEst/

ROSep 12, 2018
Dynamic Interaction Probabilistic Movement Primitives

Shuangda Duan, Longxin Chen, Hongmin Wu et al.

Human-robot collaboration is on the rise. Robots need to increasingly improve the efficiency and smoothness with which they assist humans by properly anticipating a human's intention. To do so, prediction models need to increase their accuracy and responsiveness. This work builds on top of Interaction Movement Primitives with phase estimation and re-formulates the framework to use dynamic human-motion observations which constantly update anticipatory motions. The original framework only considers a single fixed-duration static human observation which is used to perform only one anticipatory motion. Dynamic observations, with built-in phase estimation, yield a series of updated robot motion distributions. Co-activation is performed between the existing and newest most probable robot motion distribution. This results in smooth anticipatory robot motions that are highly accurate and with enhanced responsiveness.

ROJul 20, 2018
Considering Human Behavior in Motion Planning for Smooth Human-Robot Collaboration in Close Proximity

Xuan Zhao, Jia Pan

It is well known that a deep understanding of co-workers' behavior and preference is important for collaboration effectiveness. In this work, we present a method to accomplish smooth human-robot collaboration in close proximity by taking into account the human's behavior while planning the robot's trajectory. In particular, we first use an occupancy map to summarize the human's movement preference over time, and such prior information is then considered in an optimization-based motion planner via two cost items as introduced in [1]: 1) avoidance of the workspace previously occupied by the human, to eliminate the interruption and to increase the task success rate; 2) tendency to keep a safe distance between the human and the robot to improve the safety. In the experiments, we compare the collaboration performance among planners using different combinations of human-aware cost items, including the avoidance factor, both the avoidance and safe distance factor, and a baseline where no human-related factors are considered. The trajectories generated are tested in both simulated and real-world environments, and the results show that our method can significantly increase the collaborative task success rates and is also human-friendly. Our experimental results also show that the cost functions need to be adjusted in a task-specific manner to better reflect the human's preference.

LGJul 16, 2018
On the Information Theoretic Distance Measures and Bidirectional Helmholtz Machines

Mahdi Azarafrooz, Xuan Zhao, Sepehr Akhavan-Masouleh

By establishing a connection between bi-directional Helmholtz machines and information theory, we propose a generalized Helmholtz machine. Theoretical and experimental results show that given shallow architectures, the generalized model outperforms the previous ones substantially.

MLSep 21, 2017
Lazy stochastic principal component analysis

Michael Wojnowicz, Dinh Nguyen, Li Li et al.

Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, besides an operation on a small square matrix whose size depends only on the target dimensionality.

CVJul 20, 2017
Generalized Convolutional Neural Networks for Point Cloud Data

Aleksandr Savchenkov, Andrew Davis, Xuan Zhao

The introduction of cheap RGB-D cameras, stereo cameras, and LIDAR devices has given the computer vision community 3D information that conventional RGB cameras cannot provide. This data is often stored as a point cloud. In this paper, we present a novel method to apply the concept of convolutional neural networks to this type of data. By creating a mapping of nearest neighbors in a dataset, and individually applying weights to spatial relationships between points, we achieve an architecture that works directly with point clouds, but closely resembles a convolutional neural net in both design and behavior. Such a method bypasses the need for extensive feature engineering, while proving to be computationally efficient and requiring few parameters.

MLNov 17, 2016
"Influence Sketching": Finding Influential Samples In Large-Scale Regressions

Mike Wojnowicz, Ben Cruz, Xuan Zhao et al.

There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical statistical technique for identifying samples which unusually strongly impact the fit of a regression model (and its downstream predictions). In order to scale this technique up to very large and high-dimensional datasets, we introduce a new algorithm which we call "influence sketching." Influence sketching embeds random projections within the influence computation; in particular, the influence score is calculated using the randomly projected pseudo-dataset from the post-convergence Generalized Linear Model (GLM). We validate that influence sketching can reliably and successfully discover influential samples by applying the technique to a malware detection dataset of over 2 million executable files, each represented with almost 100,000 features. For example, we find that randomly deleting approximately 10% of training samples reduces predictive accuracy only slightly from 99.47% to 99.45%, whereas deleting the same number of samples with high influence sketch scores reduces predictive accuracy all the way down to 90.24%. Moreover, we find that influential samples are especially likely to be mislabeled. In the case study, we manually inspect the most influential samples, and find that influence sketching pointed us to new, previously unidentified pieces of malware.
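
The classical quantity that this abstract scales up, Cook's distance, can be computed in closed form for a simple linear regression. The sketch below shows only that classical statistic for one predictor plus an intercept. It is not influence sketching, which additionally embeds random projections and targets generalized linear models. The data and the function name are made up for illustration.

```python
def cooks_distances(xs, ys):
    """Cook's distance of every sample in a least-squares fit y = a + b * x."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Unbiased estimate of the residual variance.
    s2 = sum(r * r for r in residuals) / (n - p)
    distances = []
    for x, r in zip(xs, residuals):
        # Leverage of this sample (diagonal entry of the hat matrix).
        h = 1.0 / n + (x - x_mean) ** 2 / sxx
        distances.append((r * r / (p * s2)) * h / (1.0 - h) ** 2)
    return distances


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]   # the last point is an outlier
scores = cooks_distances(xs, ys)
most_influential = scores.index(max(scores))
```

The sample with the largest score is the one whose removal would change the fit the most, which is the kind of sample the abstract proposes prioritizing for manual inspection.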

CRJul 18, 2016
Wavelet decomposition of software entropy reveals symptoms of malicious code

Michael Wojnowicz, Glenn Chisholm, Matt Wolff et al.

Sophisticated malware authors can sneak hidden malicious code into portable executable files, and this code can be hard to detect, especially if encrypted or compressed. However, when an executable file switches between code regimes (e.g. native, encrypted, compressed, text, and padding), there are corresponding shifts in the file's representation as an entropy signal. In this paper, we develop a method for automatically quantifying the extent to which patterned variations in a file's entropy signal make it "suspicious." In Experiment 1, we use wavelet transforms to define a Suspiciously Structured Entropic Change Score (SSECS), a scalar feature that quantifies the suspiciousness of a file based on its distribution of entropic energy across multiple levels of spatial resolution. Based on this single feature, it was possible to raise predictive accuracy on a malware detection task from 50.0% to 68.7%, even though the single feature was applied to a heterogeneous corpus of malware discovered "in the wild." In Experiment 2, we describe how wavelet-based decompositions of software entropy can be applied to a parasitic malware detection task involving large numbers of samples and features. By extracting only string and entropy features (with wavelet decompositions) from software samples, we are able to obtain almost 99% detection of parasitic malware with fewer than 1% false positives on good files. Moreover, the addition of wavelet-based features uniformly improved detection performance across plausible false positive rates, both in a strings-only model (e.g., from 80.90% to 82.97%) and a strings-plus-entropy model (e.g. from 92.10% to 94.74%, and from 98.63% to 98.90%). Overall, wavelet decomposition of software entropy can be useful for machine learning models for detecting malware based on extracting millions of features from executable files.
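
The idea of treating a file as an entropy signal and looking at its energy across wavelet resolution levels can be sketched in a few lines. The code below computes Shannon entropy over fixed-size byte windows and then the energy of Haar detail coefficients per level. It does not compute SSECS, whose definition is not given in this abstract. The window size, the Haar wavelet, and the synthetic input are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a byte chunk, in bits per byte."""
    total = len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in Counter(chunk).values())


def entropy_signal(data, window):
    """Entropy of each consecutive, non-overlapping window of the data."""
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, len(data) - window + 1, window)
    ]


def haar_energy_by_level(signal):
    """Energy of Haar detail coefficients per level, finest level first.

    The signal length must be a power of two.
    """
    energies = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2.0) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    return energies


# Synthetic "file": 64 constant bytes followed by 64 varied bytes.
data = bytes(64) + bytes(range(16)) * 4
signal = entropy_signal(data, window=16)
energies = haar_energy_by_level(signal)
```

In this toy input the entropy jumps once, halfway through, so all the detail energy lands in the coarsest level. A file that switches regimes at finer scales would spread energy into the finer levels instead.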

NAMar 24, 2015
Superconvergence points of fractional spectral interpolation

Xuan Zhao, Zhimin Zhang

We investigate superconvergence properties of the spectral interpolation involving fractional derivatives. Our interest in this superconvergence problem is, in fact, twofold: when interpolating function values, we identify the points at which fractional derivatives of the interpolant superconverge; when interpolating fractional derivatives, we locate those points where function values of the interpolant superconverge. For the former case, we apply various Legendre polynomials as basis functions and obtain the superconvergence points, which naturally unify the superconvergence points for the first order derivative presented in [Z. Zhang, SIAM J. Numer. Anal., 50 (2012), 2966-2985], depending on orders of fractional derivatives. For the latter case, we utilize the Petrov-Galerkin method based on generalized Jacobi functions (GJF) [S. Chen et al., arXiv:1407.8303v1] and locate the superconvergence points both for function values and fractional derivatives. Numerical examples are provided to verify the analysis of superconvergence points for each case.