Yan Jiang

CV
h-index: 15
41 papers
712 citations
Novelty: 47%
AI Score: 58

41 Papers

CV | Mar 30, 2023 | Code
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

Wen Wang, Yan Jiang, Kangyang Xie et al.

Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which are often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code is made available at https://github.com/baaivision/vid2vid-zero.

CL | Sep 12, 2023 | Code
Balanced and Explainable Social Media Analysis for Public Health with Large Language Models

Yan Jiang, Ruihong Qiu, Yi Zhang et al.

As social media becomes increasingly popular, more and more public health activities emerge, which is worth noting for pandemic monitoring and government decision-making. Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs). Although recent progress in LLMs has shown a strong ability to comprehend knowledge by being fine-tuned on specific domain datasets, the cost of training an in-domain LLM for every specific public health task is especially high. Furthermore, such in-domain datasets from social media are generally highly imbalanced, which hinders the efficiency of LLM tuning. To tackle these challenges, the data imbalance issue can be overcome by sophisticated data augmentation methods for social media datasets. In addition, the ability of the LLMs can be effectively utilised by prompting the model properly. In light of the above discussion, in this paper, a novel ALEX framework is proposed for social media analysis on public health. Specifically, an augmentation pipeline is developed to resolve the data imbalance issue. Furthermore, an LLM explanation mechanism is proposed by prompting an LLM with the predicted results from BERT models. Extensive experiments conducted on three tasks at the Social Media Mining for Health 2023 (SMM4H) competition, with a first-place ranking in two tasks, demonstrate the superior performance of the proposed ALEX method. Our code has been released at https://github.com/YanJiangJerry/ALEX.

60.6 | CR | May 25
False Reality: Uncovering Sensor-induced Human-VR Interaction Vulnerability

Yancheng Jiang, Yan Jiang, Ruochen Zhou et al.

Virtual Reality (VR) techniques, serving as the bridge between the real and virtual worlds, have boomed and are widely used in manufacturing, remote healthcare, gaming, etc. Specifically, VR systems offer users immersive experiences that include both perceptions and actions. Various studies have demonstrated that attackers can manipulate VR software to influence users' interactions, including perception and actions. However, such attacks typically require strong access and specialized expertise. In this paper, we are the first to present a systematic analysis of physical attacks against VR systems and introduce False Reality, a new attack threat to VR devices that does not require access to or modification of their software. False Reality disturbs VR system services by tampering with sensor measurements, which in turn spoofs users' perception and can even induce harmful actions, e.g., inducing dizziness or causing users to crash into obstacles, by exploiting perceptual and psychological effects. We formalize these threats through an attack pathway framework and validate three representative pathways via physical experiments and user studies on five commercial VR devices. Finally, we propose a defense prototype to mitigate such threats. Our findings provide valuable insights for enhancing the security and resilience of future VR systems.

NA | Jan 30, 2018
Kernel Based High Order "Explicit" Unconditionally-Stable Scheme for Nonlinear Degenerate Advection-Diffusion Equations

Andrew Christlieb, Wei Guo, Yan Jiang

In this paper, we present a novel numerical scheme for solving a class of nonlinear degenerate parabolic equations with non-smooth solutions. The proposed method relies on a special kernel based formulation of the solutions found in our earlier work on the method of lines transpose and successive convolution. In such a framework, a high order weighted essentially non-oscillatory (WENO) methodology and a nonlinear filter are further employed to avoid spurious oscillations. High order accuracy in time is realized by using the high order explicit strong-stability-preserving (SSP) Runge-Kutta method. Moreover, theoretical investigations of the kernel based formulation combined with an explicit SSP method indicate that the combined scheme is unconditionally stable and up to third order accurate. Evaluation of the kernel based approach is done with a fast $\mathcal{O}(N)$ summation algorithm. The new method allows for much larger time step evolution compared with other explicit schemes with the same order of accuracy, leading to remarkable computational savings.
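For readers unfamiliar with the time integrator named in the abstract above, the classical third-order SSP Runge-Kutta method (in Shu-Osher form) can be sketched as follows. This is a generic illustration only and not the paper's kernel based scheme: the function names `ssp_rk3_step` and `integrate`, and the scalar test problem u' = -u, are assumptions made for this example.

```python
import math


def ssp_rk3_step(rhs, u, dt):
    """Advance u by one step of the classical third-order SSP Runge-Kutta method.

    Each stage is a convex combination of forward Euler steps (Shu-Osher form).
    """
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))


def integrate(rhs, u0, t_end, n_steps):
    """Integrate u' = rhs(u) from t = 0 to t_end with n_steps uniform steps."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = ssp_rk3_step(rhs, u, dt)
    return u


def decay(u):
    """Right-hand side of the assumed test problem u' = -u."""
    return -u


# Errors at t = 1 against the exact solution exp(-1) for two step counts.
err_coarse = abs(integrate(decay, 1.0, 1.0, 10) - math.exp(-1.0))
err_fine = abs(integrate(decay, 1.0, 1.0, 20) - math.exp(-1.0))
```

Halving the step size should shrink the error by roughly a factor of eight, which is the signature of third-order accuracy in time.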

NA | Jul 1, 2016
A WENO-based Method of Line Transpose Approach for Vlasov Simulations

Andrew Christlieb, Wei Guo, Yan Jiang

In this paper, a high order implicit Method of Line Transpose (MOL$^T$) method based on a weighted essentially non-oscillatory (WENO) methodology is developed for one-dimensional linear transport equations and further applied to the Vlasov-Poisson (VP) simulations via dimensional splitting. In the MOL$^T$ framework, the time variable is first discretized by a diagonally implicit strong-stability-preserving Runge-Kutta method, resulting in a boundary value problem (BVP) at the discrete time levels. Then an integral formulation coupled with a high order WENO methodology is employed to solve the BVP. As a result, the proposed scheme is high order accurate in both space and time and free of oscillations even when the solution is discontinuous or has sharp gradients. Moreover, the scheme is able to take larger time steps compared with an explicit MOL WENO scheme with the same order of accuracy. The desired positivity-preserving (PP) property of the scheme is further attained by incorporating a newly proposed high order PP limiter. We perform numerical experiments on several benchmarks, including linear advection and the solid body rotation problem, and on Landau damping, two-stream instabilities, bump-on-tail, and plasma sheath by solving the VP system. The efficacy and efficiency of the proposed scheme are numerically verified.

NA | Feb 2, 2018
A Kernel Based High Order "Explicit" Unconditionally Stable Scheme for Time Dependent Hamilton-Jacobi Equations

Andrew Christlieb, Wei Guo, Yan Jiang

In this paper, a class of high order numerical schemes is proposed for solving Hamilton-Jacobi (H-J) equations. This work is regarded as an extension of our previous work for nonlinear degenerate parabolic equations, see Christlieb et al. (arXiv preprint arXiv:1707.09294), which relies on a special kernel-based formulation of the solutions and successive convolution. When applied to the H-J equations, the newly proposed scheme attains genuinely high order accuracy in both space and time, and more importantly, it is unconditionally stable, hence allowing for much larger time step evolution compared with other explicit schemes and saving computational cost. A high order weighted essentially non-oscillatory methodology and a novel nonlinear filter are further incorporated to capture the correct viscosity solution. Furthermore, by coupling the recently proposed inverse Lax-Wendroff boundary treatment technique, this method is very flexible in handling complex geometry as well as general boundary conditions. We perform numerical experiments on a collection of numerical examples, including H-J equations with linear, nonlinear, convex or non-convex Hamiltonians. The efficacy and efficiency of the proposed scheme in approximating the viscosity solution of general H-J equations are verified.

CL | Jun 27, 2022
Few-Shot Stance Detection via Target-Aware Prompt Distillation

Yan Jiang, Jinhua Gao, Huawei Shen et al.

Stance detection aims to identify whether the author of a text is in favor of, against, or neutral to a given target. The main challenge of this task is two-fold: few-shot learning resulting from the varying targets and the lack of contextual information of the targets. Existing works mainly focus on solving the second issue by designing attention-based models or introducing noisy external knowledge, while the first issue remains under-explored. In this paper, inspired by the potential capability of pre-trained language models (PLMs) serving as knowledge bases and few-shot learners, we propose to introduce prompt-based fine-tuning for stance detection. PLMs can provide essential contextual information for the targets and enable few-shot learning via prompts. Considering the crucial role of the target in the stance detection task, we design target-aware prompts and propose a novel verbalizer. Instead of mapping each label to a concrete word, our verbalizer maps each label to a vector and picks the label that best captures the correlation between the stance and the target. Moreover, to alleviate the possible defect of dealing with varying targets with a single hand-crafted prompt, we propose to distill the information learned from multiple prompts. Experimental results show the superior performance of our proposed model in both full-data and few-shot scenarios.

CL | Sep 8, 2023 | Code
UQ at #SMM4H 2023: ALEX for Public Health Analysis with Social Media

Yan Jiang, Ruihong Qiu, Yi Zhang et al.

As social media becomes increasingly popular, more and more activities related to public health emerge. Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs). However, the cost of training in-domain LLMs for public health is especially high. Furthermore, such in-domain datasets from social media are generally imbalanced. To tackle these challenges, the data imbalance issue can be overcome by data augmentation and balanced training. Moreover, the ability of the LLMs can be effectively utilized by prompting the model properly. In this paper, a novel ALEX framework is proposed to improve the performance of public health analysis on social media by adopting an LLM explanation mechanism. Results show that our ALEX model achieved the best performance among all submissions in both Task 2 and Task 4, with a high score in Task 1, in Social Media Mining for Health 2023 (SMM4H) [1]. Our code has been released at https://github.com/YanJiangJerry/ALEX.

SY | May 1, 2017
Performance tradeoffs of dynamically controlled grid-connected inverters in low inertia power systems

Yan Jiang, Richard Pates, Enrique Mallada

Implementing frequency response using grid-connected inverters is one of the popular proposed alternatives to mitigate the dynamic degradation experienced in low inertia power systems. However, such a solution faces several challenges as inverters do not intrinsically possess the natural response to power fluctuations that synchronous generators have. Thus, to synthetically generate this response, inverters need to take frequency measurements, which are usually noisy, and subsequently make changes in the output power, which are therefore delayed. This paper explores the system-wide performance tradeoffs that arise when measurement noise, power disturbances, and delayed actions are considered in the design of dynamic controllers for grid-connected inverters. Using a recently proposed dynamic droop (iDroop) control for grid-connected inverters, which is inspired by classical first order lead-lag compensation, we show that the sets of parameters that result in highest noise attenuation, power disturbance mitigation, and delay robustness do not necessarily have a common intersection. In particular, lead compensation is desired in systems where power disturbances are the predominant source of degradation, while lag compensation is a better alternative when the system is dominated by delays or frequency noise. Our analysis further shows that iDroop can outperform the standard droop alternative in both joint noise and disturbance mitigation, and delay robustness.
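The lead versus lag distinction drawn in the abstract above can be made concrete with a generic first order lead-lag compensator C(s) = k (s + z) / (s + p). The sketch below is a textbook illustration and not the iDroop parametrization from the paper: the function names and the numeric values of k, z and p are assumptions chosen for the example.

```python
import cmath
import math


def lead_lag_response(k, zero, pole, omega):
    """Frequency response of C(s) = k (s + zero) / (s + pole) at s = j * omega."""
    s = complex(0.0, omega)
    return k * (s + zero) / (s + pole)


def phase_deg(k, zero, pole, omega):
    """Phase of the compensator in degrees at angular frequency omega."""
    return math.degrees(cmath.phase(lead_lag_response(k, zero, pole, omega)))


# zero < pole gives phase lead; zero > pole gives phase lag.
# The extreme phase occurs at the geometric mean of zero and pole.
omega_mid = math.sqrt(1.0 * 10.0)
lead_phase = phase_deg(1.0, 1.0, 10.0, omega_mid)
lag_phase = phase_deg(1.0, 10.0, 1.0, omega_mid)
```

With positive k, z and p the phase equals atan(omega / z) - atan(omega / p), so swapping the zero and the pole flips the sign of the phase while leaving its magnitude unchanged.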

NA | Apr 12, 2017
Energy Stable Discontinuous Galerkin Methods for Maxwell's Equations in Nonlinear Optical Media

Vrushali A. Bokil, Yingda Cheng, Yan Jiang et al.

The propagation of electromagnetic waves in general media is modeled by the time-dependent Maxwell's partial differential equations (PDEs), coupled with constitutive laws that describe the response of the media. In this work, we focus on nonlinear optical media whose response is modeled by a system of first order nonlinear ordinary differential equations (ODEs), which include a single resonance linear Lorentz dispersion, and the nonlinearity comes from the instantaneous electronic Kerr response and the residual Raman molecular vibrational response. To design efficient, accurate, and stable computational methods, we apply high order discontinuous Galerkin discretizations in space to the hybrid PDE-ODE Maxwell system with several choices of numerical fluxes, and the resulting semi-discrete methods are shown to be energy stable. Under some restrictions on the strength of the nonlinearity, error estimates are also established. When we turn to fully discrete methods, the challenge to achieve provable stability lies in the temporal discretizations of the nonlinear terms. To overcome this, novel strategies are proposed to treat the nonlinearity in our model within the framework of the second-order leap-frog and implicit trapezoidal time integrators. The performance of the overall algorithms is demonstrated through numerical simulations of kink and antikink waves, and third-harmonic generation in soliton propagation.

NA | Jul 9, 2018
A high-order finite difference WENO scheme for ideal magnetohydrodynamics on curvilinear meshes

Andrew J. Christlieb, Xiao Feng, Yan Jiang et al.

A high-order finite difference numerical scheme is developed for the ideal magnetohydrodynamic equations based on an alternative flux formulation of the weighted essentially non-oscillatory (WENO) scheme. It computes a high-order numerical flux by a Taylor expansion in space, with the lowest-order term solved from a Riemann solver and the higher-order terms constructed from physical fluxes by limited central differences. The scheme coupled with several Riemann solvers, including a Lax-Friedrichs solver and HLL-type solvers, is developed on general curvilinear meshes in two dimensions and verified on a number of benchmark problems. In particular, a HLLD solver on Cartesian meshes is extended to curvilinear meshes with proper modifications. A numerical boundary condition for the perfect electrical conductor (PEC) boundary is derived for general geometry and verified through a bow shock flow. Numerical results also confirm the advantages of using low dissipative Riemann solvers in the current framework.

72.4 | LG | May 14 | Code
GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning

Yan Jiang, Ruihong Qiu, Zi Huang

Graph prompt tuning has shown great potential in graph learning by introducing trainable prompts to enhance the model performance in conventional single-domain scenarios. Recent research has extended graph prompts to improve Graph Foundation Models (GFMs) by few-shot tuning auxiliary prompts. Despite their progress, most existing methods embed source-domain information into prompts, which either serve as input to GFMs or are encoded during model pre-training. Such prompt entanglement with specific source domains and GFM pre-training strategy restricts their generalisability to other domains and different GFMs. Furthermore, existing GFM prompts merely rely on few-shot tuning for adaptation, neglecting the rich information in unlabelled target domain test data. Motivated by these insights, this paper aims to empower GFMs with pre-training-agnostic test-time graph prompt tuning, named GFMate. GFMate introduces centroid and layer prompts applied after pre-training on target domains, avoiding entanglement with specific source domains and model pre-training. In addition, a test-time complementary learning objective is devised to exploit both labelled and unlabelled target domain data for effective test-time prompt tuning. Extensive experiments on 12 benchmark datasets demonstrate the superior performance and efficiency of GFMate, achieving improvements of up to 30.63%. Code is available at https://github.com/YanJiangJerry/GFMate.

95.6 | LG | May 12 | Code
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

Yan Jiang, Ruihong Qiu, Zi Huang

Recently, reinforcement learning (RL) has been widely applied during post-training for diffusion large language models (dLLMs) to enhance reasoning with block-wise semi-autoregressive generation. Block size has therefore become a vital factor in dLLMs, since it determines the parallel decoding granularity and affects the rollout trajectories during RL optimisation, e.g., GRPO. Instead of investigating the effect of block size during inference on individual domains, this paper studies block size from a domain conflict perspective for dLLM RL post-training in multi-domain scenarios. The main contributions are: (1) a formulation of domain block size conflict in multi-domain RL for dLLMs, which will largely affect the post-training effectiveness for rollout-based RL methods; (2) a novel dataset, Block-R1-41K, constructed with a best-improved training block size for each sample, which also induces a Block Size Conflict Score to quantitatively measure the domain conflict; (3) a new benchmark, Block-R1, for flexible RL post-training for dLLMs in both single-domain and cross-domain settings; and (4) a simple yet powerful cross-domain post-training method with sample-level best-improved training block sizes. Extensive experiments on 13 distinct datasets, the 7 latest RL algorithms, and various dLLM backbones are covered in Block-R1. The benchmark is open-sourced at https://github.com/YanJiangJerry/Block-R1, with the dataset released at https://huggingface.co/datasets/dLLM-R1/Block-R1-41K.

43.0 | IR | May 25
RAG-Match: Retrieval-Augmented Knowledge Injection and Hierarchical Reasoning for Calibrated Semantic Relevance

Hengjun Jiang, Liansheng Sun, Yan Jiang et al.

Semantic relevance judgment for search is particularly challenging in knowledge-intensive scenarios, where accurate ranking requires not only semantic matching but also background grounding, multi-step reasoning, and well-calibrated decision boundaries. Existing relevance models mainly rely on direct label supervision or shallow semantic similarity, which limits their ability to handle implicit intent, factual equivalence, and fine-grained relevance distinctions. To address this issue, we propose RAG-Match, a three-stage framework that integrates knowledge-augmented pretraining, hierarchical reasoning alignment, and preference-based decision calibration for relevance modeling. The key idea is to first strengthen query-centered semantic grounding, then align the model with structured relevance reasoning, and finally correct decision-level inconsistencies in difficult boundary cases. Experimental results on a real-world search relevance benchmark show that RAG-Match consistently outperforms strong LLM-based baselines across multiple ranking metrics, demonstrating the effectiveness of combining knowledge injection, reasoning supervision, and preference optimization for fine-grained relevance judgment.

95.8 | LG | May 4 | Code
Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

Yan Jiang, Ruihong Qiu, Zi Huang

Recent diffusion large language models (dLLMs) have demonstrated both effectiveness and efficiency in reasoning via a block-based semi-autoregressive generation paradigm. Despite their progress, the fixed-size block generations remain a critical bottleneck for effective and coherent reasoning. (1) From a global perspective, different reasoning tasks correspond to different optimal decoding block sizes, which makes a "one-size-fits-all" assumption ineffective. (2) Even within a single reasoning task, rigid block partitioning breaks the logical flow and reduces reasoning coherence. Through empirical observations, we reveal that for block-wise entropy, incorrect reasoning exhibits a fluctuating and unsteady trend between blocks, whereas correctly solved tasks follow a consistent descending trend. Therefore, this paper proposes b1, a novel post-training framework for dLLMs that learns dynamic-size reasoning blocks via a Monotonic Entropy Descent objective with reinforcement learning to enhance reasoning coherence. b1 integrates seamlessly as a plug-and-play module with existing dLLM post-training algorithms. Extensive experiments across various reasoning benchmarks showcase b1's consistent improvement over existing fixed-size block baselines. Our code has been released at https://github.com/YanJiangJerry/Block-R1.
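The block-wise entropy trend described in the abstract above can be illustrated with a small sketch. The abstract does not spell out the Monotonic Entropy Descent objective, so the code below only computes the observed quantity (mean token entropy per block) and a simple measure of how far a sequence of blocks departs from a descending trend. The function names, the mean-over-tokens aggregation and the hinge-style violation measure are all assumptions made for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def block_entropies(token_dists, block_sizes):
    """Mean token entropy per block, for blocks of possibly different sizes."""
    if any(size <= 0 for size in block_sizes):
        raise ValueError("block sizes must be positive")
    if sum(block_sizes) != len(token_dists):
        raise ValueError("block sizes must sum to the number of tokens")
    entropies = []
    start = 0
    for size in block_sizes:
        block = token_dists[start:start + size]
        entropies.append(sum(token_entropy(d) for d in block) / size)
        start += size
    return entropies


def descent_violation(entropies):
    """Total entropy increase between consecutive blocks (0.0 if non-increasing)."""
    return sum(max(0.0, nxt - cur) for cur, nxt in zip(entropies, entropies[1:]))


# Made-up distributions: two uncertain tokens, one less uncertain, one certain.
dists = [[0.25] * 4, [0.25] * 4, [0.5, 0.5], [1.0, 0.0]]
descending = block_entropies(dists, [2, 1, 1])
```

A sequence whose block entropies only decrease yields a violation of zero, while a fluctuating sequence accumulates a positive value for every rise between neighbouring blocks.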

NA | Sep 6, 2022
Weak Collocation Regression method: fast reveal hidden stochastic dynamics from high-dimensional aggregate data

Liwei Lu, Zhijun Zeng, Yan Jiang et al.

Revealing hidden dynamics from stochastic data is a challenging problem as randomness takes part in the evolution of the data. The problem becomes exceedingly complex when the trajectories of the stochastic data are absent, as happens in many scenarios. Here we present an approach to effectively model the dynamics of stochastic data without trajectories based on the weak form of the Fokker-Planck (FP) equation, which governs the evolution of the density function in the Brownian process. Taking the collocations of Gaussian functions as the test functions in the weak form of the FP equation, we transfer the derivatives to the Gaussian functions and thus approximate the weak form by the expectational sum of the data. With a dictionary representation of the unknown terms, a linear system is built and then solved by regression, revealing the unknown dynamics of the data. Hence, we name the method Weak Collocation Regression (WCR) for its three key components: weak form, collocation of Gaussian kernels, and regression. The numerical experiments show that our method is flexible and fast: it reveals the dynamics within seconds in multi-dimensional problems and can be easily extended to high-dimensional data such as 20 dimensions. WCR can also correctly identify the hidden dynamics of complex tasks with variable-dependent diffusion and coupled drift, and the performance is robust, achieving high accuracy in the case with noise added.
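For orientation, the weak form mentioned in the abstract above can be written out as follows. The notation is an assumption introduced here and is not taken from the abstract: an Itô SDE $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$ with density $p(x,t)$, diffusion matrix $D = \sigma\sigma^{\top}$, a smooth test function $\psi$ (a Gaussian in WCR), and $N$ samples $x_k(t)$ observed at each snapshot time.

```latex
\begin{align}
  % Fokker-Planck equation for the density p(x, t)
  \partial_t p &= -\nabla\cdot(\mu\,p)
     + \tfrac{1}{2}\sum_{i,j}\partial_{x_i}\partial_{x_j}\bigl(D_{ij}\,p\bigr), \\
  % Multiply by psi and integrate by parts; boundary terms vanish for decaying psi
  \frac{d}{dt}\int \psi\,p\,dx &= \int \Bigl(\mu\cdot\nabla\psi
     + \tfrac{1}{2}\sum_{i,j} D_{ij}\,\partial_{x_i}\partial_{x_j}\psi\Bigr)\,p\,dx, \\
  % Replace both integrals by sample averages over the aggregate data
  \frac{d}{dt}\,\frac{1}{N}\sum_{k=1}^{N}\psi\bigl(x_k(t)\bigr) &\approx
     \frac{1}{N}\sum_{k=1}^{N}\Bigl(\mu\cdot\nabla\psi
     + \tfrac{1}{2}\sum_{i,j} D_{ij}\,\partial_{x_i}\partial_{x_j}\psi\Bigr)\bigl(x_k(t)\bigr).
\end{align}
```

Because only averages over samples appear, the samples at different times do not need to belong to the same trajectories. Expanding $\mu$ and $D$ in a dictionary of basis functions makes the last relation linear in the unknown coefficients, with one equation per test function and time level.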

2.5 | CL | May 21
A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses

Yan Jiang, Sihong Liu, Philip A. Fisher

Topic modeling in applied psychology increasingly spans two methodological traditions: probabilistic bag-of-words models and newer embedding-based approaches. Yet many evaluations of these methods rely on longer and cleaner benchmark corpora, leaving less guidance for short, open-ended survey responses. This paper compares Structural Topic Models (STM), a probabilistic topic model, and BERTopic, an embedding-based model, for analyzing open-ended survey responses. We evaluated three STM conditions and five BERTopic conditions, varying typographical correction, stemming, embedding choice, and contextual augmentation, a strategy we introduced to provide additional semantic context for very short responses. Results indicate that BERTopic consistently produced higher topic coherence than STM, with contextual augmentation yielding the strongest performance gains. In contrast, higher-dimensional embeddings alone did not improve coherence and were associated with greater data loss. Qualitative evaluation showed that BERTopic generated more interpretable and stable topics, while STM topics were often broader and more mixed. However, STM provides stronger support for inferential covariate analysis, whereas BERTopic covariate comparisons are primarily descriptive. These findings suggest that STM and BERTopic offer complementary strengths. We conclude with practical guidance for selecting and combining topic modeling approaches in applied social science research.

CL | Nov 9, 2025
Overview of CHIP 2025 Shared Task 2: Discharge Medication Recommendation for Metabolic Diseases Based on Chinese Electronic Health Records

Juntao Li, Haobin Yuan, Ling Luo et al.

Discharge medication recommendation plays a critical role in ensuring treatment continuity, preventing readmission, and improving long-term management for patients with chronic metabolic diseases. This paper presents an overview of the CHIP 2025 Shared Task 2 competition, which aimed to develop state-of-the-art approaches for automatically recommending appropriate discharge medications using real-world Chinese EHR data. For this task, we constructed CDrugRed, a high-quality dataset consisting of 5,894 de-identified hospitalization records from 3,190 patients in China. This task is challenging due to the multi-label nature of medication recommendation, heterogeneous clinical text, and patient-specific variability in treatment plans. A total of 526 teams registered, with 167 and 95 teams submitting valid results to the Phase A and Phase B leaderboards, respectively. The top-performing team achieved the highest overall performance on the final test set, with a Jaccard score of 0.5102 and an F1 score of 0.6267, demonstrating the potential of advanced large language model (LLM)-based ensemble systems. These results highlight both the promise and remaining challenges of applying LLMs to medication recommendation in Chinese EHRs. The post-evaluation phase remains open at https://tianchi.aliyun.com/competition/entrance/532411/.
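The two metrics quoted in the abstract above are standard for multi-label prediction, and a minimal per-record sketch is given below. The exact averaging protocol used by the shared task is not stated in the abstract, so treating each record as a set of medications and scoring it on its own is an assumption, as are the function names, the convention for two empty sets, and the example drug lists.

```python
def jaccard(pred, gold):
    """Jaccard score between predicted and reference medication sets."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # assumed convention: two empty sets count as a perfect match
    return len(pred & gold) / len(pred | gold)


def f1(pred, gold):
    """F1 score (harmonic mean of precision and recall) for one record."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # same assumed convention as above
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical record: two of three predicted drugs match the reference list.
predicted = ["metformin", "insulin", "atorvastatin"]
reference = ["metformin", "atorvastatin", "aspirin"]
record_jaccard = jaccard(predicted, reference)
record_f1 = f1(predicted, reference)
```

For the same prediction, Jaccard is never higher than F1, which is consistent with the ordering of the two scores reported for the task.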

CL | Oct 24, 2025 | Code
CDrugRed: A Chinese Drug Recommendation Dataset for Discharge Medications in Metabolic Diseases

Juntao Li, Haobin Yuan, Ling Luo et al.

Intelligent drug recommendation based on Electronic Health Records (EHRs) is critical for improving the quality and efficiency of clinical decision-making. By leveraging large-scale patient data, drug recommendation systems can assist physicians in selecting the most appropriate medications according to a patient's medical history, diagnoses, laboratory results, and comorbidities. However, the advancement of such systems is significantly hampered by the scarcity of publicly available, real-world EHR datasets, particularly in languages other than English. In this work, we present CDrugRed, the first publicly available Chinese drug recommendation dataset focused on discharge medications for metabolic diseases. The dataset includes 5,894 de-identified records from 3,190 patients, containing comprehensive information such as patient demographics, medical history, clinical course, and discharge diagnoses. We assess the utility of CDrugRed by benchmarking several state-of-the-art large language models (LLMs) on the discharge medication recommendation task. Experimental results show that while supervised fine-tuning improves model performance, there remains substantial room for improvement, with the best model achieving an F1 score of 0.5648 and a Jaccard score of 0.4477. This result highlights the complexity of the clinical drug recommendation task and establishes CDrugRed as a challenging and valuable resource for developing more robust and accurate drug recommendation systems. The dataset is publicly available to the research community under the data usage agreements at https://github.com/DUTIR-BioNLP/CDrugRed.

CV | Apr 30, 2025 | Code
MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance

Mengting Wei, Yante Li, Tuomas Varanka et al.

In this study, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve shape consistency and motion control in existing video-based face generation approaches. Our approach employs the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation, providing a unified framework for modeling face expressions and head pose. This not only enables precise extraction of motion features from driving videos, but also contributes to the faithful preservation of face shape and geometry. Specifically, we enhance the latent diffusion model with rich 3D expression and detailed pose information by incorporating depth maps, normal maps, and rendering maps derived from FLAME sequences. These maps serve as motion guidance and are encoded into the denoising UNet through a specifically designed Geometric Guidance Encoder (GGE). A multi-layer feature fusion module with integrated self-attention mechanisms is used to combine facial appearance and motion latent features within the spatial domain. By utilizing the 3D face parametric model as motion guidance, our method enables parametric alignment of face identity between the reference image and the motion captured from the driving video. Experimental results on benchmark datasets show that our method excels at generating high-quality face animations with precise expression and head pose variation modeling. In addition, it demonstrates strong generalization performance on out-of-domain images. Code is publicly available at https://github.com/weimengting/MagicPortrait.

CV | Mar 15, 2025 | Code
L2RW+: A Comprehensive Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification

Yan Jiang, Hao Yu, Mengting Wei et al.

Visible-infrared person re-identification (VI-ReID) is a challenging task that aims to match pedestrian images captured under varying lighting conditions, which has drawn intensive research attention and achieved promising results. However, existing methods adopt centralized training, ignoring the potential privacy concerns that arise because the data is distributed across multiple devices or entities in reality. In this paper, we propose L2RW+, a benchmark that brings VI-ReID closer to real-world applications. The core rationale behind L2RW+ is that incorporating decentralized training into VI-ReID can address privacy concerns in scenarios with limited data-sharing constraints. Specifically, we design protocols and corresponding algorithms for different privacy sensitivity levels. In our new benchmark, we simulate the training under real-world data conditions in which: 1) data from each camera is completely isolated, or 2) different data entities (e.g., data controllers of a certain region) can selectively share the data. In this way, we simulate scenarios with strict privacy restrictions, which is closer to real-world conditions. Comprehensive experiments show the feasibility and potential of decentralized VI-ReID training at both image and video levels. In particular, with increasing data scales, the performance gap between decentralized and centralized training decreases, especially in video-level VI-ReID. In unseen domains, decentralized training even achieves performance comparable to SOTA centralized methods. This work offers a novel research entry for deploying VI-ReID into real-world scenarios and can benefit the community. Code is available at: https://github.com/Joey623/L2RW.

56.7AIMay 13
Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

Hao Zhou, Tiru Wu, Yan Jiang et al.

Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems continue to expand in scale and functionality, investigating their potential vulnerabilities has become increasingly important. However, existing studies on adversarial attacks in multi-agent systems primarily focus on isolated agents or unimodal settings, leaving the vulnerabilities of MM-MAS largely underexplored. To bridge this gap, we introduce HAM$^{3}$, a Hierarchical Attack framework for multi-modal multi-agent systems that decomposes attacks into three interconnected layers. Specifically, at the perception layer, HAM$^{3}$ mounts attacks by perturbing visual inputs, textual inputs, and their fused visual-textual representations. At the communication layer, it performs communication-level attacks that corrupt message content and interaction topology, such as manipulating shared context or communication links to distort collective information flow. At the reasoning layer, it conducts reasoning-level attacks that interfere with each agent's cognitive pipeline, biasing reasoning trajectories and ultimately compromising final decisions. We evaluate HAM$^{3}$ on the GQA benchmark through multi-agent systems built on distinct reasoning paradigms including ReAct, Plan-and-Solve, and Reflexion. Experiments demonstrate that our framework achieves an Attack Success Rate of up to 78.3%, with reasoning-layer attacks being the most effective. More than half of the successful attacks lead multiple agents to produce consistent errors. These findings offer valuable insights for building more robust and interpretable multi-agent intelligence.

LGOct 25, 2025Code
Does Homophily Help in Robust Test-time Node Classification?

Yan Jiang, Ruihong Qiu, Zi Huang

Homophily, the tendency of nodes from the same class to connect, is a fundamental property of real-world graphs, underpinning structural and semantic patterns in domains such as citation networks and social networks. Existing methods exploit homophily through designing homophily-aware GNN architectures or graph structure learning strategies, yet they primarily focus on GNN learning with training graphs. However, in real-world scenarios, test graphs often suffer from data quality issues and distribution shifts, such as domain shifts across users from different regions in social networks and temporal evolution shifts in citation network graphs collected over varying time periods. These factors significantly compromise the pre-trained model's robustness, resulting in degraded test-time performance. With empirical observations and theoretical analysis, we reveal that transforming the test graph structure by increasing homophily in homophilic graphs or decreasing it in heterophilic graphs can significantly improve the robustness and performance of pre-trained GNNs on node classifications, without requiring model training or update. Motivated by these insights, a novel test-time graph structural transformation method grounded in homophily, named GrapHoST, is proposed. Specifically, a homophily predictor is developed to discriminate test edges, facilitating adaptive test-time graph structural transformation by the confidence of predicted homophily scores. Extensive experiments on nine benchmark datasets under a range of test-time data quality issues demonstrate that GrapHoST consistently achieves state-of-the-art performance, with improvements of up to 10.92%. Our code has been released at https://github.com/YanJiangJerry/GrapHoST.
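To make the homophily notion above concrete, the following minimal sketch computes the edge homophily ratio (the fraction of edges whose endpoints share a label) and then drops edges with a low predicted homophily score. The fixed `threshold` rule, the function names, and the score values are illustrative assumptions; the abstract does not specify how GrapHoST turns predictor confidence into a structural transformation.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def filter_edges(edges, scores, threshold=0.5):
    """Keep edges whose predicted homophily score reaches the threshold.

    Hypothetical rule for illustration only, not GrapHoST's actual procedure.
    """
    return [e for e, s in zip(edges, scores) if s >= threshold]


# Toy graph: nodes 0 and 1 belong to class "a", nodes 2 and 3 to class "b".
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
scores = [0.9, 0.2, 0.8, 0.4]  # made-up outputs of a homophily predictor

before = edge_homophily(edges, labels)  # 2 of 4 edges are intra-class
kept = filter_edges(edges, scores)
after = edge_homophily(kept, labels)
print(before, kept, after)
```

On a homophilic graph, removing the edges the predictor distrusts raises the ratio; for a heterophilic graph the abstract suggests the opposite direction, which would correspond to keeping the low-score edges instead.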

CVOct 23, 2025Code
Attentive Convolution: Unifying the Expressivity of Self-Attention with Convolutional Efficiency

Hao Yu, Haoyu Chen, Yan Jiang et al.

Self-attention (SA) has become the cornerstone of modern vision backbones for its powerful expressivity over traditional Convolutions (Conv). However, its quadratic complexity remains a critical bottleneck for practical applications. Given that Conv offers linear complexity and strong visual priors, continuing efforts have been made to promote the renaissance of Conv. However, a persistent performance chasm remains, highlighting that these modernizations have not yet captured the intrinsic expressivity that defines SA. In this paper, we re-examine the design of the CNNs, directed by a key question: what principles give SA its edge over Conv? As a result, we reveal two fundamental insights that challenge the long-standing design intuitions in prior research (e.g., Receptive field). The two findings are: (1) Adaptive routing: SA dynamically regulates positional information flow according to semantic content, whereas Conv employs static kernels uniformly across all positions. (2) Lateral inhibition: SA induces score competition among token weighting, effectively suppressing redundancy and sharpening representations, whereas Conv filters lack such inhibitory dynamics and exhibit considerable redundancy. Based on this, we propose Attentive Convolution (ATConv), a principled reformulation of the convolutional operator that intrinsically injects these principles. Interestingly, with only $3\times3$ kernels, ATConv consistently outperforms various SA mechanisms in fundamental vision tasks. Building on ATConv, we introduce AttNet, a CNN family that can attain 84.4% ImageNet-1K Top-1 accuracy with only 27M parameters. In diffusion-based image generation, replacing all SA with the proposed $3\times 3$ ATConv in SiT-XL/2 reduces ImageNet FID by 0.15 in 400k steps with faster sampling. Code is available at: github.com/price112/Attentive-Convolution.

CLOct 10, 2025Code
A Unified Biomedical Named Entity Recognition Framework with Large Language Models

Tengxiao Lv, Ling Luo, Juntao Li et al.

Accurate recognition of biomedical named entities is critical for medical information extraction and knowledge discovery. However, existing methods often struggle with nested entities, entity boundary ambiguity, and cross-lingual generalization. In this paper, we propose a unified Biomedical Named Entity Recognition (BioNER) framework based on Large Language Models (LLMs). We first reformulate BioNER as a text generation task and design a symbolic tagging strategy to jointly handle both flat and nested entities with explicit boundary annotation. To enhance multilingual and multi-task generalization, we perform bilingual joint fine-tuning across multiple Chinese and English datasets. Additionally, we introduce a contrastive learning-based entity selector that filters incorrect or spurious predictions by leveraging boundary-sensitive positive and negative samples. Experimental results on four benchmark datasets and two unseen corpora show that our method achieves state-of-the-art performance and robust zero-shot generalization across languages. The source codes are freely available at https://github.com/dreamer-tx/LLMNER.

CVSep 10, 2025Code
Hyperspectral Mamba for Hyperspectral Object Tracking

Long Gao, Yunhe Zhang, Yan Jiang et al.

Hyperspectral object tracking holds great promise due to the rich spectral information and fine-grained material distinctions in hyperspectral images, which are beneficial in challenging scenarios. While existing hyperspectral trackers have made progress by either transforming hyperspectral data into false-color images or incorporating modality fusion strategies, they often fail to capture the intrinsic spectral information, temporal dependencies, and cross-depth interactions. To address these limitations, a new hyperspectral object tracking network equipped with Mamba (HyMamba) is proposed. It unifies spectral, cross-depth, and temporal modeling through state space modules (SSMs). The core of HyMamba lies in the Spectral State Integration (SSI) module, which enables progressive refinement and propagation of spectral features with cross-depth and temporal spectral information. Embedded within each SSI, the Hyperspectral Mamba (HSM) module is introduced to learn spatial and spectral information synchronously via three directional scanning SSMs. Based on SSI and HSM, HyMamba constructs joint features from false-color and hyperspectral inputs, and enhances them through interaction with original spectral features extracted from raw hyperspectral images. Extensive experiments conducted on seven benchmark datasets demonstrate that HyMamba achieves state-of-the-art performance. For instance, it achieves an AUC score of 73.0% and a DP@20 score of 96.3% on the HOTC2020 dataset. The code will be released at https://github.com/lgao001/HyMamba.

CVFeb 4, 2025Code
Extending SEEDS to a Supervoxel Algorithm for Medical Image Analysis

Chenhui Zhao, Yan Jiang, Todd C. Hollon

In this work, we extend the SEEDS superpixel algorithm from 2D images to 3D volumes, resulting in 3D SEEDS, a faster, better, and open-source supervoxel algorithm for medical image analysis. We compare 3D SEEDS with the widely used supervoxel algorithm SLIC on 13 segmentation tasks across 10 organs. 3D SEEDS accelerates supervoxel generation by a factor of 10, improves the achievable Dice score by 6.5%, and reduces the under-segmentation error by 0.16%. The code is available at https://github.com/Zch0414/3d_seeds
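As a rough illustration of the "achievable Dice score" mentioned above, the sketch below labels each supervoxel as a whole by majority vote against the ground truth and then computes the Dice overlap. This majority-vote definition is an assumption; the abstract does not state how the metric is computed, and the flat label lists stand in for 3D volumes.

```python
def dice(pred, truth):
    """Dice overlap between two binary masks given as flat sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2 * inter / total if total else 1.0


def achievable_dice(supervoxels, truth):
    """Dice after labeling every supervoxel by majority vote (assumed definition)."""
    size, foreground = {}, {}
    for s, t in zip(supervoxels, truth):
        size[s] = size.get(s, 0) + 1
        foreground[s] = foreground.get(s, 0) + (1 if t else 0)
    # A supervoxel becomes foreground when more than half of its voxels are.
    pred = [1 if 2 * foreground[s] > size[s] else 0 for s in supervoxels]
    return dice(pred, truth)


# Toy "volume" flattened to 8 voxels, grouped into 3 supervoxels.
supervoxels = [0, 0, 0, 1, 1, 1, 2, 2]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
score = achievable_dice(supervoxels, truth)
print(round(score, 4))
```

The metric penalizes supervoxels that straddle an organ boundary: supervoxel 0 above contains one background voxel, so the best whole-supervoxel labeling cannot reach a Dice of 1.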

11.3CVApr 6
TaFall: Balance-Informed Fall Detection via Passive Thermal Sensing

Chengxiao Li, Xie Zhang, Wei Zhu et al.

Falls are a major cause of injury and mortality among older adults, yet most incidents occur in private indoor environments where monitoring must balance effectiveness with privacy. Existing privacy-preserving fall detection approaches, particularly those based on radio frequency sensing, often rely on coarse motion cues, which limits reliability in real-world deployments. We introduce TaFall, a balance-informed fall detection system based on low-cost, privacy-preserving thermal array sensing. The key insight is that TaFall models a fall as a process of balance degradation and detects falls by estimating pose-driven biomechanical balance dynamics. To enable this capability from low-resolution thermal array maps, we propose (i) an appearance-motion fusion model for robust pose reconstruction, (ii) physically grounded balance-aware learning, and (iii) pose-bridged pretraining to improve robustness. TaFall achieves a detection rate of 98.26% with a false alarm rate of 0.65% on our dataset with over 3,000 fall instances from 35 participants across diverse indoor environments. In 27-day deployments across four homes, TaFall attains an ultra-low false alarm rate of 0.00126%, and a pilot bathroom study confirms robustness under moisture and thermal interference. Together, these results establish TaFall as a reliable and privacy-preserving approach to fall detection in everyday living environments.

92.4NAMar 29
A limiter-based approach to construct high-order fully-discrete entropy stable explicit DG schemes for hyperbolic conservation laws

Yuchang Liu, Wei Guo, Yan Jiang et al.

This paper presents a class of novel high-order fully-discrete entropy stable (ES) discontinuous Galerkin (DG) schemes with explicit time discretization. The proposed methodology exploits a critical observation from [4] that the cell averages of classical DG solutions with forward Euler time stepping satisfy an "entropy-stable-like" property. Building on this result, fully-discrete entropy stability is rigorously enforced through a simple Zhang–Shu-type scaling limiter [45] applied as a post-processing step, without modifying the underlying spatial discretization. Furthermore, the proposed methodology can simultaneously enforce multiple cell entropy inequalities, a capability unavailable in existing ES DG schemes. High-order accuracy in time is achieved by using strong-stability-preserving (SSP) multistep methods. Theoretically, we prove that the proposed scheme indeed maintains high-order accuracy and establish a Lax–Wendroff-type theorem guaranteeing that the limit of the numerical solutions, if it exists, satisfies the desired entropy inequality. Extensive numerical tests for scalar equations and systems, including the nonconvex Buckley–Leverett problem and extreme examples of Euler equations, demonstrate optimal accuracy, enforcement of multiple entropy conditions, and strong robustness.
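The scaling-limiter idea can be sketched in a few lines: nodal values in a cell are pulled toward the cell average by a factor theta in [0, 1], which leaves the cell average unchanged. The version below is the classical bound-preserving form, where theta is chosen so the values stay in a given interval. That choice is a stand-in assumption; in the paper theta is chosen to enforce an entropy inequality, and the abstract does not give that formula.

```python
def scaling_limiter(values, weights, lower, upper):
    """Scale nodal values toward the cell average so they lie in [lower, upper].

    values  : solution values at the quadrature nodes of one cell
    weights : quadrature weights normalized to sum to 1
    Assumes the cell average itself already lies in [lower, upper].
    """
    avg = sum(w * v for w, v in zip(weights, values))
    vmax, vmin = max(values), min(values)
    theta = 1.0
    if vmax > upper:
        theta = min(theta, (upper - avg) / (vmax - avg))
    if vmin < lower:
        theta = min(theta, (avg - lower) / (avg - vmin))
    limited = [avg + theta * (v - avg) for v in values]
    return limited, theta


# Three-point rule on one cell (weights 1/6, 2/3, 1/6); the values overshoot [0, 1].
weights = [1 / 6, 2 / 3, 1 / 6]
values = [-0.2, 0.5, 1.2]
limited, theta = scaling_limiter(values, weights, 0.0, 1.0)
old_avg = sum(w * v for w, v in zip(weights, values))
new_avg = sum(w * v for w, v in zip(weights, limited))
print(limited, theta, old_avg, new_avg)
```

Because the limiter only rescales deviations from the mean, it acts as a post-processing step on each cell and does not touch the spatial discretization, which matches the role the abstract describes.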

LGMay 29, 2025
OTPTO: Joint Product Selection and Inventory Optimization in Fresh E-commerce Front-End Warehouses

Zheming Zhang, Yan Jiang, Qingshan Li et al.

In China's competitive fresh e-commerce market, optimizing operational strategies, especially inventory management in front-end warehouses, is key to enhancing customer satisfaction and gaining a competitive edge. Front-end warehouses are placed in residential areas to ensure the timely delivery of fresh goods and are usually small. This brings the challenge of deciding which goods to stock and in what quantities, taking into account capacity constraints. Traditional predict-then-optimize (PTO) methods, which predict sales and then decide on inventory, often do not align prediction with inventory goals and fail to prioritize consumer satisfaction. This paper proposes a multi-task Optimize-then-Predict-then-Optimize (OTPTO) approach that jointly optimizes product selection and inventory management, aiming to increase consumer satisfaction by maximizing the full order fulfillment rate. Our method employs a 0-1 mixed integer programming model OM1 to determine historically optimal inventory levels, and then uses a product selection model PM1 and the stocking model PM2 for prediction. The combined results are further refined through a post-processing algorithm OM2. Experimental results from JD.com's 7Fresh platform demonstrate the robustness and significant advantages of our OTPTO method. Compared to the PTO approach, our OTPTO method substantially enhances the full order fulfillment rate by 4.34% (a relative increase of 7.05%) and narrows the gap to the optimal full order fulfillment rate by 5.27%. These findings substantiate the efficacy of the OTPTO method in managing inventory at front-end warehouses of fresh e-commerce platforms and provide valuable insights for future research in this domain.
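The objective named above, the full order fulfillment rate, can be illustrated with a small sketch: an order counts as fulfilled only if every item in it is available in the requested quantity. The data layout and the simplification that stock is not depleted between orders are assumptions made for illustration. The last lines work through the reported figures under the assumption that 4.34% is an absolute gain and 7.05% is the same gain relative to the PTO baseline.

```python
def full_order_fulfillment_rate(orders, stock):
    """Share of orders whose every line item can be served from stock.

    orders : list of dicts mapping product -> requested quantity
    stock  : dict mapping product -> units on hand
    Simplification: stock is not depleted from one order to the next.
    """
    if not orders:
        return 0.0
    full = sum(
        1
        for order in orders
        if all(stock.get(product, 0) >= qty for product, qty in order.items())
    )
    return full / len(orders)


orders = [
    {"milk": 1, "eggs": 2},
    {"milk": 1, "tofu": 1},  # tofu is not stocked, so this order is not fully served
    {"eggs": 1},
]
stock = {"milk": 5, "eggs": 5}
rate = full_order_fulfillment_rate(orders, stock)

# Assumed reading of the abstract: absolute gain / relative gain = PTO baseline rate.
implied_baseline = 4.34 / 0.0705
print(round(rate, 4), round(implied_baseline, 2))
```

A single missing product fails the whole order, which is why product selection and stocking quantities have to be decided jointly under the capacity constraint.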

LGOct 29, 2025
Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training

Hong Wang, Haiyang Xin, Jie Wang et al.

Pre-training has proven effective in addressing data scarcity and performance limitations in solving PDE problems with neural operators. However, challenges remain due to the heterogeneity of PDE datasets in equation types, which leads to high errors in mixed training. Additionally, dense pre-training models that scale parameters by increasing network width or depth incur significant inference costs. To tackle these challenges, we propose a novel Mixture-of-Experts Pre-training Operator Transformer (MoE-POT), a sparse-activated architecture that scales parameters efficiently while controlling inference costs. Specifically, our model adopts a layer-wise router-gating network to dynamically select 4 routed experts from 16 expert networks during inference, enabling the model to focus on equation-specific features. Meanwhile, we also integrate 2 shared experts, aiming to capture common properties of PDE and reduce redundancy among routed experts. The final output is computed as the weighted average of the results from all activated experts. We pre-train models with parameters from 30M to 0.5B on 6 public PDE datasets. Our model with 90M activated parameters achieves up to a 40% reduction in zero-shot error compared with existing models with 120M activated parameters. Additionally, we conduct interpretability analysis, showing that dataset types can be inferred from router-gating network decisions, which validates the rationality and effectiveness of the MoE architecture.
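The routing scheme described above (4 routed experts chosen out of 16, plus 2 always-on shared experts, combined by a weighted average) can be sketched as follows. The scalar toy experts, the softmax over the selected logits, and the unit weight given to each shared expert are assumptions for illustration; the abstract does not specify how the weights are computed.

```python
import math


def top_k_route(logits, k):
    """Return the indices of the k largest logits and their softmax weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - peak) for i in idx]
    total = sum(exps)
    return idx, [e / total for e in exps]


def moe_forward(x, routed, shared, logits, k=4, shared_weight=1.0):
    """Weighted average over the k routed experts and all shared experts."""
    idx, weights = top_k_route(logits, k)
    outputs = [routed[i](x) for i in idx] + [f(x) for f in shared]
    weights = weights + [shared_weight] * len(shared)  # assumed shared-expert weight
    y = sum(w * o for w, o in zip(weights, outputs)) / sum(weights)
    return y, idx


# 16 toy routed experts (expert i multiplies by i) and 2 toy shared experts.
routed = [lambda x, a=i: a * x for i in range(16)]
shared = [lambda x: x, lambda x: -x]
logits = [0.1 * i for i in range(16)]  # made-up router scores for one layer

y, idx = moe_forward(1.0, routed, shared, logits)
print(idx, y)
```

Only 6 of the 18 expert networks run for a given input here, which is the sense in which parameters can grow without a matching growth in inference cost.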

AISep 28, 2025
Game-Oriented ASR Error Correction via RAG-Enhanced LLM

Yan Jiang, Yongle Luo, Qixian Zhou et al.

With the rise of multiplayer online games, real-time voice communication is essential for team coordination. However, general ASR systems struggle with gaming-specific challenges like short phrases, rapid speech, jargon, and noise, leading to frequent errors. To address this, we propose the GO-AEC framework, which integrates large language models, Retrieval-Augmented Generation (RAG), and a data augmentation strategy using LLMs and TTS. GO-AEC includes data augmentation, N-best hypothesis-based correction, and a dynamic game knowledge base. Experiments show GO-AEC reduces character error rate by 6.22% and sentence error rate by 29.71%, significantly improving ASR accuracy in gaming scenarios.
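For reference, the two metrics reported above can be computed as in the minimal sketch below: character error rate as the Levenshtein distance between reference and hypothesis divided by the reference length, and sentence error rate as the share of utterances with any mismatch. These are the standard definitions; the example phrases are made up and not taken from the paper's data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of one hypothesis against a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)


def ser(refs, hyps):
    """Sentence error rate: fraction of hypotheses that differ from their reference."""
    return sum(1 for r, h in zip(refs, hyps) if r != h) / len(refs)


refs = ["push mid now", "fall back"]
hyps = ["push mud now", "fall back"]  # made-up ASR outputs
print(cer(refs[0], hyps[0]), ser(refs, hyps))
```

Short utterances make both metrics volatile: a single wrong character in a 12-character phrase already costs about 8% CER and a full sentence error.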

CVAug 30, 2025
Multi-Focused Video Group Activities Hashing

Zhongmiao Qi, Yan Jiang, Bolin Zhang et al.

With the explosive growth of video data in various complex scenarios, quickly retrieving group activities has become an urgent problem. However, many existing methods can only retrieve videos at the level of an entire video, not at activity granularity. To solve this problem, we propose, for the first time, a new technique called STVH (spatiotemporal interleaved video hashing). Through a unified framework, STVH simultaneously models individual object dynamics and group interactions, capturing the spatiotemporal evolution of both group visual features and positional features. Moreover, real-life video retrieval sometimes requires activity features and at other times requires the visual features of objects. We therefore further propose a novel M-STVH (multi-focused spatiotemporal video hashing) as an enhanced version to handle this difficult task. The advanced method incorporates hierarchical feature integration through multi-focused representation learning, allowing the model to jointly focus on activity semantic features and object visual features. We conducted comparative experiments on publicly available datasets, and both STVH and M-STVH achieve excellent results.

AIJun 24, 2025
NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration

Yan Jiang, Hao Zhou, LiZhong GU et al.

Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. However, existing agents typically call tools one step at a time without a global view of task structure. As tools depend on each other, this leads to error accumulation and limited scalability, particularly when scaling to thousands of tools. To address these limitations, we propose NaviAgent, a novel bilevel architecture that decouples task planning from tool execution through graph-based modeling of the tool ecosystem. At the task-planning level, the LLM-based agent decides whether to respond directly, clarify user intent, invoke a toolchain, or execute tool outputs, ensuring broad coverage of interaction scenarios independent of inter-tool complexity. At the execution level, a continuously evolving Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools, guiding the agent to generate scalable and robust invocation sequences. By incorporating feedback from real tool interactions, NaviAgent supports closed-loop optimization of planning and execution, moving beyond tool calling toward adaptive navigation of large-scale tool ecosystems. Experiments show that NaviAgent achieves the best task success rates across models and tasks, and integrating TWNM further boosts performance by up to 17 points on complex tasks, underscoring its key role in toolchain orchestration.

CVMar 28, 2025
Hyperspectral Adapter for Object Tracking based on Hyperspectral Video

Long Gao, Yunhe Zhang, Langkun Chen et al.

Object tracking based on hyperspectral video attracts increasing attention owing to the rich material and motion information in hyperspectral videos. The prevailing hyperspectral methods adapt pretrained RGB-based object tracking networks for hyperspectral tasks by fine-tuning the entire network on hyperspectral datasets, which achieves impressive results in challenging scenarios. However, the performance of hyperspectral trackers is limited by the loss of spectral information during the transformation, and fine-tuning the entire pretrained network is inefficient for practical applications. To address these issues, a new hyperspectral object tracking method, hyperspectral adapter for tracking (HyA-T), is proposed in this work. The hyperspectral adapter for the self-attention (HAS) and the hyperspectral adapter for the multilayer perceptron (HAM) are proposed to generate the adaptation information and to transfer the multi-head self-attention (MSA) module and the multilayer perceptron (MLP) in the pretrained network to the hyperspectral object tracking task by augmenting the adaptation information into the calculation of the MSA and MLP. Additionally, the hyperspectral enhancement of input (HEI) is proposed to augment the original spectral information into the input of the tracking network. The proposed methods extract spectral information directly from the hyperspectral images, which prevents the loss of spectral information. Moreover, only the parameters in the proposed methods are fine-tuned, which is more efficient than the existing methods. Extensive experiments were conducted on four datasets with various spectral bands, verifying the effectiveness of the proposed methods. HyA-T achieves state-of-the-art performance on all the datasets.

CVMar 10, 2025
Zero-Shot Hashing Based on Reconstruction With Part Alignment

Yan Jiang, Zhongmiao Qi, Jianhao Li et al.

Hashing algorithms have been widely used in large-scale image retrieval tasks, especially for seen class data. Zero-shot hashing algorithms have been proposed to handle unseen class data. The key technique in these algorithms involves learning features from seen classes and transferring them to unseen classes, that is, aligning the feature embeddings between the seen and unseen classes. Most existing zero-shot hashing algorithms use the shared attributes between the two classes of interest to complete alignment tasks. However, the attributes are always described for a whole image, even though they represent specific parts of the image. Hence, these methods ignore the importance of aligning attributes with the corresponding image parts, which explicitly introduces noise and reduces the accuracy achieved when aligning the features of seen and unseen classes. To address this problem, we propose a new zero-shot hashing method called RAZH. We first use a clustering algorithm to group similar patches to image parts for attribute matching and then replace the image parts with the corresponding attribute vectors, gradually aligning each part with its nearest attribute. Extensive evaluation results demonstrate the superiority of the RAZH method over several state-of-the-art methods.

LGFeb 24, 2025
Predictive Response Optimization: Using Reinforcement Learning to Fight Online Social Network Abuse

Garrett Wilson, Geoffrey Goh, Yan Jiang et al.

Detecting phishing, spam, fake accounts, data scraping, and other malicious activity in online social networks (OSNs) is a problem that has been studied for well over a decade, with a number of important results. Nearly all existing works on abuse detection have as their goal producing the best possible binary classifier; i.e., one that labels unseen examples as "benign" or "malicious" with high precision and recall. However, no prior published work considers what comes next: what does the service actually do after it detects abuse? In this paper, we argue that detection as described in previous work is not the goal of those who are fighting OSN abuse. Rather, we believe the goal to be selecting actions (e.g., ban the user, block the request, show a CAPTCHA, or "collect more evidence") that optimize a tradeoff between harm caused by abuse and impact on benign users. With this framing, we see that enlarging the set of possible actions allows us to move the Pareto frontier in a way that is unattainable by simply tuning the threshold of a binary classifier. To demonstrate the potential of our approach, we present Predictive Response Optimization (PRO), a system based on reinforcement learning that utilizes available contextual information to predict future abuse and user-experience metrics conditioned on each possible action, and select actions that optimize a multi-dimensional tradeoff between abuse/harm and impact on user experience. We deployed versions of PRO targeted at stopping automated activity on Instagram and Facebook. In both cases our experiments showed that PRO outperforms a baseline classification system, reducing abuse volume by 59% and 4.5% (respectively) with no negative impact to users. We also present several case studies that demonstrate how PRO can quickly and automatically adapt to changes in business constraints, system behavior, and/or adversarial tactics.
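The framing above, choosing among several actions rather than thresholding a binary classifier, can be illustrated with a toy selection rule. Each action carries a predicted abuse level and a predicted user-experience cost, and the rule picks the action with the lowest weighted sum. The linear scalarization, the `ux_weight` parameter, and all numbers are assumptions for illustration; the abstract does not describe how PRO combines its predicted metrics.

```python
def select_action(predictions, ux_weight):
    """Pick the action minimizing predicted abuse + ux_weight * predicted UX cost.

    predictions : dict mapping action name -> (predicted abuse, predicted UX cost)
    ux_weight   : hypothetical knob expressing how much user impact matters
    """
    return min(
        predictions,
        key=lambda action: predictions[action][0] + ux_weight * predictions[action][1],
    )


# Made-up model outputs for one request, conditioned on each possible action.
predictions = {
    "allow": (0.80, 0.00),
    "captcha": (0.30, 0.10),
    "block": (0.05, 0.60),
}

balanced = select_action(predictions, ux_weight=1.0)
abuse_first = select_action(predictions, ux_weight=0.1)
ux_first = select_action(predictions, ux_weight=10.0)
print(balanced, abuse_first, ux_first)
```

Changing the weight moves the decision between actions without retraining the predictors, and the intermediate "captcha" option is only reachable because the action set is larger than allow/block.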

AIApr 8, 2024
Pricing Strategies for Different Accuracy Models from the Same Dataset Based on Generalized Hotelling's Law

Jie Liu, Tao Feng, Yan Jiang et al.

We consider a scenario where a seller possesses a dataset $D$ and trains it into models of varying accuracies for sale in the market. Due to the reproducibility of data, the dataset can be reused to train models with different accuracies, and the training cost is independent of the sales volume. These two characteristics lead to fundamental differences between the data trading market and traditional trading markets. The introduction of different models into the market inevitably gives rise to competition. However, due to the varying accuracies of these models, traditional multi-oligopoly games are not applicable. We consider a generalized Hotelling's law, where the accuracy of the models is abstracted as distance. Buyers choose to purchase models based on a trade-off between accuracy and price, while sellers determine their pricing strategies based on the market's demand. We present two pricing strategies: static pricing strategy and dynamic pricing strategy, and we focus on the static pricing strategy. We propose static pricing mechanisms based on various market conditions and provide an example. Finally, we demonstrate that our pricing strategy remains robust in the context of incomplete information games.

LGDec 31, 2020
Modified Gaussian Process Regression Models for Cyclic Capacity Prediction of Lithium-ion Batteries

Kailong Liu, Xiaosong Hu, Zhongbao Wei et al.

This paper presents the development of machine learning-enabled data-driven models for effective capacity predictions for lithium-ion batteries under different cyclic conditions. To achieve this, a model structure is first proposed with the considerations of battery ageing tendency and the corresponding operational temperature and depth-of-discharge. Then based on a systematic understanding of covariance functions within the Gaussian process regression, two related data-driven models are developed. Specifically, by modifying the isotropic squared exponential kernel with an automatic relevance determination structure, 'Model A' could extract the highly relevant input features for capacity predictions. Through coupling the Arrhenius law and a polynomial equation into a compositional kernel, 'Model B' is capable of considering the electrochemical and empirical knowledge of battery degradation. The developed models are validated and compared on the Nickel Manganese Cobalt Oxide (NMC) lithium-ion batteries with various cycling patterns. Experimental results demonstrate that the modified Gaussian process regression model considering the battery electrochemical and empirical ageing signature outperforms other counterparts and is able to achieve satisfactory results for both one-step and multi-step predictions. The proposed technique is promising for battery capacity predictions under various cycling cases.
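The kernel modification behind 'Model A' can be sketched as below: a squared exponential kernel with one length scale per input dimension (automatic relevance determination), of which the isotropic kernel is the special case with equal length scales. The formula is the standard ARD squared exponential form; the input values and length scales are made up, and the abstract does not give the paper's exact parameterization.

```python
import math


def se_ard_kernel(x, y, length_scales, signal_var=1.0):
    """Squared exponential kernel with a separate length scale per input dimension."""
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales))
    return signal_var * math.exp(-0.5 * sq)


x, y = (0.0, 0.0), (1.0, 1.0)

# Isotropic case: both input features share one length scale.
iso = se_ard_kernel(x, y, (1.0, 1.0))

# ARD case: a very large length scale makes the second feature nearly irrelevant.
ard = se_ard_kernel(x, y, (1.0, 100.0))
print(iso, ard)
```

After hyperparameter fitting, dimensions that end up with large length scales barely affect the covariance, which is how such a kernel can single out the input features most relevant to capacity prediction.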

NAOct 3, 2018
Dispersion Analysis of Finite Difference and Discontinuous Galerkin Schemes for Maxwell's Equations in Linear Lorentz Media

Yan Jiang, Puttha Sakkaplangkul, Vrushali A. Bokil et al.

In this paper, we consider Maxwell's equations in linear dispersive media described by a single-pole Lorentz model for electronic polarization. We study two classes of commonly used spatial discretizations: finite difference methods (FD) with arbitrary even order accuracy in space and high spatial order discontinuous Galerkin (DG) finite element methods. Both types of spatial discretizations are coupled with second order semi-implicit leap-frog and implicit trapezoidal temporal schemes studied in our previous research [5,6]. By performing detailed dispersion analysis for the semi-discrete and fully discrete schemes, we obtain rigorous quantification of the dispersion error for Lorentz dispersive dielectrics. In particular, comparisons of dispersion error can be made taking into account the model parameters and mesh sizes in the design of the two types of schemes. The results for the numerical dispersion analysis can guide us in the optimal choice of discretization parameters for the more complicated and nonlinear models. The numerical dispersion analysis of the fully discrete FD and DG schemes, for the dispersive Maxwell model considered in this paper, clearly indicates the dependence of the numerical dispersion errors on spatial and temporal discretizations, their order of accuracy, mesh discretization parameters and model parameters. The results obtained here cannot be arrived at by considering discretizations of Maxwell's equations in free space. In particular, our results contrast the advantages and disadvantages of using high order FD or DG schemes and leap-frog or trapezoidal time integrators over different frequency ranges using a variety of measures of numerical dispersion errors. Finally, we highlight the limitations of the second order accurate temporal discretizations considered.

MLJul 5, 2016
Algorithms for Generalized Cluster-wise Linear Regression

Young Woong Park, Yan Jiang, Diego Klabjan et al.

Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. We generalize the CLR problem by allowing each entity to have more than one observation, and refer to it as generalized CLR. We propose an exact mathematical programming based approach relying on column generation, a column generation based heuristic algorithm that clusters predefined groups of entities, a metaheuristic genetic algorithm with adapted Lloyd's algorithm for K-means clustering, a two-stage approach, and a modified algorithm of Späth (1979) for solving generalized CLR. We examine the performance of our algorithms on a stock keeping unit (SKU) clustering problem employed in forecasting halo and cannibalization effects in promotions using real-world retail data from a large supermarket chain. In the SKU clustering problem, the retailer needs to cluster SKUs based on their seasonal effects in response to promotions. The seasonal effects are the results of regressions with predictors being promotion mechanisms and seasonal dummies performed over clusters generated. We compare the performance of all proposed algorithms for the SKU problem with real-world and synthetic data.
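To illustrate the generalized CLR objective, the sketch below uses a simple Lloyd-style alternation: fit one regression line per cluster, then move each entity (with all of its observations) to the cluster whose line gives it the smallest sum of squared errors. This is an illustrative heuristic under simplifying assumptions (one predictor, a given initial assignment) and not one of the paper's algorithms; like other alternating schemes it can stall in a local optimum.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def entity_sse(observations, line):
    """Sum of squared errors of one entity's observations under a line."""
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in observations)


def clusterwise_regression(entities, assignment, n_clusters, max_iter=50):
    """Alternate between per-cluster fits and whole-entity reassignment."""
    assignment = list(assignment)
    lines = []
    for _ in range(max_iter):
        lines = []
        for c in range(n_clusters):
            pts = [p for obs, k in zip(entities, assignment) if k == c for p in obs]
            lines.append(fit_line(pts) if pts else (0.0, 0.0))
        new = [
            min(range(n_clusters), key=lambda c: entity_sse(obs, lines[c]))
            for obs in entities
        ]
        if new == assignment:
            break
        assignment = new
    return assignment, lines


# Five entities with three observations each: three follow y = 2x, two follow y = 10 - x.
entities = [
    [(0, 0), (1, 2), (2, 4)],
    [(1, 2), (2, 4), (3, 6)],
    [(2, 4), (3, 6), (4, 8)],
    [(0, 10), (1, 9), (2, 8)],
    [(2, 8), (3, 7), (4, 6)],
]
assignment, lines = clusterwise_regression(entities, [0, 0, 1, 1, 1], n_clusters=2)
print(assignment, lines)
```

Reassigning whole entities rather than single observations is what distinguishes the generalized problem: all observations of an entity must share one regression.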