Haoyan Zhang

h-index: 4

Papers: 3

Novelty: 52%

AI Score: 40

Ranked #97,294 of 205,806 authors (top 47%); #32,032 in CV (top 54%)

3 Papers

46.0 DS Mar 17

A Jacobi Field Approach to Splitting Detection in Schrödinger Bridge

Chunhai Jiao, Jin Guo, Haoyan Zhang et al.

We study the problem of detecting the onset of path splitting in stochastic interpolation between probability distributions. This question is especially subtle when the target distribution is nonconvex or supported on disconnected components, where interpolating trajectories may separate into distinct branches. Motivated by the stochastic control and Schrödinger bridge viewpoint, we propose a Jacobi field based indicator for identifying candidate splitting times and locations. Our approach is based on the Jacobi field associated with the linearization of an induced interpolating flow. Starting from a stochastic interpolation ansatz, we construct an Eulerian velocity field by conditional averaging and derive its spatial Jacobian in terms of the local posterior geometry of the target sample cloud. This allows us to interpret the symmetric part of the Jacobian as a local strain tensor and to use its spectral structure to quantify the amplification of infinitesimal perturbations along reference trajectories. Numerical experiments on non-convex and disconnected target distributions show that the proposed indicator consistently localizes the emergence of branching regions and captures the temporal development of splitting. These results suggest that Jacobi field analysis provides a natural mathematical framework for studying local instability and splitting phenomena in stochastic interpolation.
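As a rough illustration of the strain-tensor idea in the abstract above, the following sketch computes the largest eigenvalue of the symmetric part of a velocity Jacobian. It is not the authors' construction: it assumes a simple linear interpolant x_t = (1 - t) z + t x_1 with standard Gaussian z and an empirical target cloud, for which the conditionally averaged velocity is (E[x_1 | x_t = x] - x) / (1 - t) and its Jacobian is (t Cov[x_1 | x_t = x] / (1 - t)^2 - I) / (1 - t). The function names (`posterior_weights`, `velocity_jacobian`, `strain_indicator`) are hypothetical.

```python
import numpy as np

def posterior_weights(x, t, targets):
    """Posterior weights of each target sample given x_t = x.

    Assumes x_t = (1 - t) * z + t * x_1 with z ~ N(0, I), so that
    x_t | x_1 ~ N(t * x_1, (1 - t)^2 I). This interpolant is an assumption.
    """
    sigma2 = (1.0 - t) ** 2
    logits = -np.sum((x - t * targets) ** 2, axis=1) / (2.0 * sigma2)
    logits -= logits.max()  # stabilise the softmax
    w = np.exp(logits)
    return w / w.sum()

def velocity(x, t, targets):
    """Conditionally averaged (Eulerian) velocity under the assumed interpolant."""
    w = posterior_weights(x, t, targets)
    mean = w @ targets
    return (mean - x) / (1.0 - t)

def velocity_jacobian(x, t, targets):
    """Spatial Jacobian of the velocity, written via the posterior covariance."""
    w = posterior_weights(x, t, targets)
    mean = w @ targets
    centered = targets - mean
    cov = (centered * w[:, None]).T @ centered  # posterior covariance of x_1
    d = x.shape[0]
    return (t * cov / (1.0 - t) ** 2 - np.eye(d)) / (1.0 - t)

def strain_indicator(x, t, targets):
    """Largest eigenvalue of the symmetric part of the Jacobian (local strain)."""
    J = velocity_jacobian(x, t, targets)
    S = 0.5 * (J + J.T)
    return np.linalg.eigvalsh(S)[-1]  # eigvalsh returns ascending eigenvalues

# Two disconnected target points; probe the midpoint at an early and a later time.
targets = np.array([[-2.0, 0.0], [2.0, 0.0]])
x_mid = np.zeros(2)
early = strain_indicator(x_mid, 0.05, targets)  # negative: perturbations contract
later = strain_indicator(x_mid, 0.5, targets)   # positive: perturbations amplify
```

Under these assumptions, a sign change of the indicator from negative to positive along a reference trajectory marks a candidate splitting time: nearby trajectories start to separate toward different components of the target.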

47.5 LG Apr 10

Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference

Yueyuan Sui, Payal Mohapatra, Doğaç Eldenk et al.

Edge devices increasingly run multimodal sensing pipelines that must remain accurate despite fluctuating power budgets and unpredictable sensor dropout. Existing pruning methods fail under these conditions: they generally require fine-tuning after compression, consuming over $10\times$ the deployment energy, and they assign static importance scores that are blind to which sensors are present. We present the SentryFuse framework, which addresses both challenges jointly through two key components. First, SentryGate learns modality-conditioned importance scores during training via first-order saliency supervision and then prunes attention heads and feed-forward channels at deployment without fine-tuning. Second, SentryAttend replaces dense self-attention, a key bottleneck in contemporary multimodal architectures, with sparse grouped-query attention, yielding a net 15% reduction in GFLOPs across three different multimodal architectures. Across three applications and multimodal backbones, SentryGate achieves a 12.7% average accuracy improvement over the strongest pruning baseline, and up to 18% under modality dropout conditions. Together, SentryFuse reduces memory by 28.2% and lowers latency by up to $1.63\times$ without further fine-tuning, establishing modality-aware zero-shot compression as a practical path to multimodal intelligence on heterogeneous edge hardware.
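The abstract above does not spell out how SentryAttend is built, so the following is only a generic sketch of grouped-query attention, in which several query heads share one key/value head. The sliding-window mask stands in for the "sparse" part and is purely an assumption, as are the function name `grouped_query_attention` and the `window` parameter.

```python
import numpy as np

def grouped_query_attention(q, k, v, window=None):
    """Generic grouped-query attention (not the SentryAttend implementation).

    q: (n_q_heads, T, D); k, v: (n_kv_heads, T, D), with n_q_heads divisible
    by n_kv_heads. If `window` is given, each token attends only to tokens
    within that distance (an assumed form of sparsity).
    """
    n_q, T, D = q.shape
    n_kv = k.shape[0]
    assert n_q % n_kv == 0, "query heads must divide evenly into KV groups"
    group = n_q // n_kv

    # Each key/value head is shared by `group` consecutive query heads.
    k_full = np.repeat(k, group, axis=0)
    v_full = np.repeat(v, group, axis=0)

    scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(D)  # (n_q, T, T)
    if window is not None:
        idx = np.arange(T)
        blocked = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(blocked, -np.inf, scores)

    # Row-wise softmax; the diagonal is never masked, so each row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_full

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 5, 8))  # 4 query heads
k = rng.normal(size=(2, 5, 8))  # 2 shared key heads
v = rng.normal(size=(2, 5, 8))  # 2 shared value heads
out = grouped_query_attention(q, k, v, window=1)
```

Sharing key/value heads shrinks the K/V projections and cache relative to full multi-head attention, and in an implementation that skips masked positions the window also cuts the number of score computations. This sketch still forms the dense score matrix for clarity, so it illustrates the semantics rather than the savings.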

CV Dec 10, 2025

Dynamic Facial Expressions Analysis Based Parkinson's Disease Auxiliary Diagnosis

Xiaochen Huang, Xiaochen Bi, Cuihua Lv et al.

Parkinson's disease (PD), a prevalent neurodegenerative disorder, significantly affects patients' daily functioning and social interactions. To facilitate a more efficient and accessible diagnostic approach for PD, we propose a dynamic facial expression analysis-based PD auxiliary diagnosis method. This method targets hypomimia, a characteristic clinical symptom of PD, by analyzing two manifestations: reduced facial expressivity and facial rigidity, thereby facilitating the diagnosis process. We develop a multimodal facial expression analysis network to extract expression intensity features during patients' performance of various facial expressions. This network leverages the CLIP architecture to integrate visual and textual features while preserving the temporal dynamics of facial expressions. Subsequently, the expression intensity features are processed and input into an LSTM-based classification network for PD diagnosis. Our method achieves an accuracy of 93.1%, outperforming other in-vitro PD diagnostic approaches. This technique offers a more convenient detection method for potential PD patients, improving their diagnostic experience.