63.1LGMay 19Code
An Exterior Method for Nonnegative Matrix FactorizationQiujing Lu, Tonmoy Monsoor, Ehsan Ebrahimzadeh et al.
Nonnegative matrix factorization (NMF) seeks a low-rank approximation $X \approx UV^T$ with nonnegative factors and is commonly solved using interior methods that enforce feasibility throughout optimization. We show that such constraint-driven approaches can impede progress in the nonconvex landscape, leading to slow convergence or convergence to suboptimal stationary points. We propose an exterior framework for NMF (eNMF) that separates low-rank approximation from nonnegativity enforcement. Our method initializes from the optimal unconstrained factorization and introduces a rotation procedure that maps unconstrained factors to an exterior point closest to the nonnegative orthant. This viewpoint yields an algorithmic framework in which simple iterative updates converge to KKT-satisfying stationary points on the boundary of the positive orthant. The exterior formulation also enables a geometric interpretation of NMF solutions, clarifying equivalence classes of factorizations under permutation and orthogonal transformations. An intriguing numerical result, involving 400 NMF experiments across both real and synthetic datasets, show that in 99% of the cases, different algorithms tend to converge towards equivalent factor matrices. We benchmark eNMF against 9 state-of-the-art NMF algorithms with 9 initialization schemes across 3 real-world and 2 synthetic datasets. eNMF consistently outperforms all 81 competitors, achieving up to 30% lower reconstruction error under equal-time settings and up to 150% speedup under equal-error settings. The downstream experiments further demonstrate substantial performance gains in audio processing and recommendation tasks, corroborating the practical benefits of the proposed exterior optimization framework. Code is available at https://github.com/roychowdhuryresearch/eNMF
72.0CVMay 5Code
Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure SemiologyLina Zhang, Tonmoy Monsoor, Mehmet Efe Lorasdagi et al.
Multimodal Large Language Models (MLLMs) have demonstrated robust capabilities in recognizing everyday human activities, yet their potential for analyzing clinically significant involuntary movements in neurological disorders remains largely unexplored. This pilot study evaluates the capability of MLLMs for automated recognition of pathological movements in seizure videos. We assessed the zero-shot performance of state-of-the-art MLLMs on 20 ILAE-defined semiological features across 90 clinical seizure recordings. MLLMs outperformed fine-tuned Convolutional Neural Network (CNN) and Vision Transformer (ViT) baseline models on 13 of 18 features without task-specific training, demonstrating particular strength in recognizing salient postural and contextual features while struggling with subtle, high-frequency movements. Feature-targeted signal enhancement (facial cropping, pose estimation, audio denoising) improved performance on 10 of 20 features. Expert evaluation showed that 94.3 percent of MLLM-generated explanations for correctly predicted cases achieved at least 60 percent faithfulness scores, aligning with epileptologist reasoning. These findings demonstrate the potential of adapting general-purpose MLLMs for specialized clinical video analysis through targeted preprocessing strategies, offering a path toward interpretable, efficient diagnostic assistance. Our code is publicly available at https://github.com/LinaZhangUCLA/PathMotionMLLM.
54.6CVMay 21
Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology UnderstandingLina Zhang, Tonmoy Monsoor, Peizheng Li et al.
While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in general video understanding, their capacity to interpret involuntary, and spatio-temporally evolving pathologic motor behaviors such as seizure semiology remains largely untested. To address this gap, we introduce Seizure-Semiology-Suite, a clinically grounded dataset and benchmark for fine-grained, structured seizure semiology understanding. The dataset includes 438 seizure videos annotated with over 35,000 dense labels covering 20 ILAE-defined semiological features. Building on this dataset, we propose a seven-task hierarchical benchmark that systematically evaluates MLLMs from low-level visual perception to temporal sequencing, narrative report generation, and seizure diagnosis. To enable clinically meaningful evaluation of generated reports, we further introduce the Report Quality Index for Seizure Semiology (Seizure-RQI). Extensive baselines across 11 open-weight MLLMs reveal systematic weaknesses in laterality reasoning, temporal localization, symptom sequencing, and clinically faithful reporting. We show that seizure-specific fine-tuning substantially improves performance across tasks, and that a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification. Seizure-Semiology-Suite establishes a rigorous benchmark for evaluating multimodal models in safety-critical medical video understanding and guides the development of clinically reliable, domain-adaptive multimodal intelligence.