CV · Mar 24

Learning Cross-Joint Attention for Generalizable Video-Based Seizure Detection

arXiv:2603.23757 · 25.8 · h-index: 9
AI Analysis

This work targets seizure detection that generalizes across subjects in clinical monitoring. It is an incremental improvement: cross-subject performance is raised by restricting attention to body dynamics.

The paper tackles automated seizure detection from clinical videos, where existing methods generalize poorly to unseen subjects because of background bias. It proposes a joint-centric attention model that focuses on body dynamics to capture seizure patterns, and reports consistent gains over state-of-the-art methods in cross-subject experiments.

Automated seizure detection from long-term clinical videos can substantially reduce manual review time and enable real-time monitoring. However, existing video-based methods often struggle to generalize to unseen subjects due to background bias and reliance on subject-specific appearance cues. We propose a joint-centric attention model that focuses exclusively on body dynamics to improve cross-subject generalization. For each video segment, body joints are detected and joint-centered clips are extracted, suppressing background context. These joint-centered clips are tokenized using a Video Vision Transformer (ViViT), and cross-joint attention is learned to model spatial and temporal interactions between body parts, capturing coordinated movement patterns characteristic of seizure semiology. Extensive cross-subject experiments show that the proposed method consistently outperforms state-of-the-art CNN-, graph-, and transformer-based approaches on unseen subjects.
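The pipeline described in the abstract (joint-centered crops, one token sequence per joint, attention across joints) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names are hypothetical, joint coordinates are taken as given rather than produced by a pose detector, a per-frame mean intensity stands in for ViViT tokenization, and the attention is single-head with identity projections.

```python
import math


def joint_crop_box(x, y, size, width, height):
    """Box of side `size` centred on (x, y), shifted to stay inside the frame.

    Assumes size <= width and size <= height.
    """
    half = size // 2
    left = min(max(int(x) - half, 0), width - size)
    top = min(max(int(y) - half, 0), height - size)
    return left, top, left + size, top + size


def extract_joint_clips(frames, joints, size):
    """Cut one clip per joint so that background context is discarded.

    frames: T frames, each a list of H rows of W pixel values.
    joints: T lists of J (x, y) joint positions (assumed already detected).
    Returns J clips, each a list of T crops of shape size x size.
    """
    height, width = len(frames[0]), len(frames[0][0])
    clips = [[] for _ in joints[0]]
    for frame, frame_joints in zip(frames, joints):
        for j, (x, y) in enumerate(frame_joints):
            left, top, right, bottom = joint_crop_box(x, y, size, width, height)
            clips[j].append([row[left:right] for row in frame[top:bottom]])
    return clips


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_joint_attention(tokens):
    """Scaled dot-product attention in which every joint attends to all joints.

    tokens: J vectors of equal dimension D, one per joint.
    Identity query/key/value projections are a simplifying assumption.
    """
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return mixed


# Toy data: 3 frames of 6 x 8 pixels and 2 joints per frame.
size = 4
frames = [[[(r + c + t) % 7 for c in range(8)] for r in range(6)] for t in range(3)]
joints = [[(1, 1), (6, 4)] for _ in range(3)]

clips = extract_joint_clips(frames, joints, size)
# Placeholder tokenizer: mean intensity per frame (stand-in for ViViT).
tokens = [[sum(map(sum, crop)) / (size * size) for crop in clip] for clip in clips]
mixed = cross_joint_attention(tokens)
```

In this sketch each joint's output vector is a weighted average of all joints' tokens, which is the sense in which the attention models interactions between body parts. A learned model would add trainable projections, multiple heads, and temporal tokens within each clip.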
