49.7LGMay 28
MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated FusionAli Abusaleh, Bhuvanesh Verma, Alexander Mehler
We introduce MMTM, a modular pipeline for topic discovery in long-form video that integrates speech recognition, audio and visual embeddings, and BERTopic clustering through a deterministic similarity-gated fusion. Evaluated cross-lingually on German (Tagesschau) and English (NBC) broadcast news, joint tri-modal modeling substantially improves topic quality: noise drops from 0.27 to 0.06, transition rate from 0.70 to 0.21, and normalized entropy rises from 0.84 to 0.92, indicating more coherent and temporally stable topics. Cluster validity (Calinski-Harabasz) improves by 5-12X across embedding spaces. Lexical coherence (NPMI) rises from 0.77 to 0.86 on German but is corpus-dependent and does not transfer to the shorter NBC broadcasts. We release the pipeline code and a human-validated 54-hour multimodal video topic corpus with dual-annotator visual evaluation and LLM-assisted labeling.
SPSep 14, 2024
A Multitask VAE for Time Series Preprocessing and Prediction of Blood Glucose LevelAli AbuSaleh, Mehdi Rahim
Data preprocessing is a critical part of time series data analysis. Data from connected medical devices often have missing or abnormal values during acquisition. Handling such situations requires additional assumptions and domain knowledge. This can be time-consuming, and can introduce a significant bias affecting predictive model accuracy and thus, medical interpretation. To overcome this issue, we propose a new deep learning model to mitigate the preprocessing assumptions. The model architecture relies on a variational auto-encoder (VAE) to produce a preprocessing latent space, and a recurrent VAE to preserve the temporal dynamics of the data. We demonstrate the effectiveness of such an architecture on telemonitoring data to forecast glucose-level of diabetic patients. Our results show an improvement in terms of accuracy with respect of existing state-of-the-art methods and architectures.
CLDec 12, 2025
Extending a Parliamentary Corpus with MPs' Tweets: Automatic Annotation and Evaluation Using MultiParTweetMevlüt Bagci, Ali Abusaleh, Daniel Baumartz et al.
Social media serves as a critical medium in modern politics because it both reflects politicians' ideologies and facilitates communication with younger generations. We present MultiParTweet, a multilingual tweet corpus from X that connects politicians' social media discourse with German political corpus GerParCor, thereby enabling comparative analyses between online communication and parliamentary debates. MultiParTweet contains 39 546 tweets, including 19 056 media items. Furthermore, we enriched the annotation with nine text-based models and one vision-language model (VLM) to annotate MultiParTweet with emotion, sentiment, and topic annotations. Moreover, the automated annotations are evaluated against a manually annotated subset. MultiParTweet can be reconstructed using our tool, TTLABTweetCrawler, which provides a framework for collecting data from X. To demonstrate a methodological demonstration, we examine whether the models can predict each other using the outputs of the remaining models. In summary, we provide MultiParTweet, a resource integrating automatic text and media-based annotations validated with human annotations, and TTLABTweetCrawler, a general-purpose X data collection tool. Our analysis shows that the models are mutually predictable. In addition, VLM-based annotation were preferred by human annotators, suggesting that multimodal representations align more with human interpretation.