MM · LG · Jul 1, 2025

HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction

arXiv:2507.00926v1 · 2 citations · h-index: 12 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses content optimization and marketing strategies for digital platforms, but it is an incremental advance because it builds on existing multimodal and ensemble methods.

The paper tackles social media popularity prediction by proposing HyperFusion, a hierarchical multimodal ensemble learning framework that integrates visual, textual, temporal, and user features, achieving third place in the SMP Challenge 2025 (Image Track).
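According to the abstract, the ensemble's base learners are CatBoost, TabNet, and custom multi-layer perceptrons, combined hierarchically. The text does not say how the outputs are weighted, so the sketch below is only an illustration of a two-tier blend under an assumed scheme: each tier weights its inputs by inverse validation MSE. The group names (`trees`, `neural`), the helper functions, and all prediction values are hypothetical placeholders, not outputs of the authors' models.

```python
from statistics import fmean

def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return fmean((p - t) ** 2 for p, t in zip(preds, targets))

def inverse_mse_weights(val_preds, val_targets):
    """Assumed weighting: 1 / validation MSE per model, normalised to sum to 1.
    val_preds maps a model (or group) name to its validation predictions."""
    inv = {name: 1.0 / max(mse(p, val_targets), 1e-12) for name, p in val_preds.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}

def blend(preds, weights):
    """Weighted average of per-model prediction lists."""
    names = list(weights)
    n = len(preds[names[0]])
    return [sum(weights[m] * preds[m][i] for m in names) for i in range(n)]

def hierarchical_blend(val_groups, val_targets, test_groups):
    """Tier 1 blends the models inside each group; tier 2 blends the group outputs."""
    tier1_val, tier1_test = {}, {}
    for group, models in val_groups.items():
        w = inverse_mse_weights(models, val_targets)
        tier1_val[group] = blend(models, w)
        tier1_test[group] = blend(test_groups[group], w)
    w2 = inverse_mse_weights(tier1_val, val_targets)
    return blend(tier1_test, w2)

# Hypothetical placeholder predictions standing in for real model outputs.
val_targets = [1.0, 2.0, 3.0]
val_groups = {
    "trees": {"catboost": [1.1, 2.0, 2.9]},
    "neural": {"tabnet": [1.5, 2.5, 3.5], "mlp": [0.5, 1.5, 2.5]},
}
test_groups = {
    "trees": {"catboost": [10.0]},
    "neural": {"tabnet": [4.0], "mlp": [2.0]},
}
print(hierarchical_blend(val_groups, val_targets, test_groups))
```

In this toy data the TabNet and MLP errors cancel, so the neural group's blended validation error is essentially zero and tier 2 assigns it almost all of the weight. Inverse-error weighting is one simple choice; a learned meta-model (stacking) is a common alternative.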

Social media popularity prediction plays a crucial role in content optimization, marketing strategies, and user engagement enhancement across digital platforms. However, predicting post popularity remains challenging due to the complex interplay between visual, textual, temporal, and user behavioral factors. This paper presents HyperFusion, a hierarchical multimodal ensemble learning framework for social media popularity prediction. Our approach employs a three-tier fusion architecture that progressively integrates features across abstraction levels: visual representations from CLIP encoders, textual embeddings from transformer models, and temporal-spatial metadata with user characteristics. The framework implements a hierarchical ensemble strategy combining CatBoost, TabNet, and custom multi-layer perceptrons. To address limited labeled data, we propose a two-stage training methodology with pseudo-labeling and iterative refinement. We introduce novel cross-modal similarity measures and hierarchical clustering features that capture inter-modal dependencies. Experimental results demonstrate that HyperFusion achieves competitive performance on the SMP challenge dataset. Our team achieved third place in the SMP Challenge 2025 (Image Track). The source code is available at https://anonymous.4open.science/r/SMPDImage.
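The abstract mentions cross-modal similarity measures without defining them. A common measure of this kind is the cosine similarity between a post's image embedding and its text embedding. The sketch below assumes that both embeddings come from the same CLIP model, so they share one vector space. The field names (`title`, `tags`) and the function `cross_modal_features` are hypothetical illustrations, not the paper's feature definitions, and the toy vectors stand in for real encoder outputs.

```python
import math
from typing import Dict, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def cross_modal_features(image_emb: Sequence[float],
                         title_emb: Sequence[float],
                         tags_emb: Sequence[float]) -> Dict[str, float]:
    """Hypothetical feature set: pairwise similarities among one post's embeddings,
    assuming all three vectors live in a shared (e.g. CLIP) embedding space."""
    return {
        "img_title_cos": cosine_similarity(image_emb, title_emb),
        "img_tags_cos": cosine_similarity(image_emb, tags_emb),
        "title_tags_cos": cosine_similarity(title_emb, tags_emb),
    }

# Toy 3-d vectors in place of real CLIP embeddings.
feats = cross_modal_features([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(feats)
```

Here the image and title vectors point in the same direction, so `img_title_cos` is 1.0, while the orthogonal tag vector yields 0.0 for the other two features. Features like these would be appended to the tabular inputs of the ensemble models.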

