Synergistic Feature Fusion for Latent Lyrical Classification: A Gated Deep Learning Architecture
This work addresses the challenge of multimodal lyrical analysis for researchers and practitioners, though the contribution appears incremental, since it builds on existing feature fusion methods.
This study tackles the problem of integrating complex deep semantic features with simple structural cues for lyrical content classification by introducing a Synergistic Fusion Layer (SFL) architecture, which achieved an accuracy of 0.9894 and a Macro F1 score of 0.9894 while reducing Expected Calibration Error by 93% relative to a Random Forest baseline.
This study addresses the challenge of integrating complex, high-dimensional deep semantic features with simple, interpretable structural cues for lyrical content classification. We introduce the Synergistic Fusion Layer (SFL), a deep learning architecture that uses a gating mechanism to modulate Sentence-BERT embeddings (Fdeep) with low-dimensional auxiliary features (Fstruct). The task, derived from clustering UMAP-reduced lyrical embeddings, is reframed as binary classification: distinguishing a dominant, homogeneous cluster (Class 0) from all other content (Class 1). The SFL model achieved an accuracy of 0.9894 and a Macro F1 score of 0.9894, outperforming a comprehensive Random Forest (RF) baseline that used feature concatenation (Accuracy = 0.9868). More notably, the SFL model was far better calibrated, with a 93% reduction in Expected Calibration Error (ECE = 0.0035 vs. 0.0500) and a roughly 2.5x lower Log Loss (0.0304 vs. 0.0772) relative to the RF baseline. These results support the architectural hypothesis that non-linear gating is more effective than simple feature concatenation, and they position the SFL model as a robust, well-calibrated system for complex multimodal lyrical analysis.
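To make the gating idea concrete, the following is a minimal sketch of one common form of gated fusion, in which a sigmoid gate computed from Fstruct rescales each dimension of Fdeep. It is an illustration under stated assumptions, not the SFL implementation: the abstract does not specify the exact gate equation, so the sigmoid form, the names `gated_fusion`, `W_gate`, and `b_gate`, and all numeric values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(f_deep, f_struct, W_gate, b_gate):
    """Scale each dimension of f_deep by a gate in (0, 1) computed from f_struct.

    Assumed form (not taken from the paper):
        gate  = sigmoid(W_gate @ f_struct + b_gate)
        fused = gate * f_deep   (element-wise)
    """
    fused = []
    for i, deep_value in enumerate(f_deep):
        pre_activation = b_gate[i] + sum(
            w * s for w, s in zip(W_gate[i], f_struct)
        )
        fused.append(sigmoid(pre_activation) * deep_value)
    return fused


# Toy stand-ins: a 4-dimensional "embedding" and 2 structural features.
# Real Sentence-BERT embeddings are much higher-dimensional.
f_deep = [0.8, -1.2, 0.5, 2.0]
f_struct = [0.3, 1.5]

# Hypothetical gate parameters; in a trained model these would be learned.
W_gate = [
    [0.5, -0.2],
    [-1.0, 0.4],
    [0.0, 0.9],
    [0.3, 0.3],
]
b_gate = [0.0, 0.1, -0.5, 0.2]

fused = gated_fusion(f_deep, f_struct, W_gate, b_gate)
print(fused)
```

In contrast to concatenation, where a downstream classifier sees Fdeep and Fstruct side by side, a multiplicative gate of this kind lets the structural cues suppress or pass individual semantic dimensions before classification.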