9.4LGApr 16, 2025
Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language ModelsYuanbo Tang, Yan Tang, Naifan Zhang et al.
Mixture-of-Experts based large language models (MoE LLMs) have shown significant promise in multitask adaptability by dynamically routing inputs to specialized experts. Despite their success, the collaborative mechanisms among experts are still not well understood, limiting both the interpretability and optimization of these models. In this paper, we focus on two critical issues: (1) identifying expert collaboration patterns, and (2) optimizing MoE LLMs through expert pruning. To address the first issue, we propose a hierarchical sparse dictionary learning (HSDL) method that uncovers the collaboration patterns among experts. For the second issue, we introduce the Contribution-Aware Expert Pruning (CAEP) algorithm, which effectively prunes low-contribution experts. Our extensive experiments demonstrate that expert collaboration patterns are closely linked to specific input types and exhibit semantic significance across various tasks. Moreover, pruning experiments show that our approach improves overall performance by 2.5\% on average, outperforming existing methods. These findings offer valuable insights into enhancing the efficiency and interpretability of MoE LLMs, offering a clearer understanding of expert interactions and improving model optimization.
4.0IRDec 22, 2017
DancingLines: An Analytical Scheme to Depict Cross-Platform Event PopularityTianxiang Gao, Weiming Bao, Jinning Li et al.
Nowadays, events usually burst and are propagated online through multiple modern media like social networks and search engines. There exists various research discussing the event dissemination trends on individual medium, while few studies focus on event popularity analysis from a cross-platform perspective. Challenges come from the vast diversity of events and media, limited access to aligned datasets across different media and a great deal of noise in the datasets. In this paper, we design DancingLines, an innovative scheme that captures and quantitatively analyzes event popularity between pairwise text media. It contains two models: TF-SW, a semantic-aware popularity quantification model, based on an integrated weight coefficient leveraging Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series alignment model matching different event phases adapted from Dynamic Time Warping. We also propose three metrics to interpret event popularity trends between pairwise social platforms. Experimental results on eighteen real-world event datasets from an influential social network and a popular search engine validate the effectiveness and applicability of our scheme. DancingLines is demonstrated to possess broad application potentials for discovering the knowledge of various aspects related to events and different media.