Understanding LLM Behavior in Multi-Target Cross-Lingual Summarization
For researchers working on cross-lingual NLP, this work provides a new benchmark and a layer-wise analysis framework. The findings are incremental: they confirm known challenges, and the proposed method offers a modest improvement.
The paper introduces MEA, a new benchmark for multi-target cross-lingual summarization covering 24 target languages, and finds that LLM performance on this task still lags substantially behind English monolingual summarization. It proposes a layer-wise analysis suggesting that translation and summarization emerge jointly in later layers rather than as separate stages, and introduces an inference-time activation steering method, guided by hidden representations from English summarization, that consistently improves quality across target languages.
Multi-target cross-lingual text summarization (MTXLS), which summarizes a source document into multiple target languages, is increasingly important as users consume content in diverse languages, but remains underexplored. To address this gap, we introduce multi-target cross-lingual element-aware (MEA), a new MTXLS benchmark covering 24 target languages. We benchmark end-to-end and pipeline approaches across various LLMs and show that MTXLS performance still substantially lags behind English monolingual summarization. To better understand MTXLS in LLMs, we propose a layer-wise analysis framework for investigating how LLMs internally perform MTXLS. Our analyses suggest that translation and summarization behaviors emerge jointly within later layers rather than as distinctly decomposed stages. Most task-relevant processing occurs within these layers, and errors also tend to arise at similar depths. Motivated by these findings, we introduce an inference-time activation steering method that leverages hidden representations from English summarization to guide MTXLS generation. Experiments show that our method consistently improves MTXLS quality across target languages.
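The abstract does not say how the English-summarization hidden representations are injected, so the following is only a minimal sketch of one common form of activation steering (a mean-difference vector added to hidden states at a single layer), not the paper's actual method. The function names, the toy activations, and the scaling factor `alpha` are all hypothetical and chosen for illustration.

```python
from math import isclose

Vector = list[float]


def mean_vector(states: list[Vector]) -> Vector:
    """Average per-token hidden states (one vector per token) into one vector."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def steering_vector(english_states: list[Vector], mtxls_states: list[Vector]) -> Vector:
    """Assumed recipe: mean English-summarization state minus mean MTXLS state."""
    english_mean = mean_vector(english_states)
    mtxls_mean = mean_vector(mtxls_states)
    return [e - x for e, x in zip(english_mean, mtxls_mean)]


def apply_steering(states: list[Vector], vector: Vector, alpha: float = 1.0) -> list[Vector]:
    """Shift every token's hidden state by alpha * vector (alpha is hypothetical)."""
    return [[h + alpha * v for h, v in zip(state, vector)] for state in states]


# Toy stand-ins for activations taken from the same layer of the same model.
english_states = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
mtxls_states = [[0.0, 0.0, 0.0], [2.0, 0.0, 2.0], [1.0, 3.0, 1.0]]

vector = steering_vector(english_states, mtxls_states)
steered = apply_steering(mtxls_states, vector, alpha=1.0)

# With alpha = 1.0 the steered states share the English mean activation.
assert all(
    isclose(a, b)
    for a, b in zip(mean_vector(steered), mean_vector(english_states))
)
```

In a real model the same shift would typically be applied inside the forward pass at a chosen layer, and the finding that task-relevant processing concentrates in later layers suggests those layers as candidates. Which layer to use and how strongly to steer are assumptions that the abstract leaves open.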