The CTI Echo Chamber: Fragmentation, Overlap, and Vendor Specificity in Twenty Years of Cyber Threat Reporting
For CTI practitioners and researchers, this work quantifies structural biases and source completeness in threat intelligence, enabling better evaluation of intelligence coverage.
The paper analyzes 16,096 open-source CTI reports over two decades, revealing that the CTI ecosystem is fragmented with low overlap between vendors, where a few core providers offer broad awareness but additional sources yield diminishing returns.
Despite the high volume of open-source Cyber Threat Intelligence (CTI), our understanding of long-term threat actor-victim dynamics remains fragmented due to inconsistent reporting standards and the lack of structured datasets containing comprehensive analytic information. In this paper, we present a large-scale automated analysis of open-source CTI reports spanning two decades. We develop a high-precision, LLM-based pipeline to ingest and structure 16,096 reports, extracting key entities such as attributed threat actors, motivations, victims, reporting vendors, and technical indicators (IoCs and TTPs). Our analysis quantifies the evolution of CTI information density and specialization, characterizing patterns that relate specific threat actors to motivations and victim profiles. Furthermore, we perform a meta-analysis of the CTI industry itself. We identify a fragmented ecosystem of distinct silos where vendors demonstrate significant geographic and sectoral reporting biases. Our marginal coverage analysis reveals that intelligence overlap between vendors is typically low: while a few core providers may offer broad situational awareness, additional sources yield diminishing returns. Overall, our findings characterize the structural biases inherent in the CTI ecosystem, enabling practitioners and researchers to better evaluate the completeness of their intelligence sources.