NI | AI | Oct 11, 2022

Client Error Clustering Approaches in Content Delivery Networks (CDN)

arXiv:2210.05314v1 | 1 citation | h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses log analysis for CDN operators, but it is incremental as it applies existing clustering techniques to a new dataset.

The study tackled the challenge of analyzing billions of CDN proxy logs by applying clustering methods to identify recurring errors, showing that this approach is viable for improving service quality.

Content delivery networks (CDNs) are the backbone of the Internet and are key in delivering high-quality video on demand (VoD), web content and file services to billions of users. CDNs usually consist of hierarchically organized content servers positioned as close to the customers as possible. CDN operators face a significant challenge when analyzing the billions of web server and proxy logs generated by their systems. The main objective of this study was to analyze the applicability of various clustering methods to CDN error log analysis. We worked with real-life CDN proxy logs, identified key features included in the logs (e.g., content type, HTTP status code, time-of-day, host) and clustered the log lines corresponding to different host types offering live TV, video on demand, file caching and web content. Our experiments were run on a dataset of proxy logs collected over a 7-day period from a single physical CDN server running multiple types of services (VoD, live TV, file). The dataset consisted of 2.2 billion log lines. Our analysis showed that CDN error clustering is a viable approach to identifying recurring errors and improving overall quality of service.
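
As a rough illustration of clustering error log lines on the categorical features the abstract names (content type, HTTP status code, time-of-day, host type), the sketch below uses a small k-modes routine. This is an assumption for illustration only: the abstract does not say which clustering algorithms were evaluated, and the records, the time-of-day buckets and the helper names (`RECORDS`, `hamming`, `init_modes`, `k_modes`) are all hypothetical.

```python
from collections import Counter

# Hypothetical, already-parsed error records (not taken from the paper's dataset):
# (content_type, http_status, time_of_day_bucket, host_type)
RECORDS = [
    ("video/mp4", 404, "night", "vod"),
    ("video/mp4", 404, "night", "vod"),
    ("video/mp4", 404, "evening", "vod"),
    ("application/zip", 503, "morning", "file"),
    ("application/zip", 503, "morning", "file"),
    ("application/zip", 503, "evening", "file"),
]


def hamming(a, b):
    """Number of feature positions in which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def init_modes(records, k):
    """Deterministic farthest-first seeding: start from the first record,
    then repeatedly add the record farthest from all chosen modes."""
    modes = [records[0]]
    while len(modes) < k:
        modes.append(max(records, key=lambda r: min(hamming(r, m) for m in modes)))
    return modes


def k_modes(records, k, max_iter=20):
    """Cluster categorical records: assign each record to the nearest mode
    (Hamming distance), then recompute each mode as the per-feature most
    common value, until the modes stop changing."""
    modes = init_modes(records, k)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        new_modes = []
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                new_modes.append(
                    tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
                )
            else:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


labels, modes = k_modes(RECORDS, k=2)
for j, mode in enumerate(modes):
    print(f"cluster {j}: size={labels.count(j)} representative={mode}")
```

Each cluster's mode can be read as a candidate recurring error pattern, for example a particular status code concentrated on one host type. At the scale the abstract reports (2.2 billion log lines), a pure-Python loop like this would not be practical; it only shows the shape of the idea.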
