Analyzing HPC Support Tickets: Experience and Recommendations
This work addresses the problem of improving the efficiency of HPC user support teams. Its contribution is incremental, since it builds new tools on top of an existing ticketing system.
The work analyzes HPC support tickets at Los Alamos National Laboratory and develops proof-of-concept tools that automate category assignment and similar-ticket recommendation, with the aim of helping support teams solve user problems and track issue trends.
High-performance computing (HPC) user support teams are the first line of defense against large-scale problems, as they are often the first to learn of issues reported by users. Developing tools that help support teams solve user problems and track issue trends is critical for maintaining system health. Our work examines the Los Alamos National Laboratory HPC Consult Team's user support ticketing system and develops proof-of-concept tools to automate tasks such as category assignment and similar-ticket recommendation. We also generate new categories for reporting and discuss ideas to improve future ticketing systems.
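To make the idea of similar-ticket recommendation concrete, the following is a minimal sketch that ranks past tickets by TF-IDF cosine similarity to a new ticket. The abstract does not state which method the proof-of-concept tools use, so the technique, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend_similar`), and the smoothed IDF formula are all assumptions chosen for illustration, not a description of the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for docs, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of tickets containing each token.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumed variant) so that no weight is zero.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: f * idf[t] for t, f in term_freq.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    return dot / (norm_a * norm_b)


def recommend_similar(query, tickets, top_k=3):
    """Return (ticket_index, score) pairs for the top_k most similar tickets."""
    vectors, idf = tfidf_vectors(tickets)
    # Tokens never seen in past tickets carry no weight in the query vector.
    query_tf = Counter(tokenize(query))
    query_vec = {t: f * idf[t] for t, f in query_tf.items() if t in idf}
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    # Highest score first; ties broken by the older (lower) ticket index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]
```

Calling `recommend_similar("my job is pending in the queue", tickets)` on a list of past ticket texts returns the indices of the closest matches with their scores. The same ranking could serve as a simple baseline for category assignment by copying the category of the best match, though that too is an assumption rather than the approach reported in the work.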