KnowThyself: An Agentic Assistant for LLM Interpretability
This work addresses the need for more accessible and extensible interpretability tools for LLM users. Its contribution appears incremental, since it builds on existing capabilities by integrating them into a conversational workflow.
The authors address the fragmented, code-intensive state of large language model (LLM) interpretability tools by developing KnowThyself, an agentic assistant that consolidates these capabilities into a chat-based interface. The resulting platform lowers technical barriers and provides interactive visualizations with guided explanations.
We develop KnowThyself, an agentic assistant that advances large language model (LLM) interpretability. Existing tools provide useful insights but remain fragmented and code-intensive. KnowThyself consolidates these capabilities into a chat-based interface, where users can upload models, pose natural language questions, and obtain interactive visualizations with guided explanations. At its core, an orchestrator LLM first reformulates user queries, an agent router further directs them to specialized modules, and the outputs are finally contextualized into coherent explanations. This design lowers technical barriers and provides an extensible platform for LLM inspection. By embedding the whole process into a conversational workflow, KnowThyself offers a robust foundation for accessible LLM interpretability.
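The three-stage flow described above (reformulate, route, contextualize) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the keyword-based router, and the two example modules ("attention" and "bias") are hypothetical and are not taken from KnowThyself itself, where the reformulation and contextualization steps are performed by an LLM rather than by string operations.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical module signature: takes a reformulated query, returns a raw result.
Module = Callable[[str], str]


def reformulate(query: str) -> str:
    """Stand-in for the orchestrator LLM: normalize case and whitespace."""
    return " ".join(query.lower().split())


def attention_module(query: str) -> str:
    """Hypothetical specialized module (placeholder output only)."""
    return "attention heatmap for: " + query


def bias_module(query: str) -> str:
    """Hypothetical specialized module (placeholder output only)."""
    return "bias report for: " + query


@dataclass
class Router:
    """Toy agent router: picks the first module whose keyword appears in the query."""
    modules: Dict[str, Module]
    default: str

    def route(self, query: str) -> str:
        for keyword in self.modules:
            if keyword in query:
                return keyword
        return self.default


def contextualize(module_name: str, raw: str) -> str:
    """Stand-in for the final explanation step: tag the raw output with its source."""
    return f"[{module_name}] {raw}"


def answer(query: str, router: Router) -> str:
    """Run the full pipeline: reformulate, route, call the module, contextualize."""
    q = reformulate(query)
    name = router.route(q)
    raw = router.modules[name](q)
    return contextualize(name, raw)


router = Router(
    modules={"attention": attention_module, "bias": bias_module},
    default="attention",
)

if __name__ == "__main__":
    print(answer("  Show ATTENTION for layer 3 ", router))
```

Keeping the router as a plain mapping from names to callables reflects the extensibility claim in the abstract: under this sketch's assumptions, adding a new inspection capability amounts to registering one more entry.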