Agentic DraCor and the Art of Docstring Engineering: Evaluating MCP-empowered LLM Usage of the DraCor API
This work addresses the need for reliable agentic AI in Computational Literary Studies and Digital Humanities. It is incremental, building on existing MCP and API infrastructure.
The paper tackles the problem of enabling LLMs to interact autonomously with the DraCor API by implementing a Model Context Protocol server. It finds that docstring engineering is crucial for optimizing LLM-tool interaction, with evaluations focusing on tool correctness, efficiency, and reliability.
This paper reports on the implementation and evaluation of a Model Context Protocol (MCP) server for DraCor that enables Large Language Models (LLMs) to interact autonomously with the DraCor API. We conducted experiments focusing on how the LLM selects and applies tools. Our qualitative approach includes systematic observation of prompts to understand how LLMs behave when using MCP tools, and evaluates "Tool Correctness", "Tool-Calling Efficiency", and "Tool-Use Reliability". Our findings highlight the importance of "Docstring Engineering", defined as reflexively crafting tool documentation to optimize LLM-tool interaction. Our experiments demonstrate both the promise of agentic AI for research in Computational Literary Studies and the essential development work still needed to make Digital Humanities infrastructures reliable.
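To make concrete why docstrings matter, the minimal Python sketch below shows how a tool function's docstring and signature could be turned into the `name` / `description` / `inputSchema` entry that an MCP server advertises to an LLM when it lists its tools. It uses only the standard library rather than an MCP SDK. The function `get_play_metrics`, the base URL, and the endpoint path are hypothetical illustrations, not the authors' server; the tool only builds a URL instead of calling the DraCor API.

```python
import inspect
import json
from typing import get_type_hints

# Assumed DraCor API base URL; the real server's endpoint layout may differ.
DRACOR_API = "https://dracor.org/api/v1"

# JSON Schema type names for the simple Python parameter types used below.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def get_play_metrics(corpusname: str, playname: str) -> str:
    """Return the URL for network metrics of ONE play in a DraCor corpus.

    Use this when the user asks about the structure of a single play
    (e.g. density or average degree). Do NOT use it to list plays.

    Args:
        corpusname: short corpus identifier, not the full corpus title.
        playname: play identifier (slug) as returned by the corpus listing,
            not the display title of the play.
    """
    # Hypothetical endpoint path; a real tool would fetch and return the JSON.
    return f"{DRACOR_API}/corpora/{corpusname}/plays/{playname}/metrics"


def tool_descriptor(func) -> dict:
    """Build an MCP-style tool entry (name, description, inputSchema) from a function."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    properties = {
        name: {"type": _JSON_TYPES.get(hints.get(name, str), "string")}
        for name in params
    }
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": func.__name__,
        # The cleaned docstring is the main natural-language guidance the LLM sees.
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


if __name__ == "__main__":
    print(json.dumps(tool_descriptor(get_play_metrics), indent=2))
```

In this sketch the docstring states when to use the tool, when not to, and which kind of identifier each parameter expects; that is the sort of guidance docstring engineering refines, although the exact wording here is an illustrative assumption.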