Hound: Relation-First Knowledge Graphs for Complex-System Reasoning in Security Audits
This work helps security analysts reason about interrelated components in complex systems. It is an incremental improvement, with measured gains in recall and F1.
Hound targets system-level reasoning in complex codebases for security audits. It introduces a relation-first graph engine and a persistent belief system, and on a five-project subset of ScaBench it improves micro recall (31.2% vs. 8.3%) and F1 (14.2% vs. 9.8%) over a baseline LLM analyzer.
Hound introduces a relation-first graph engine that improves system-level reasoning across interrelated components in complex codebases. The agent designs flexible, analyst-defined views with compact annotations (e.g., monetary/value flows, authentication/authorization roles, call graphs, protocol invariants) and uses them to anchor exact retrieval: for any question, it loads precisely the code that matters (often across components) so it can zoom out to system structure and zoom in to the decisive lines. A second contribution is a persistent belief system: long-lived vulnerability hypotheses whose confidence is updated as evidence accrues. The agent employs coverage-versus-intuition planning and a QA finalizer to confirm or reject hypotheses. On a five-project subset of ScaBench[1], Hound improves recall and F1 over a baseline LLM analyzer (micro recall 31.2% vs. 8.3%; F1 14.2% vs. 9.8%) with a modest precision trade-off. We attribute these gains to flexible, relation-first graphs that extend model understanding beyond call/dataflow to abstract aspects, plus the hypothesis-centric loop; code and artifacts are released to support reproduction.
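To make the two contributions concrete, the following is a minimal sketch, not Hound's implementation. It assumes a view can be modeled as typed edges between named components plus a map from components to code spans, and that a hypothesis carries a single confidence value nudged by each piece of evidence. Every name here (`Edge`, `View`, `Hypothesis`, `retrieve`, `update`), the contract and file names, and the linear update rule with its `weight` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str  # e.g. "value_flow", "authorizes", "calls"


@dataclass
class View:
    """One view over the codebase: typed edges plus node-to-code-span anchors."""
    name: str
    edges: list[Edge] = field(default_factory=list)
    # node -> (file, first line, last line); hypothetical locations
    spans: dict[str, tuple[str, int, int]] = field(default_factory=dict)

    def retrieve(self, start: str, relation: str) -> list[tuple[str, int, int]]:
        """Follow edges of one relation type from `start` and return the
        code spans of every node reached, in visit order."""
        seen: list[str] = []
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.append(node)
            stack.extend(e.dst for e in self.edges
                         if e.src == node and e.relation == relation)
        return [self.spans[n] for n in seen if n in self.spans]


@dataclass
class Hypothesis:
    """A long-lived vulnerability hypothesis with a running confidence."""
    claim: str
    confidence: float = 0.5
    evidence: list[str] = field(default_factory=list)

    def update(self, note: str, supports: bool, weight: float = 0.2) -> None:
        # Assumed rule: move a fraction `weight` of the way toward 1.0
        # (supporting evidence) or 0.0 (refuting evidence).
        target = 1.0 if supports else 0.0
        self.confidence += weight * (target - self.confidence)
        self.evidence.append(note)


# Hypothetical example: a value-flow view spanning three components.
view = View(
    name="value_flow",
    edges=[
        Edge("Vault.deposit", "Token.transferFrom", "value_flow"),
        Edge("Vault.deposit", "Oracle.price", "calls"),
        Edge("Token.transferFrom", "Ledger.credit", "value_flow"),
    ],
    spans={
        "Vault.deposit": ("Vault.sol", 40, 62),
        "Token.transferFrom": ("Token.sol", 88, 101),
        "Ledger.credit": ("Ledger.sol", 15, 27),
        "Oracle.price": ("Oracle.sol", 5, 19),
    },
)
loaded = view.retrieve("Vault.deposit", "value_flow")

h = Hypothesis("deposit credits the ledger before the transfer settles")
h.update("credit is called before transferFrom returns", supports=True)
h.update("reentrancy guard present on deposit", supports=False)
```

In this sketch, retrieval follows only the requested relation type, so the oracle code reached through a `calls` edge is not loaded for a value-flow question. The hypothesis keeps its evidence trail alongside the confidence, which is the property a later confirm-or-reject step would need.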