Automatic Knowledge Extraction with Human Interface
This work addresses the difficulty of using unintuitive NLP software to model systems from their documentation, with an evaluation in the cyber domain. The contribution appears incremental, however, since it builds on existing open-source tools.
The authors address knowledge extraction from legacy system documentation by developing OrbWeaver, an automatic extraction system with a human interface. On a corpus of cyber threat documents, it improved knowledge extraction by revealing hidden relationships and linking entities.
OrbWeaver, an automatic knowledge extraction system paired with a human interface, streamlines the use of unintuitive natural language processing software for modeling systems from their documentation. OrbWeaver enables the indirect transfer of knowledge about legacy systems by combining open-source tools for document understanding and processing with web-based user interface constructs. By design, OrbWeaver is scalable, extensible, and usable; we demonstrate its utility by evaluating its performance on a corpus of documents related to advanced persistent threats in the cyber domain. The results indicate improved knowledge extraction: the system reveals hidden relationships, links co-related entities, and gathers evidence.