Beyond Bags of Words: Inferring Systemic Nets
Inferring systemic nets enables new practical analysis techniques for textual analytics, addressing the limitations of bag-of-words models when deeper insight into language, authors, and contexts is required.
The paper addresses the need for richer document representations in deeper textual analysis by algorithmically inferring systemic nets from corpora. It shows that the resulting nets are plausible and provide practical benefits for knowledge discovery.
Textual analytics based on representations of documents as bags of words have been reasonably successful. However, analysis that requires deeper insight into language, into author properties, or into the contexts in which documents were created requires a richer representation. Systemic nets are one such representation. They have not been widely used because constructing them has required human effort. We show that systemic nets can be algorithmically inferred from corpora, that the resulting nets are plausible, and that they can provide practical benefits for knowledge discovery problems. This opens up a new class of practical analysis techniques for textual analytics.
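To make the limitation of the bag-of-words representation concrete, the following minimal sketch maps each document to word counts and discards word order. It is an illustration only, not the paper's method: the function name bag_of_words and the whitespace tokenization are assumptions, and systemic nets themselves are not shown. Two sentences with opposite meanings end up with identical representations.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to lowercase word counts, discarding word order.

    Hypothetical helper for illustration; tokenization is naive whitespace splitting.
    """
    return Counter(text.lower().split())


a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")

# Who did what to whom is lost: both documents get the same counts.
print(a == b)  # True
```

Because both documents collapse to the same counts, any analysis built on this representation cannot distinguish them. This is the kind of information a richer representation is meant to retain.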