CL, Jun 17, 2024

Gram2Vec: An Interpretable Document Vectorizer

Peter Zeng, Hannah Stortz, Eric Sclafani, Alina Shabaeva, Maria Elizabeth Garza, Daniel Greeson, Owen Rambow

arXiv:2406.12131v3 1.0

Originality: Incremental advance

AI Analysis

Gram2Vec provides an interpretable alternative to neural methods for document analysis, with applications in authorship verification and AI detection. The advance appears incremental because it builds on existing grammatical feature approaches.

The authors address document representation with Gram2Vec, a grammatical style embedding system that represents each document by the normalized relative frequencies of its grammatical features, which makes the resulting vectors inherently interpretable. They apply it to authorship verification and AI detection; for AI detection, a classifier trained on Gram2Vec features outperformed machine learning models trained on a comparable set of Biber features.

We present Gram2Vec, a grammatical style embedding system that embeds documents into a higher-dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In this paper, we use authorship verification and AI detection as two applications to show how Gram2Vec can be used. For authorship verification, we use the features from Gram2Vec to explain why a pair of documents is by the same or by different authors. We also demonstrate how Gram2Vec features can be used to train a classifier for AI detection, outperforming machine learning models trained on a comparable set of Biber features.
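
To make the idea of "normalized relative frequencies" concrete, here is a minimal sketch in Python. It is not the Gram2Vec implementation or its API: the feature inventories (`FUNCTION_WORDS`, `PUNCTUATION`), the tokenizer, the choice to normalize by total token count, and the helper names `vectorize` and `largest_differences` are all assumptions made for illustration. The second helper shows one plausible way that named features could support an explanation of why two documents differ in style.

```python
import re

# Hypothetical, deliberately tiny feature inventories (assumption for this sketch;
# the actual Gram2Vec feature set is richer and is not reproduced here).
FUNCTION_WORDS = ("the", "of", "and", "to", "in")
PUNCTUATION = (",", ".", ";", "!", "?")


def tokenize(text):
    """Split lowercased text into word tokens and single punctuation tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[^\w\s]", text.lower())


def vectorize(text):
    """Map a document to named features holding relative frequencies.

    Each value is count(feature) / total tokens, so documents of different
    lengths become comparable. Every dimension keeps a readable name,
    which is what makes the vector interpretable.
    """
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty input
    vector = {}
    for word in FUNCTION_WORDS:
        vector["func:" + word] = tokens.count(word) / total
    for mark in PUNCTUATION:
        vector["punct:" + mark] = tokens.count(mark) / total
    return vector


def largest_differences(vec_a, vec_b, k=3):
    """Return the k features whose frequencies differ most between two documents."""
    diffs = [(name, vec_a[name] - vec_b[name]) for name in vec_a]
    return sorted(diffs, key=lambda pair: abs(pair[1]), reverse=True)[:k]


if __name__ == "__main__":
    doc_a = vectorize("The cat, the dog, and the bird.")
    doc_b = vectorize("Stop! Stop!")
    for name, delta in largest_differences(doc_a, doc_b):
        print(f"{name}: {delta:+.2f}")
```

Because each dimension is a named feature, the output of `largest_differences` can be read directly (for example, one document uses exclamation marks far more often), which is the kind of explanation a dense neural embedding does not offer without extra tooling.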
