CL · Apr 15

Cognitive-Linguistic Indicators of Depression in Online Communities: Analysed by DistilBERT and Holographic Reduced Representation

arXiv:2606.00026 · 16.6 · h-index: 4
Predicted impact top 52% in CL · last 90 days · Originality: Incremental advance
AI Analysis

For researchers in computational psychiatry, this hybrid approach offers a modest improvement over existing text-based depression detection methods.

The study combines DistilBERT embeddings with Holographic Reduced Representation vectors encoding cognitive-linguistic features to detect depression in Reddit posts, achieving a macro F1 of 0.94, outperforming a TF-IDF baseline (0.80).

This paper investigates whether combining cognitively grounded linguistic features with transformer-based embeddings improves automated detection of depression in online text. Drawing on Beck's Cognitive Theory of Depression, the study treats cognitive distortions as measurable features, including first-person pronoun density, absolutist words, and negative emotion, extracted from Reddit posts in depression-related and control communities. Using a subset of the Kaggle Reddit Suicide and Depression Detection dataset, it compares two classification pipelines: a baseline that pairs TF-IDF features with Naive Bayes, and a hybrid model that concatenates DistilBERT sentence embeddings with Holographic Reduced Representation (HRR) vectors encoding the cognitive-linguistic features, followed by Logistic Regression. The hybrid DistilBERT-HRR model achieves a macro F1 score of 0.94 versus 0.80 for the TF-IDF baseline, with 5-fold cross-validation F1 improving from 0.83 to 0.92 and AUC from 0.958 to 0.981.
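
The feature-encoding step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the word lists, the 64-dimensional HRR size, the role and filler vectors, and the choice to scale each binding by its feature density are all assumptions made for illustration, and the DistilBERT embedding is replaced by a zero-filled placeholder of DistilBERT's hidden size (768) so that the example runs without downloading a model. Binding uses circular convolution, the standard HRR operator.

```python
import math
import random
import re

# Hypothetical lexicons for illustration only; the paper's word lists are not given here.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
ABSOLUTIST = {"always", "never", "nothing", "everything", "completely", "totally"}
NEGATIVE = {"sad", "hopeless", "worthless", "tired", "alone", "hate"}
FEATURES = ("first_person", "absolutist", "negative_emotion")


def densities(text):
    """Return, per feature, the fraction of tokens found in its lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty text
    lexicons = (FIRST_PERSON, ABSOLUTIST, NEGATIVE)
    return {name: sum(t in lex for t in tokens) / n
            for name, lex in zip(FEATURES, lexicons)}


def random_vector(dim, rng):
    """Draw a vector with i.i.d. N(0, 1/dim) entries, the usual HRR convention."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]


def bind(a, b):
    """Circular convolution of two equal-length vectors (HRR binding)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def hrr_encode(feats, roles, fillers):
    """Superpose role-filler bindings, each scaled by its feature value (assumed scheme)."""
    dim = len(next(iter(roles.values())))
    out = [0.0] * dim
    for name, value in feats.items():
        bound = bind(roles[name], fillers[name])
        out = [o + value * x for o, x in zip(out, bound)]
    return out


rng = random.Random(0)
DIM = 64  # assumed HRR dimensionality
roles = {name: random_vector(DIM, rng) for name in FEATURES}
fillers = {name: random_vector(DIM, rng) for name in FEATURES}

post = "I always feel hopeless and nothing I do helps"
feats = densities(post)
hrr_vec = hrr_encode(feats, roles, fillers)

# Placeholder for a 768-dimensional DistilBERT sentence embedding.
sentence_embedding = [0.0] * 768
hybrid = sentence_embedding + hrr_vec  # concatenated classifier input
```

In a full pipeline the placeholder would be replaced by an actual sentence embedding, and `hybrid` vectors for all posts would be passed to a Logistic Regression classifier. A pure-Python convolution keeps the sketch dependency-free; an FFT-based version would be the usual choice at larger dimensions.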
