CL, SI, ST | Jun 17, 2020

A Tweet-based Dataset for Company-Level Stock Return Prediction

arXiv:2006.09723v1 | 5 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This paper provides a dataset for researchers and practitioners to analyze the impact of tweets on stock returns, though the contribution is incremental because it builds on existing work in financial NLP.

The authors tackled the problem of predicting company-level stock returns from tweets by creating a dataset of 862,231 labelled instances and a cleaned subset of 85,176 instances, with baselines showing competitive performance using standard machine learning and multi-view learning approaches.

Public opinion influences events, especially those related to stock market movement, where a subtle hint can influence the local outcome of the market. In this paper, we present a dataset that allows for company-level analysis of the impact of tweets on one-, two-, three-, and seven-day stock returns. Our dataset consists of 862,231 labelled English-language instances from Twitter; we also release a cleaned subset of 85,176 labelled instances to the community. In addition, we provide baselines using standard machine learning algorithms and a multi-view learning based approach that makes use of different types of features. Our dataset, scripts and models are publicly available at: https://github.com/ImperialNLP/stockreturnpred.
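To make the one-, two-, three-, and seven-day horizons concrete, the following minimal sketch computes forward returns from a series of closing prices. It is an illustration only: the abstract does not state the return definition, so the simple-return formula, the function name `forward_returns`, and the sample prices are all assumptions and are not taken from the authors' repository.

```python
from typing import Dict, List, Optional, Sequence

# Horizons named in the abstract (in days).
HORIZONS = (1, 2, 3, 7)


def forward_returns(
    prices: List[float], t: int, horizons: Sequence[int] = HORIZONS
) -> Dict[int, Optional[float]]:
    """Simple return from day t to day t + h for each horizon h.

    Assumption: return = (p[t+h] - p[t]) / p[t]. The paper may define
    returns differently (for example, log or market-adjusted returns).
    Horizons that run past the end of the series map to None.
    """
    result: Dict[int, Optional[float]] = {}
    for h in horizons:
        if t + h < len(prices):
            result[h] = (prices[t + h] - prices[t]) / prices[t]
        else:
            result[h] = None
    return result


# Hypothetical closing prices for one company, one value per day.
closes = [100.0, 102.0, 101.0, 105.0, 104.0, 103.0, 108.0, 110.0]
returns_day0 = forward_returns(closes, t=0)
```

A tweet posted on day `t` about a given company could then be paired with `forward_returns(closes, t)` for that company's price series, which is one way the per-horizon labels might be formed.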

