Using four different online media sources to forecast the crude oil price
This work supports financial forecasting for economists and traders by combining more media sources and more language dimensions than previous studies.
The study forecasts crude oil prices by analyzing signals from four online media sources (Twitter, Google Trends, Wikipedia, GDELT) over two years. It finds that the combined analysis provides valuable predictive information, with Twitter language complexity, the number of GDELT articles, and Wikipedia page reads showing the highest predictive power.
This study looks for signals of economic awareness on online social media and tests their significance in economic predictions. Over a period of two years, it analyses the relationship between the West Texas Intermediate daily crude oil price and multiple predictors extracted from Twitter, Google Trends, Wikipedia, and the Global Database of Events, Language, and Tone (GDELT). Semantic analysis is applied to study the sentiment, emotionality and complexity of the language used. Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) models are used to make predictions and to confirm the value of the study variables. Results show that the combined analysis of the four media platforms carries valuable information for financial forecasting. Twitter language complexity, the number of GDELT articles and Wikipedia page reads have the highest predictive power. This study also allows a comparison of the foresight of each platform, measured as how many days ahead a platform can predict a price movement before it happens. In comparison with previous work, more media sources and more dimensions of the interaction and of the language used are combined in a joint analysis.
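The idea of comparing how many days ahead each platform can anticipate a price movement can be illustrated with a simple lag scan. The sketch below is not the ARIMAX procedure used in the study; it is a minimal stand-in that correlates a media predictor at day t - lag with the price change at day t and reports the lag with the strongest correlation. The function names (`pearson`, `lagged_correlations`, `best_lead`) and the synthetic series are hypothetical and serve only as an illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(predictor, target, max_lag):
    """Correlate predictor[t - lag] with target[t] for lag = 1..max_lag."""
    return {
        lag: pearson(predictor[:-lag], target[lag:])
        for lag in range(1, max_lag + 1)
    }


def best_lead(predictor, target, max_lag):
    """Return the lag (in days) with the largest absolute correlation."""
    corrs = lagged_correlations(predictor, target, max_lag)
    return max(corrs, key=lambda lag: abs(corrs[lag]))


# Synthetic data (an assumption, not data from the study): a media signal
# that leads the daily price change by three days, plus noise.
rng = random.Random(0)
n_days = 300
true_lead = 3
signal = [rng.gauss(0, 1) for _ in range(n_days)]
price_change = []
for t in range(n_days):
    if t >= true_lead:
        price_change.append(0.8 * signal[t - true_lead] + 0.3 * rng.gauss(0, 1))
    else:
        price_change.append(rng.gauss(0, 1))

print(best_lead(signal, price_change, max_lag=7))
```

On this synthetic series the scan should recover the planted three-day lead. With real data, a raw correlation scan like this ignores autocorrelation in the price series, which is one reason a model with explicit autoregressive and moving-average terms, such as ARIMAX, is a more defensible choice.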