Economic series prediction based on internet users' sentiment analysis
Abstract
This article proposes a set of models for forecasting economic time series based on the analysis of objective and subjective internet content. Bitcoin (BTC), the first and currently most popular cryptocurrency, was chosen for the analysis. The exponential growth of the cryptocurrency market in recent years makes forecasting the BTC rate a relevant problem. The theoretical framework of the research is the concept of behavioral finance, which holds that traders behave irrationally and that their decisions depend largely on psychological factors. The aim of the research is to develop a methodology for forecasting currency rates and a set of models based on the analysis of factual (objective) and conceptual (subjective) internet content. Factual information comes from the news feeds of specialized news portals; conceptual information comes from users' posts in microblogs. The set of models comprises: 1) parsing scripts for news portals and microblogs; 2) an algorithm that generates factual and conceptual exogenous variables using latent semantic analysis, sentiment analysis and Granger causality analysis; 3) the chosen forecasting tools, namely feedforward neural networks and recurrent networks with long short-term memory (LSTM), whose input sets were formed by genetic algorithms. Processing the database of news feeds and tweets yielded a set of exogenous factors comprising four of the fourteen factual variables (infrastructure, activity, dissemination and expect) and two of the eight conceptual ones (calm and confusion). Neural network architecture optimization was automated with genetic algorithms: the chromosome length equaled the number of variables, and individuals were subject to crossover, mutation and selection based on their fitness function, the MSE on the validation dataset.
A comparative analysis of different neural network architectures confirmed the expediency of using internet content for economic time series forecasting and showed that the developed models are well suited to this task.
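The genetic selection of input variables described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that the chromosome length equals the number of variables and that fitness is the validation MSE. Everything else here is an assumption made for illustration, including the synthetic data, the population size, the mutation rate, the single-point crossover, and the use of a linear least-squares model as a cheap stand-in for the neural networks.

```python
import random


def fit_least_squares(X, y, ridge=1e-6):
    """Solve (X^T X + ridge*I) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def validation_mse(mask, X_tr, y_tr, X_va, y_va):
    """Fitness: MSE on the validation set using only variables with mask bit 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # a model with no inputs is never selected

    def select(X):
        return [[1.0] + [row[i] for i in cols] for row in X]  # 1.0 = intercept

    w = fit_least_squares(select(X_tr), y_tr)
    preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in select(X_va)]
    return sum((p - t) ** 2 for p, t in zip(preds, y_va)) / len(y_va)


def evolve(n_vars, fitness, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Binary GA: one gene per candidate variable, lower fitness is better."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]              # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)          # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)                  # mutation: random bit flips
        pop = parents + children
    return min(pop, key=fitness)


def make_data(n, rng):
    """Synthetic data (assumption): only variables 0 and 3 drive the target."""
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(n)]
    y = [2.0 * r[0] - 1.5 * r[3] + rng.gauss(0, 0.1) for r in X]
    return X, y


data_rng = random.Random(42)
X_tr, y_tr = make_data(80, data_rng)
X_va, y_va = make_data(40, data_rng)


def fitness(mask):
    return validation_mse(mask, X_tr, y_tr, X_va, y_va)


best = evolve(6, fitness)
print("selected variables:", [i for i, bit in enumerate(best) if bit])
```

In this toy setting the search has to keep variables 0 and 3, because dropping either one raises the validation error sharply. With a neural network in place of the linear model, only the `fitness` function would need to change.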