Modern methods of natural language processing
Keywords:
Natural Language Processing, text analysis, text processing, sentiment analysis, classification, neural network, data mining
Abstract
This article surveys the main challenges of natural language processing and analyzes the principal processing tasks, methods, tools, and libraries currently available. Two experiments apply these techniques to real-life problems. The first analyzes internet news about several cryptocurrencies to see whether the sentiment of the news correlates with the prices of those cryptocurrencies. The second extracts facts from press releases to identify companies that have established partnerships. The experiments show that natural language processing is an important and powerful tool in practice.
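As a rough illustration of the first experiment's idea (relating news sentiment to prices), the sketch below scores headlines with a tiny word lexicon and computes the Pearson correlation between daily sentiment and price. It is only a minimal sketch under stated assumptions: the lexicon, the function names `sentiment_score` and `pearson`, and the sample headlines and prices are all hypothetical and are not taken from the article, whose actual models and data are not described on this page.

```python
import math

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"surge", "gain", "rally", "adoption", "growth"}
NEGATIVE = {"crash", "ban", "hack", "loss", "fraud"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, or 0.0 if empty."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented sample data: one headline and one closing price per day.
    headlines = [
        "Exchange hack causes loss",
        "Regulators discuss new rules",
        "Adoption drives rally",
    ]
    prices = [95.0, 100.0, 110.0]
    scores = [sentiment_score(h) for h in headlines]
    print("daily sentiment:", scores)
    print("correlation with price:", pearson(scores, prices))
```

A lexicon scorer is used here only because it needs no training data; a trained classifier could be substituted for `sentiment_score` without changing the correlation step.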
References
Natural language processing (Wikipedia, in Russian). Available at: https://ru.wikipedia.org/wiki/Обработка_естественного_языка
DeepDive Tutorial. Extracting mentions of spouses from the news. Available at: http://deepdive.stanford.edu/example-spouse
Sentiment analysis (Wikipedia, in Russian). Available at: https://ru.wikipedia.org/wiki/Анализ_тональности_текста
John S. Ball. Using NLU in Context for Question Answering: Improving on Facebook's bAbI Tasks. arXiv:1709.04558 (electronic version of a print publication), 09/2017, PDF, version 2. Available at: https://arxiv.org/ftp/arxiv/papers/1709/1709.04558.pdf
Neural Machine Translation (seq2seq) Tutorial. Available at: https://www.tensorflow.org/tutorials/seq2seq
Word2vec (Wikipedia, in Russian). Available at: https://ru.wikipedia.org/wiki/Word2vec
LSTM: long short-term memory networks (in Russian). Available at: https://habrahabr.ru/company/wunderfund/blog/331310/
Cloudera Broadens its Collaboration with Thorn to Include Software and Services to Fight Child Sexual Exploitation. Available at: https://www.cloudera.com/more/news-and-blogs/press-releases/2016-09-28-cloudera-broadens-its-donation-to-thorn-to-include-software-services-fight-child-sexual-exploitation.html
Crunchbase. Available at: https://www.crunchbase.com/
Wikipedia. Available at: https://www.wikipedia.org/
Knowledge, Inside Search, Google. Available at: https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html
Stanford CoreNLP. Available at: https://stanfordnlp.github.io/CoreNLP/
Natural Language Toolkit. Available at: http://www.nltk.org/
Creating a module for Sentiment Analysis with NLTK. Available at: https://pythonprogramming.net/sentiment-analysis-module-nltk-tutorial/
TensorFlow. Available at: https://www.tensorflow.org/
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need. arXiv:1706.03762 (electronic version of a print publication), 06/2017, PDF, version 5. Available at: https://arxiv.org/pdf/1706.03762.pdf
Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, 2015 (Feb.), pp. 649-657.
Andrej Karpathy. The Unreasonable Effectiveness of Recurrent Neural Networks. 05/2015. Available at: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Published
2017-12-22
How to Cite
Близнюк, Б. О., Васильева, Л. В., Стрельников, І. Д., & Ткачук, Д. С. (2017). Modern methods of natural language processing. Bulletin of V.N. Karazin Kharkiv National University, Series «Mathematical Modeling. Information Technology. Automated Control Systems», 36, 14-26. Retrieved from https://periodicals.karazin.ua/mia/article/view/10084
Section
Articles