Modern methods of natural language processing

  • Богдан Олегович Близнюк
  • Лариса Валентиновна Васильева
  • Ілья Дмитриевич Стрельников
  • Дмитрий Сергеевич Ткачук
Keywords: Natural Language Processing, text analysis, text processing, sentiment analysis, classification, neural network, data mining

Abstract

This article surveys the main challenges of natural language processing and analyzes the principal processing tasks, methods, tools, and libraries currently available. Two experiments apply these techniques to real-life problems. The first analyzes internet news about several cryptocurrencies to determine whether news sentiment correlates with cryptocurrency prices. The second extracts facts from press releases to identify companies that have established partnerships. The experiments show that natural language processing is a powerful and practically important tool.
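
As a rough illustration of the first experiment's question, whether news sentiment moves together with prices, the sketch below computes a Pearson correlation coefficient between two series. It is not the authors' method: the function name `pearson` and both sample series are hypothetical and chosen only to make the example runnable.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustrative values, NOT data from the article:
# one average sentiment score and one price change per day.
daily_sentiment = [0.2, -0.1, 0.4, 0.0, -0.3]
daily_price_change = [1.5, -0.5, 2.0, 0.1, -1.8]

r = pearson(daily_sentiment, daily_price_change)
print(f"correlation: {r:.3f}")
```

A coefficient near 1 or -1 would suggest a strong linear relationship between the two series, while a value near 0 would suggest none. Correlation alone does not establish that sentiment drives prices.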

Published
2017-12-22
How to Cite
Близнюк, Б. О., Васильева, Л. В., Стрельников, І. Д., & Ткачук, Д. С. (2017). Modern methods of natural language processing. Bulletin of V.N. Karazin Kharkiv National University, Series «Mathematical Modeling. Information Technology. Automated Control Systems», 36, 14-26. Retrieved from https://periodicals.karazin.ua/mia/article/view/10084
Section
Articles