Development of a neural network model to resolve homograph ambiguity in text data

doi:10.26565/2304-6201-2024-62-03

Yehor Yehorov, V.N. Karazin Kharkiv National University, Svobody Square, 4, Kharkiv-22, Ukraine, 61022, https://orcid.org/0000-0001-8731-7198
Segii Shmatkov, V.N. Karazin Kharkiv National University, Svobody Square, 4, Kharkiv-22, Ukraine, 61022, https://orcid.org/0000-0002-0298-7174

Keywords: neural networks, homograph, model, embedding, Python, LSTM

Abstract

This paper describes the development of a neural network model for resolving homograph ambiguity in textual data. Several types of neural networks are examined for their suitability to this task. The paper covers the methodological stages of building the network: analysis, design, implementation, testing, evaluation, and optimization. It highlights the significance of each stage and stresses the need for a thorough understanding of neural network types and their applications, as well as careful technology selection. The model can be applied in domains that rely on automatic language recognition, text-based decision support, search system improvement, and natural language processing. The paper is intended for researchers and practitioners in natural language processing and text data classification.

Relevance: the need for a neural network model that resolves homograph ambiguity in textual data stems from current challenges in natural language processing. Such a model can correct errors caused by misinterpreting the meanings of words in a text, which improves the accuracy and quality of text analysis. It can be applied in areas including automatic language detection, text-based decision support, search system improvement, and data classification.

The goal: to improve the quality of text processing, in particular the accuracy of recognizing words that have more than one meaning, by developing and implementing a neural network that resolves homograph ambiguity in text data in real time.

Research methods: system analysis, deep learning methods, neural network theory, data processing and preparation methods, and simulation modeling were used in the study. The software is written in Python and uses sklearn, keras, and other packages.

Results: the main result of the work is a neural network model that resolves homograph ambiguity in text data in real time and can be extended to other languages.

Conclusions: the problem of homograph ambiguity in textual data has been considered. For this natural language processing task, a long short-term memory (LSTM) neural network model was developed from embedding models, an LSTM layer, and fully connected layers. The study demonstrates the importance of innovative approaches to homograph ambiguity in textual data and shows that neural networks and artificial intelligence technologies are a promising direction for further research and implementation in this area.
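
The architecture named in the conclusions (embedding, an LSTM layer, fully connected layers) can be illustrated with a minimal Keras sketch. This is not the authors' implementation: the vocabulary size, sequence length, layer widths, number of senses, and the helper name `build_model` are all assumptions chosen for illustration, and random token ids stand in for real tokenized sentences.

```python
import numpy as np
import keras
from keras import layers

# All sizes below are illustrative assumptions, not values from the paper.
VOCAB_SIZE = 5000   # number of distinct token ids
MAX_LEN = 20        # tokens of context kept around the homograph
EMBED_DIM = 64      # size of each embedding vector
LSTM_UNITS = 32     # hidden size of the LSTM layer
NUM_SENSES = 2      # candidate readings of the homograph


def build_model() -> keras.Model:
    """Hypothetical helper: embedding -> LSTM -> dense layers -> softmax."""
    model = keras.Sequential([
        keras.Input(shape=(MAX_LEN,), dtype="int32"),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # token ids -> vectors
        layers.LSTM(LSTM_UNITS),                   # summarizes the context
        layers.Dense(32, activation="relu"),       # fully connected layer
        layers.Dense(NUM_SENSES, activation="softmax"),  # sense probabilities
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Two random token-id sequences stand in for real, tokenized sentences.
rng = np.random.default_rng(0)
x = rng.integers(1, VOCAB_SIZE, size=(2, MAX_LEN))

# One probability per candidate sense for each input sequence.
probs = model.predict(x, verbose=0)
```

With an untrained network the probabilities are arbitrary; training with `model.fit` on labeled sentences would be required before the argmax of each row could be read as the predicted sense.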

Author Biographies

Yehor Yehorov, V.N. Karazin Kharkiv National University, Svobody Square, 4, Kharkiv-22, Ukraine, 61022

Master student

Segii Shmatkov, V.N. Karazin Kharkiv National University, Svobody Square, 4, Kharkiv-22, Ukraine, 61022

Professor
