Bulletin of V.N. Karazin Kharkiv National University, series «Mathematical modeling. Information technology. Automated control systems» https://periodicals.karazin.ua/mia <p>Specialized edition in mathematical and technical sciences.</p> <p>Articles contain the results of research in the fields of mathematical modeling and computational methods, information technology, and information security. New mathematical methods of research and control of physical, technical and information processes, as well as research on programming and computer modeling in science-intensive technologies, are highlighted.</p> <p>The journal is designed for teachers, researchers, graduate students and students working in corresponding or related fields.</p> Харківський національний університет імені В. Н. Каразіна en-US Bulletin of V.N. Karazin Kharkiv National University, series «Mathematical modeling. Information technology. Automated control systems» 2304-6201 Analysis of the implementation of the combined Suricata intrusion detection system with a machine learning model https://periodicals.karazin.ua/mia/article/view/28374 <p><strong>Relevance.</strong> The study presents a comparative analysis of intrusion detection and prevention systems (IDS/IPS) functioning with and without artificial intelligence (AI) integration. Conventional signature-based systems such as Suricata effectively detect known threats but often fail to recognize new or modified attack patterns. Therefore, integrating AI technologies offers a promising way to enhance adaptability and minimize false positives.</p> <p><strong>Objective.</strong> The study aimed to evaluate the efficiency of the open-source Suricata system in two configurations: a standard mode using signature-based detection and a modified version enhanced with a machine learning module. The goal was to determine how AI affects detection accuracy, response time, and alert reliability under various cyberattack scenarios, including DoS and brute-force attempts.
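The abstract describes an ML module that classifies network events from Suricata log data, but the article itself publishes no code. The following is a minimal stdlib sketch of the kind of per-source feature aggregation such a module might consume; the field names (`event_type`, `src_ip`) follow Suricata's EVE JSON output, while the mock records and the aggregated features are hypothetical.

```python
import json
from collections import defaultdict

def extract_features(eve_lines):
    """Aggregate per-source-IP statistics from Suricata EVE JSON lines.

    Returns src_ip -> {"alerts": n, "flows": n, "alert_ratio": r}.
    A downstream classifier (the article uses Random Forest) would
    consume feature vectors built from aggregates like these.
    """
    stats = defaultdict(lambda: {"alerts": 0, "flows": 0})
    for line in eve_lines:
        ev = json.loads(line)
        ip = ev.get("src_ip")
        if ip is None:
            continue
        if ev.get("event_type") == "alert":
            stats[ip]["alerts"] += 1
        elif ev.get("event_type") == "flow":
            stats[ip]["flows"] += 1
    for s in stats.values():
        total = s["alerts"] + s["flows"]
        s["alert_ratio"] = s["alerts"] / total if total else 0.0
    return dict(stats)

# Mock log: one noisy source (DoS-like alert burst) and one quiet one.
lines = (
    ['{"event_type": "alert", "src_ip": "10.0.0.5"}'] * 30
    + ['{"event_type": "flow", "src_ip": "10.0.0.5"}'] * 10
    + ['{"event_type": "flow", "src_ip": "10.0.0.7"}'] * 5
)
feats = extract_features(lines)
```

On such aggregates, a high `alert_ratio` concentrated on one source is the kind of pattern an ML-assisted filter can use to consolidate a flood of signature alerts into a single classified event.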
The experiment was performed in a virtualized environment consisting of three nodes: Kali Linux as the attacker, Windows 10 as the target, and Suricata as the monitoring system.</p> <p><strong>Research Methods.</strong> Methods of statistical modeling and comparative analysis were applied. In its base form, Suricata relied solely on predefined rules, while in the AI-extended version, an analytical module employing the Random Forest algorithm processed log data to classify network events. The model was trained on labeled datasets containing normal and malicious traffic, using extracted statistical and protocol-level features.</p> <p><strong>Results.</strong> Analysis showed that the baseline Suricata achieved a detection rate of 87–92% and precision of 80–85%, generating excessive alerts during DoS simulations. After AI integration, the number of alerts decreased more than threefold, the detection rate increased to 93–96%, and precision rose to 90–94%. Additionally, the average response time was reduced to 1–1.5 seconds.</p> <p><strong>Conclusions.</strong> Integrating machine learning algorithms into the Suricata IDS significantly increased its efficiency, reduced the number of false positives, and improved the system's ability to adapt to new cyber threats. The results confirm that combining signature-based detection with AI-based analytics provides a more reliable and intelligent approach to modern network security.</p> Maksym Blinov Igor Svatovskiy Copyright (c) 2025-10-27 2025-10-27 67 6 17 10.26565/2304-6201-2025-67-01 UML-Oriented Information Technology for Continuous Maximum Coverage Problems with Arbitrary-Shaped Objects https://periodicals.karazin.ua/mia/article/view/28375 <p><strong>Relevance.</strong> Continuous maximum coverage problems with arbitrary-shaped objects play a crucial role in geographic information systems, monitoring platforms, logistics services, security systems, spatial data analysis, and decision-support solutions.
The growing volume of data, dynamic environments, and high model complexity require formalized, modular, and scalable information technologies. UML, as a modeling standard, enables formal architectural descriptions of software solutions, ensuring reliability, reproducibility, and transparency of implementation.</p> <p><strong>Purpose.</strong> To develop a UML-oriented information technology for solving continuous maximum coverage problems that incorporates an architectural model, data structures, information flows, functional components, and UML specifications of modules supporting coverage-based systems.</p> <p><strong>Methods.</strong> The study employs object-oriented and structural modeling techniques, UML diagramming (Use Case, Class, Activity, Sequence, Component, Composite Structure, State Machine, Deployment), architectural design methods, principles of modularity, dependency inversion, component decomposition, and approaches used in building scalable information systems.</p> <p><strong>Results.</strong> A complete UML specification of the architecture of an information technology for maximum coverage problems has been constructed: external interaction scenarios, classes, components, operation sequences, system behavior and state logic, infrastructural links, and deployment structure have been defined. An integrated three-tier architecture (presentation, application logic, and data layers) has been formed. Principles for constructing modules for spatial analytics, optimization, coverage criterion computation, scenario management, visualization, and data interfaces have been described. The UML models provide a formalized structure that enables the development of scalable and reproducible IT solutions for coverage problems.</p> <p><strong>Conclusions.</strong> The developed information technology provides structural, behavioral, and architectural formalization of a maximum coverage system. 
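The article specifies the coverage-criterion computation module at the UML level rather than in code. Purely as an illustration of what such a module evaluates, here is a grid-based sketch for objects given as point-membership predicates; the interface and the test disk are assumptions, not taken from the article.

```python
def coverage_fraction(shapes, n=100):
    """Estimate the fraction of the unit square covered by at least one
    object, where each object is a point-membership predicate.
    A coverage-criterion component would expose a similar evaluation."""
    covered = 0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) / n, (j + 0.5) / n   # cell-centre sample
            if any(inside(x, y) for inside in shapes):
                covered += 1
    return covered / (n * n)

# Hypothetical arbitrary-shaped object: a disk of radius 0.25.
disk = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25 ** 2
frac = coverage_fraction([disk])   # analytic value: pi * 0.25**2 ~ 0.196
```

Representing objects as predicates keeps the criterion module decoupled from any particular geometry library, which is in the spirit of the dependency-inversion principle the article applies.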
UML-oriented modeling improves architectural transparency, reduces risks of integration errors, and ensures scalability and reusability of components. The obtained UML models may serve as a methodological foundation for building intelligent GIS platforms, optimization services, monitoring systems, and real-time analytical solutions.</p> Yehor Havryliuk Kyryl Korobchynskyi Copyright (c) 2025-10-27 2025-10-27 67 18 34 10.26565/2304-6201-2025-67-02 Computer modeling of liquid sloshing in tanks with baffles https://periodicals.karazin.ua/mia/article/view/28380 <p><strong>Research Objective.</strong> The objective of this study is to develop numerical methods for analyzing the stability of fluid motion in tanks equipped with various types of internal baffles.</p> <p><strong>Relevance.</strong> The investigation of fluid motion stability in tanks with horizontal and vertical baffles is of significant theoretical and practical importance for many fields — from aerospace and aviation to marine and ground-based liquid storage (e.g., fuels, process fluids, chemical reagents). The presence of baffles substantially alters the sloshing behavior: they affect the frequency spectrum of the free surface, vortex structures, energy localization, and the emergence of resonant modes. Improper consideration of these effects may lead to reduced safety, increased dynamic loads on the structure, and degraded performance of the overall system. Experimental studies of such processes are often technically complex, costly, and potentially hazardous. Testing real liquid volumes requires large-scale facilities, high material and equipment expenses, as well as rigorous safety measures when dealing with flammable, aggressive, or explosive substances. Therefore, the development of accurate mathematical models, numerical algorithms, and simulation methods for fluid motion in baffled tanks is of particular relevance.
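The article's BEM and singular-integral formulation is not reproduced in the abstract. As a point of reference, the classical baffle-free case follows from the linear sloshing dispersion relation omega^2 = g*k*tanh(k*h) for a rectangular tank, a textbook baseline against which the effect of baffles is usually measured; the tank dimensions below are purely illustrative.

```python
import math

def sloshing_frequency(n, length, depth, g=9.81):
    """n-th antisymmetric sloshing frequency (Hz) of an ideal fluid in a
    rectangular tank without baffles, from linear potential theory:
    omega**2 = g * k * tanh(k * h), with wavenumber k = n*pi/length."""
    k = n * math.pi / length
    return math.sqrt(g * k * math.tanh(k * depth)) / (2 * math.pi)

# Illustrative tank: 1 m long, half full.
f1 = sloshing_frequency(1, length=1.0, depth=0.5)
```

The formula shows why fill depth matters: tanh(k*h) grows with depth, so a shallower fill lowers the natural frequencies that the excitation can drive into resonance.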
Computer-based modeling provides a safe and relatively low-cost means to explore a wide range of fluid behavior regimes.</p> <p><strong>Research Methods.</strong> The study employs methods from potential theory and singular integral equations, the boundary element method (BEM), the subdomain method, and the method of prescribed normal forms.</p> <p><strong>Results.</strong> Systems of one-dimensional singular integral equations were derived to determine the velocity potential. Basis functions were obtained, specifically the free surface oscillation modes, which were then used to solve the problem of forced oscillations. The influence of combined horizontal and vertical excitations was analyzed for tanks of various designs — both without baffles and with vertical or horizontal baffles. Regions of stable and unstable fluid motion were identified. It was found that the presence of baffles significantly reduces the amplitude of free surface oscillations.</p> <p><strong>Conclusions.</strong> The obtained results demonstrated that the use of horizontal and vertical baffles has a significant impact on the stability of fluid motion in tanks, specifically by considerably reducing the amplitude of free surface oscillations. The data obtained may be applied to improve the reliability and safety of tank systems across various engineering domains, particularly in aviation, space, marine, and energy industries.</p> Vasyl Gnitko Kirill Degtyarev Andriy Kolodiazhny Denys Kriutchenko Elena Strelnikova Copyright (c) 2025-10-27 2025-10-27 67 35 44 10.26565/2304-6201-2025-67-03 Controlling LEDC timers of the ESP32 microcontroller using registers https://periodicals.karazin.ua/mia/article/view/28381 <p><strong>Relevance.</strong> This paper examines precise generation and control of pulse-width modulation (PWM) signals using the LEDC (LED PWM Controller) subsystem of the ESP32 microcontroller via direct register access. 
As embedded real-time systems increasingly require fine timing control in LED drivers, motor control and power electronics, standard high-level driver APIs can be insufficient. Direct register manipulation of LEDC enables more precise tuning of frequency, resolution and pulse timing, which is critical for synchronization-sensitive applications.</p> <p><strong>Objective.</strong> To analyze the capabilities of ESP32 LEDC timers when configured through direct register writes, to experimentally evaluate the accuracy and stability of generated PWM signals across representative configurations, and to provide practical recommendations for optimizing LEDC parameters in applied embedded projects.</p> <p><strong>Methods.</strong> The investigation employed low-level register programming under Espressif’s ESP-IDF on an ESP32-DevKitC V4 (WROOM-32D). Time-domain characteristics of the PWM outputs were measured with a Logic Analyzer (24 MHz sampling, 8 channels). The study combined theoretical derivations of PWM frequency and period based on clock source, divider (DIV) and counter resolution (RES) with implementation of direct register sequences to configure HSTIMER0 and HS channel 0, and comparative measurements for eighteen distinct configurations covering multiple RES, DIV and DUTY values.</p> <p><strong>Results.</strong> The register-based control method enabled generation of high-frequency PWM in the MHz range with close agreement between calculated and measured values. Across tested configurations the maximum relative deviation did not exceed ±0.03% for frequency and period, and ±0.6% for pulse high-time (duty width). Increasing counter resolution improved duty-cycle granularity, while the prescaler DIV produced a linear change in PWM frequency. 
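The measured figures can be cross-checked against the basic LEDC timing relations stated above: the PWM frequency is the source clock divided by DIV * 2^RES, and the high time is DUTY/2^RES of the period. A small sketch, assuming the 80 MHz APB clock and ignoring the divider's fractional part:

```python
def ledc_params(div, res_bits, duty, f_clk=80_000_000):
    """Expected LEDC PWM figures from the integer clock divider, counter
    resolution in bits, and duty register value. Assumes the 80 MHz APB
    clock and neglects the fractional part of the divider."""
    freq_hz = f_clk / (div * (1 << res_bits))     # f = clk / (DIV * 2^RES)
    period_us = 1e6 / freq_hz
    high_us = period_us * duty / (1 << res_bits)  # DUTY/2^RES of period
    return freq_hz, period_us, high_us

# One illustrative configuration: DIV=1, RES=8 bits, 50% duty (128/256).
freq_hz, period_us, high_us = ledc_params(div=1, res_bits=8, duty=128)
```

The same arithmetic makes the article's trade-off visible: each extra resolution bit halves the achievable frequency for a given divider, while finer duty granularity is gained.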
The experimental limitations observed at the highest frequencies are attributable to the finite sampling capability of the measurement equipment.</p> <p><strong>Conclusions.</strong> Direct register access to the LEDC allows for obtaining deterministic, high-precision PWM signals with minimal parameter update latency, making them suitable for applications in robotics, power electronics, and other systems with high synchronization requirements. Further research is recommended on the influence of alternative clock sources, low-speed LEDC modes, integration with ISR/FreeRTOS, and extending the approach to other timers and channels.</p> Daniiel Horenko Albert Kotvytskiy Copyright (c) 2025-10-27 2025-10-27 67 45 55 10.26565/2304-6201-2025-67-04 Architecture, software implementation and analysis of the results of using an intelligent tool for configuring microservice applications https://periodicals.karazin.ua/mia/article/view/28383 <p><strong>Relevance.</strong> Developing applications with a microservice architecture requires effective configuration management under varying load conditions, reliability, fault tolerance, and scalability requirements. This creates a need for intelligent adaptive configuration tools that can operate in near-real-time mode.</p> <p><strong>Goal.</strong> To create an intelligent tool for adaptive management of MSA configurations with a decision-making module based on Case-Based Reasoning (CBR), design its architecture, implement it in software, and experimentally evaluate it on a testbed while comparing several CBR methods.</p> <p><strong>Research methods.</strong> The basic concepts of MSA configuration processes are clarified; a testbed with three services (auth, product, order) and performance requirements (≤1000 simultaneous requests, average latency ≤200 ms) is designed.
Adaptive microservice configuration management is implemented as a microservice with a REST API (FastAPI) and a case database (PostgreSQL); QoS, resource, "cost" and adaptability metrics are used. Five CBR methods are investigated: K-Nearest Neighbors, Weighted KNN, Feature-Based Retrieval, Cluster-Based Retrieval, and Indexing &amp; Hashing. A series of measurements of configuration selection time is conducted for a case database of 50–1000 records, with averaging over 100 runs.</p> <p><strong>Results.</strong> The subsystem correctly identifies states and applies relevant configurations for different scenarios (low/medium/high/peak), meeting the requirement of a matching time of ≤0.5 s. The Indexing &amp; Hashing method demonstrated the highest performance (≈27.6–50.3 ms for 50–1000 cases); KNN shows linear time growth, and Weighted KNN provides controllability through metric weights. The implemented web interface provides monitoring and manual/automatic application of changes in real time.</p> <p><strong>Conclusions.</strong> The proposed architecture and software implementation of the CBR tool confirm the practical feasibility of adaptive configuration of the MSA and create a basis for managed solutions that scale with data.
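As a minimal illustration of the case-retrieval step compared above, the sketch below implements weighted nearest-neighbour matching over a toy case base. The features (request rate, latency), the weights, and the stored configurations are hypothetical, not taken from the article's testbed.

```python
def retrieve(case_base, state, weights):
    """Weighted nearest-neighbour case retrieval: each case is a pair
    (feature_vector, configuration); return the configuration of the
    case closest to the observed state under weighted squared distance."""
    def dist(vec):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, vec, state))
    best_case = min(case_base, key=lambda case: dist(case[0]))
    return best_case[1]

# Toy case base: (requests_per_s, avg_latency_ms) -> stored configuration.
cases = [((100, 50), {"replicas": 1}),
         ((500, 120), {"replicas": 2}),
         ((1000, 200), {"replicas": 4})]
cfg = retrieve(cases, state=(900, 180), weights=(1.0, 1.0))
```

This linear scan is exactly why plain KNN grows linearly with the case base, as reported; bucketing cases by a hash of coarse state features (the Indexing &amp; Hashing variant) bounds the number of candidates that must be scanned.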
Further directions are outlined: evolution of the case base with online learning, multi-criteria optimization (performance/reliability/cost/energy efficiency), deeper integration with orchestrators and service meshes, and increased explainability of solutions.</p> Dmytro Zinov’ev Mykola Tkachuk Copyright (c) 2025-10-27 2025-10-27 67 56 65 10.26565/2304-6201-2025-67-05 Application of a genetic algorithm to solve the problem of scaling hydrogen systems https://periodicals.karazin.ua/mia/article/view/28385 <p><strong>The work aims</strong> to develop a robust tool for scaling hydrogen systems and their energy consumption using a genetic algorithm.</p> <p><strong>Relevance.</strong> The most common method of hydrogen production is water electrolysis, which requires a sufficient amount of electricity. If electricity sources are insufficient, this can put additional strain on the power grid, especially during peak consumption periods. Since 87% of hydrogen plants currently use hydrogen on-site (instead of generating it and then transporting it for use), there is a need for optimization in this area to improve energy efficiency and sustainability.</p> <p>Current research analyzes the improvement of hydrogen systems in terms of the cost-effectiveness of systems using renewable energy sources and the reduction of hydrogen logistics costs by applying linear programming and particle swarm optimization methods.</p> <p>However, these works are mainly focused on hydrogen production systems based on a single electrolyzer and do not aim to assess the feasibility of using multiple units.
As a result, the topic of cost optimization and maintenance strategies for multi-electrolyzer systems remains less explored, as does the related problem of their dispatching.</p> <p><strong>Research methods.</strong> Stochastic methods were used to solve the problem of finding the best startup queue for electrolysis units, and the effectiveness of the genetic algorithm for solving this problem was tested.</p> <p><strong>Results.</strong> A model for optimizing the peak power consumption of an electrolysis system was built, and the configuration evaluation function and objective function for system optimization were determined. The choice of a stochastic optimization method is justified by checking whether the objective function possesses the properties that traditional optimization methods require to be effective, namely continuity, differentiability, smoothness, and convexity. The effectiveness of the genetic method was tested against the gradient descent method on examples with different configurations of electrolyzers (units of similar and of different types).</p> <p><strong>Conclusions.</strong> These calculations have confirmed that the genetic algorithm produces stable results and is effective in finding the global optimum, while gradient descent may stop at local minima and require additional adjustments to achieve the optimal solution.</p> <p>The genetic algorithm yields an approximately optimal result within a fixed number of steps.
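The article's model and operators are not reproduced in the abstract. The following is a stripped-down evolutionary sketch of the startup-queue problem: swap mutation with elitist selection (no crossover), a toy fleet, and invented startup-power profiles, chosen only to make the peak-shaving effect visible.

```python
import random

def peak_power(queue, profiles, stagger=1):
    """Peak total draw when units start in the given order, each delayed
    by `stagger` time slots relative to the previous one."""
    timeline = {}
    for pos, unit in enumerate(queue):
        for t, p in enumerate(profiles[unit]):
            slot = pos * stagger + t
            timeline[slot] = timeline.get(slot, 0) + p
    return max(timeline.values())

def optimize_queue(profiles, generations=200, pop_size=20, seed=42):
    """Minimal evolutionary search over startup permutations."""
    rng = random.Random(seed)
    n = len(profiles)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda q: peak_power(q, profiles))
        survivors = pop[: pop_size // 2]             # elitist selection
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(n), rng.randrange(n)
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda q: peak_power(q, profiles))

# Toy fleet: two heavy-surge units and two light ones (startup profiles).
profiles = [[9, 3], [9, 3], [2, 1], [2, 1]]
best = optimize_queue(profiles)
best_peak = peak_power(best, profiles)
```

Even in this toy instance, interleaving heavy and light units lowers the peak from 12 (both heavy units overlapping) to 10, which is the kind of gain the article reports at fleet scale.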
As shown in the problem of placing 10 electrolyzers, this approximation yields a significant improvement: peak electricity consumption decreased by almost 40%.</p> <p>Further research can be aimed at improving the parameters of the algorithm, in particular adaptive tuning of the mutation and crossover operators to increase the convergence rate.</p> Dmytro Kotenko Mykola Zipunnikov Copyright (c) 2025-10-27 2025-10-27 67 66 75 10.26565/2304-6201-2025-67-06 Machine Learning Approaches to Malware Detection in RAM https://periodicals.karazin.ua/mia/article/view/28386 <p><strong>Relevance.</strong> In the current context of constantly growing cyber threats, the problem of detecting malicious software that can operate covertly in RAM using fileless attack techniques has become particularly relevant. Traditional antivirus solutions based primarily on signature-based approaches prove ineffective against modern advanced persistent threats (APT) and new modified threats. This makes it essential to develop innovative approaches to malware detection based on behavioral pattern analysis in RAM using machine learning methods.</p> <p><strong>Goal.</strong> Development and testing of an automated malware detection system through RAM dump analysis using machine learning methods, as well as comparative evaluation of the effectiveness of various classification algorithms for multi-class threat type detection.</p> <p><strong>Research methods:</strong> comparative analysis of machine learning algorithms, static analysis of memory dumps, multi-class classification, experimental validation on the Obfuscated-MalMem2022 dataset containing over 58,000 records with 58 Windows process features.
Models were evaluated using accuracy, precision, recall, and F1-score metrics with weighted averaging.</p> <p><strong>Results.</strong> A fully functional technological pipeline was created for automated processing and classification of RAM dumps, including modules for data preprocessing, feature engineering, machine learning, and results evaluation. A comparative analysis of 13 machine learning algorithms was conducted, including classical methods (Random Forest, Gradient Boosting, Decision Tree, k-NN, SVM) and neural network architectures (Wide &amp; Deep Network, CNN). It was established that the Random Forest algorithm demonstrates the best results for the multi-class malware classification task with an accuracy of 85.49% and F1-score of 85.52% at a training time of 1.3 seconds. The developed system is implemented in Python using scikit-learn libraries (for classical ML models), TensorFlow/Keras (for neural networks), and pandas (for data processing).</p> <p><strong>Conclusions.</strong> The study confirmed the high effectiveness of classical machine learning methods, particularly ensemble algorithms, for malware detection in RAM dumps. The developed Random Forest-based model provides an optimal balance between classification accuracy (85.52% F1-score), training speed (1.3 s), and computational efficiency, demonstrating significant advantages over neural networks in this context. The developed system has high practical significance and can be integrated into forensic platforms, cybersecurity incident monitoring systems, and expert systems for automated threat detection and accelerated incident analysis. 
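The pipeline itself is not distributed with the abstract; the sketch below reproduces its final stage in miniature. It trains a scikit-learn Random Forest (the study's best-performing model) on a synthetic, well-separated three-class dataset standing in for memory-dump feature vectors, and scores it with the weighted F1 metric used in the study. The cluster centres, feature count, and class labels are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: three well-separated clusters
# playing the role of benign / ransomware / spyware feature vectors.
rng = np.random.default_rng(0)
n_per_class = 100
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n_per_class, 8))
               for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
f1 = f1_score(y_te, clf.predict(X_te), average="weighted")
```

On real, heavily overlapping malware families the score naturally drops into the ~85% range reported above; the synthetic clusters here are separable, so the sketch only demonstrates the mechanics of the train/evaluate stage.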
The research results confirm the feasibility of using machine learning methods to create defense systems against modern cyber threats that operate exclusively in RAM.</p> Yevhen Lanin Nina Bakumenko Copyright (c) 2025-10-27 2025-10-27 67 76 82 10.26565/2304-6201-2025-67-07 Mathematical models of simple signal modulation for algebraic separation of noise in information communication systems https://periodicals.karazin.ua/mia/article/view/28387 <p>The article continues the work [1] on separating the useful signal from noise and the works [2,3], which proposed a method for solving systems of linear algebraic equations using QR decomposition based on the Gram-Schmidt method. The work is <strong>relevant</strong> because no section of the frequency axis of information communication systems is free from interference; one must always assume that noise occupies the entire available frequency range (some sources of this noise are described in the introduction to the article). The development of modern information and communication systems is impossible without mathematical models, since modeling reduces the cost of research and is a prerequisite for building test stands. The <strong>goal</strong> of this work is to build models for representing useful signals; an important requirement here is compliance with the criteria for mathematical models: adequacy, flexibility, and acceptable complexity. The benefit of modeling is obtained only when the properties of the original are reflected correctly (adequately), which also removes the difficulty of experimenting on real objects. Therefore, the work is extended toward constructing analytical mathematical models of simple signals using <strong>modulation methods</strong>: amplitude, frequency, and phase.
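The models themselves are given in the article as formulas and plots. A minimal numerical sketch of the three binary keying schemes (amplitude, frequency, phase) might look as follows; the carrier frequency, symbol period, and sampling rate are illustrative, and the phase is restarted at each symbol boundary for simplicity.

```python
import math

def modulate(bits, scheme, fc=4.0, fs=100, t_sym=1.0):
    """Sample simple binary modulations of a carrier cos(2*pi*f*t):
    'ASK' keys the amplitude (1 or 0), 'FSK' the frequency (fc vs 2*fc),
    'PSK' the initial phase (0 vs pi). Phase restarts at each symbol."""
    out = []
    n = int(fs * t_sym)                 # samples per symbol
    for bit in bits:
        for k in range(n):
            t = k / fs
            if scheme == "ASK":
                out.append((1.0 if bit else 0.0) * math.cos(2 * math.pi * fc * t))
            elif scheme == "FSK":
                f = fc if bit else 2 * fc
                out.append(math.cos(2 * math.pi * f * t))
            elif scheme == "PSK":
                phase = 0.0 if bit else math.pi
                out.append(math.cos(2 * math.pi * fc * t + phase))
    return out

ask = modulate([1, 0], "ASK")
psk = modulate([1, 0], "PSK")
```

A demodulator of the conventional kind mentioned in the conclusions would correlate each symbol interval against the candidate reference carriers and pick the largest correlation integral.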
The work presents time-domain plots of simple signals, the formulas and parameters used to construct them (including frequency, symbol rate, and the transmission period of one symbol), and a verbal description of the demodulation process used to assess the correctness of the modulation plots. <strong>The result of the work</strong> is therefore a set of analytical mathematical models that are adequate and of acceptable complexity; they can also be used to construct more complex models, for example a quadrature modulation model, in which two parameters vary: amplitude and initial phase. It can be <strong>concluded</strong> that the work is relevant and has a clear goal, result, and direction for further research: mathematical models of an interference system based on Fourier series and sinc functions, the additive combination of this interference with the useful signal, the subsequent use of matrices of systems of linear algebraic equations (SLAE), and a comparison of the results with conventional demodulation methods based on correlation integrals.</p> Olha Melkozerova Oleksii Nariezhnii Copyright (c) 2025-10-27 2025-10-27 67 83 90 10.26565/2304-6201-2025-67-08 Chatbot model for personal computer configuration using NLP methods https://periodicals.karazin.ua/mia/article/view/28389 <p><strong>Objective:</strong> to improve the convenience and efficiency of selecting personal computer components by using a Telegram chatbot with NLP methods to process user requests.</p> <p><strong>Research Methods:</strong> methods of natural language processing (NLP) were used to interpret user queries and generate chatbot responses, together with methods for building dialogue systems and approaches to organizing software components.
The Telegram chatbot was implemented based on a client-server architecture, where the client side provides interaction with the user on Telegram, and the server side handles data processing and PC component selection logic. The implementation used the following technologies: Python programming language, the python-telegram-bot library for creating the chatbot, NLP tools for analyzing and interpreting user queries, and fuzzy matching to improve search results.</p> <p>As a <strong>result</strong>, a Telegram chatbot was created to automate the process of selecting components for personal computers, taking into account individual user needs and preferences. The system allows users to quickly receive recommendations for selecting PC components such as CPU, GPU, RAM, storage, motherboard, and power supply, considering price category, intended purpose (gaming, work, multimedia), and desired specifications. The chatbot provides a convenient interaction through Telegram, while the server side handles request processing, text analysis, and generating optimal configurations using NLP methods and fuzzy matching. For natural language processing, the libraries and tools used include Stanza, NLTK (tokenization, stemming, lemmatization), and TextBlob; for fuzzy search, RapidFuzz was applied. Using Python and the python-telegram-bot library ensures reliable system performance, flexibility in scaling, and the ability to quickly update the component database.</p> <p><strong>Conclusions:</strong> The developed Telegram chatbot allows automating the selection of PC components according to individual user needs and preferences. The system enables component selection for various use cases — gaming, work, multimedia, budget or high-performance configurations, and more. This allows users to quickly receive optimal recommendations, reduces the likelihood of errors when assembling configurations, and simplifies the component selection process. 
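The article's fuzzy search uses RapidFuzz; to keep this sketch dependency-free, the same idea is shown with the standard library's difflib. The catalogue entries and the similarity cutoff are invented for illustration and do not come from the article's component database.

```python
import difflib

# Hypothetical component catalogue (not from the article).
catalogue = ["GeForce RTX 4060", "GeForce RTX 4070", "Radeon RX 7600",
             "Ryzen 5 7600", "Core i5-13400F"]

def match_component(query, names, cutoff=0.4):
    """Return the catalogue name closest to a possibly misspelled or
    partial user query; difflib's ratio is a stdlib stand-in for the
    RapidFuzz scorer the article uses. Returns None below the cutoff."""
    lowered = [n.lower() for n in names]
    hits = difflib.get_close_matches(query.lower(), lowered,
                                     n=1, cutoff=cutoff)
    if not hits:
        return None
    return names[lowered.index(hits[0])]

best = match_component("rtx 4070", catalogue)
```

The cutoff is the knob that trades recall for precision: too low and unrelated parts match free-text queries, too high and common abbreviations like "rtx 4070" fail to resolve.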
The developed system improves user convenience, optimizes the component selection process, and promotes more efficient user interaction with the system.</p> Oleksii Novikov Viktoriia Strilets Copyright (c) 2025-10-27 2025-10-27 67 91 100 10.26565/2304-6201-2025-67-09 Impact of decoding methods in LLMs on the correctness of agent action planning in virtual environments https://periodicals.karazin.ua/mia/article/view/28390 <p><strong>Relevance:</strong> The knowledge and skills acquired by Large Language Models (LLMs) from training data can be applied to the task of action planning for autonomous agents. The classical approach to text generation can violate the syntax of a JSON plan, making it difficult or even impossible to parse and use such a plan. A potential solution to this problem is the application of the Grammar-Constrained Decoding (GCD) method, which restricts the set of possible texts for generation according to a specified grammar.</p> <p><strong>Goal:</strong> To investigate the impact of the Grammar-Constrained Decoding (GCD) method (with and without reasoning) compared to classical Unconstrained Decoding (UCD) on JSON schema compliance, accuracy, and planning time for various LLMs in the Minigrid virtual environments.</p> <p><strong>Research methods:</strong> Research methods are computational experiments and comparative analysis. The studied LLM sequence decoding methods are Unconstrained Decoding (UCD) and Grammar-Constrained Decoding (GCD). The planning quality metrics used were: syntactic validity (compliance with the grammar/JSON schema), planning duration, and accuracy of plan generation.</p> <p><strong>Results:</strong> This work proposes the use of Grammar-Constrained Decoding (GCD) for agent action planning tasks that utilize Large Language Models (LLMs). A dataset of plan examples was prepared for the Minigrid environments: SimpleKeyDoor, KeyInBox, and RandomBoxKey.
A comparison was conducted between Unconstrained Decoding (UCD), Grammar-Constrained Decoding (GCD), and GCD with reasoning across 10 open LLMs (from the Qwen3, DeepSeek-R1, Gemma3, and Llama3.2 families). Using the GCD method ensured the validity of the generated plans according to the grammar specified by the JSON schema. A reduction in planning time was achieved for the Qwen3:4b model by a factor of 17–25 and for the Qwen3:30b model by a factor of 6–8, by limiting the number of tokens in the reasoning chains. On average, the application of the GCD decoding method improved the accuracy of plan generation.</p> <p><strong>Conclusions:</strong> This research demonstrates that the Grammar-Constrained Decoding (GCD) method is effective in action planning tasks with LLMs. The GCD method guarantees the syntactic validity of plans according to the JSON schema, which is difficult to achieve with the UCD method. The GCD method also allows for the flexible determination of the length of reasoning chains through grammar rules, thereby controlling the planning duration.</p> Ihor Omelchenko Volodymyr Strukov Copyright (c) 2025-10-27 2025-10-27 67 101 112 10.26565/2304-6201-2025-67-10
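The core idea of grammar-constrained decoding discussed in the last article above can be shown in miniature: at every step the decoder first masks the candidate set down to the tokens the grammar allows, and only then takes the best-scoring one, so the output is valid by construction. The token trie, mock scores, and plan schema below are toy assumptions; a real implementation constrains the LLM's logits against a context-free grammar or JSON schema.

```python
import json

def gcd_decode(score, grammar, max_len=10):
    """Greedy grammar-constrained decoding over a token trie: only the
    tokens the grammar permits at the current state are scored, so the
    emitted sequence always stays inside the grammar."""
    out, node = [], grammar
    while node and len(out) < max_len:
        allowed = list(node)                 # grammar mask at this state
        tok = max(allowed, key=lambda t: score(out, t))
        out.append(tok)
        node = node[tok]                     # advance the grammar state
    return out

# Toy grammar: a plan must be {"action": "<known verb>"}.
trie = {'{"action": "': {"pickup": {'"}': {}},
                         "toggle": {'"}': {}}}}

def mock_score(prefix, tok):
    """Stand-in for the LLM's next-token scores; favours 'toggle'."""
    return {"pickup": 0.2, "toggle": 0.7}.get(tok, 1.0)

plan_tokens = gcd_decode(mock_score, trie)
plan = json.loads("".join(plan_tokens))
```

Because the mask never offers an out-of-grammar token, `json.loads` cannot fail here, which is the syntactic-validity guarantee the article measures; an unconstrained decoder has no such guarantee.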