Scaling tabular data of training datasets with neural networks
Abstract
The paper proposes a method for scaling the tabular data of a training dataset using neural networks and describes the architecture of such networks.
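The abstract does not detail the network architecture, so the following is only a minimal illustrative sketch of autoencoder-based tabular augmentation in PyTorch, one common way to implement such a generator; the data, layer sizes, latent dimension and noise scale below are assumptions, not the authors' settings. Real rows are compressed to a latent code, the code is slightly perturbed, and the decoder produces new artificial rows.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact architecture).
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
# Stand-in for a real tabular training set: 200 rows, 4 numeric features.
real = rng.normal(size=(200, 4)).astype(np.float32)

# Standardize so the network trains on comparable feature scales.
mu, sigma = real.mean(axis=0), real.std(axis=0)
x = torch.tensor((real - mu) / sigma)

class TabularAE(nn.Module):
    """Encoder compresses a row to a small latent vector; decoder reconstructs it."""
    def __init__(self, n_features: int, latent_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TabularAE(n_features=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):          # short full-batch training loop on the small dataset
    opt.zero_grad()
    loss = loss_fn(model(x), x)
    loss.backward()
    opt.step()

# Generate artificial rows: perturb the latent codes of real rows and decode.
with torch.no_grad():
    z = model.encoder(x)
    z_new = z + 0.1 * torch.randn_like(z)    # jitter scale is an arbitrary assumption
    synthetic = model.decoder(z_new).numpy() * sigma + mu

print(synthetic.shape)   # (200, 4) artificial rows resembling the originals
```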
Relevance. At present, there is often an insufficient amount of raw data for training artificial intelligence models, which leads to significant modeling error. This work is devoted to developing approaches for generating artificial tabular data that can subsequently be used to train artificial intelligence models.
Goal. The purpose of this work was to analyze methods and algorithms for scaling a tabular training dataset using neural networks.
Research methods. The main research method is the selection of parameters for the artificial data generation algorithm and the choice of optimal parameters for the neural network architecture.
Results. Using neural networks to scale the tabular data of the training dataset confirmed the effectiveness of the proposed approach. Tuning the algorithm and selecting the optimal neural network parameters showed that the generated artificial data closely resemble the original data with respect to the mean, maximum, minimum, and the dependence between features.
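As an illustration of the comparison criteria listed above (mean, maximum, minimum and dependence between data), the sketch below computes them for the original and generated arrays from the previous example; the column names are assumptions, and this is not the paper's evaluation code.

```python
# Compare original vs. generated data on the criteria named in the results,
# assuming `real` and `synthetic` from the previous sketch.
import pandas as pd

cols = [f"f{i}" for i in range(real.shape[1])]
real_df = pd.DataFrame(real, columns=cols)
synth_df = pd.DataFrame(synthetic, columns=cols)

# Per-feature mean, minimum and maximum, side by side.
summary = pd.concat(
    {"real": real_df.agg(["mean", "min", "max"]).T,
     "synthetic": synth_df.agg(["mean", "min", "max"]).T},
    axis=1)
print(summary)

# Dependence between features: compare the two correlation matrices.
corr_gap = (real_df.corr() - synth_df.corr()).abs().to_numpy().max()
print(f"Largest absolute difference between correlation matrices: {corr_gap:.3f}")
```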
Conclusions. The task of scaling the tabular data of a training dataset using neural networks has been solved. The proposed approach makes it possible to significantly simplify the training of neural networks. The scientific novelty of this work lies in the development of approaches and methods for augmenting tabular data using artificial intelligence and deep learning.