Method for generating source code description using an artificial intelligence model

  • Albina Kostiuchenko, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 03056, Ukraine, Kyiv, Polytechnichna St., 14-a, https://orcid.org/0009-0004-7382-7209
  • Andrii Petrashenko, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 03056, Ukraine, Kyiv, Polytechnichna St., 14-a, https://orcid.org/0000-0003-0239-1706
Keywords: machine learning, T5, GNN, code description generator, model training, natural language processing, documentation, AST

Abstract

Relevance. Many large software projects are developed over long periods and must be maintained by developers who need to understand code that comes without explanations. Rapid technological change and the constant need to build new features while supporting existing ones mean that documentation has to be updated continuously. Writing good documentation is a valuable skill that requires experience, concentration, and an understanding of the project structure. As a result, many developers find writing documentation difficult and believe that the time spent on it could be used more productively. This creates a demand for services that help automate the process.

Goal. The purpose of this work is to increase the efficiency of automated generation of software documentation. To this end, the relevant theoretical material was reviewed, existing solutions to the problem were studied, and a new method for generating descriptions of program code was developed and implemented. The method is intended to determine the purpose of code fragments more accurately and to capture the structure of the code and the dependencies between its components.

Research methods. The study draws on literature analysis, statistical methods, machine learning, and data mining. In particular, it uses syntactic code analysis with construction of an abstract syntax tree (AST), a method for forming a training corpus, and methods for training and retraining transformer and graph models. The advantages of the retrained model were assessed through comparative modeling and automated text quality assessment (in this case, BERTScore).
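
The AST-based step of corpus formation can be illustrated with a minimal sketch. It assumes Python source code and uses only the standard-library `ast` module; the function name `extract_pairs` and the sample snippet are hypothetical, and the authors' actual pipeline may differ.

```python
import ast

def extract_pairs(source: str) -> list[tuple[str, str, str]]:
    """Collect (function name, function source, docstring) triples.

    Only functions that carry a docstring are kept, since an
    undocumented function gives no target text to learn from.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                code = ast.get_source_segment(source, node)
                pairs.append((node.name, code, doc))
    return pairs

# Hypothetical input: one documented and one undocumented function.
SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def undocumented(x):
    return x * 2
'''

pairs = extract_pairs(SAMPLE)
for name, code, doc in pairs:
    print(name, "->", doc)
```

In a real corpus the docstring would typically be stripped from the code side of each pair so that the model cannot simply copy it; that step is omitted here for brevity.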

Results. Retraining the T5 model on a specialized dataset of commented code, combined with lexical analysis, improved generation quality by approximately 4% in terms of the F1 metric compared to the base model. This indicates that adapting the model to a specific domain task is effective and can significantly improve the result.
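
For readers unfamiliar with how a BERTScore-style F1 is formed, the sketch below shows the greedy-matching idea on toy vectors. It is an illustration only: real BERTScore uses contextual token embeddings from a pretrained language model, whereas the two-dimensional vectors and the name `greedy_f1` here are hypothetical stand-ins.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_f1(candidate, reference):
    """Return (precision, recall, F1) using greedy token matching.

    Precision: each candidate token is matched to its most similar
    reference token. Recall: each reference token is matched to its
    most similar candidate token. F1 is their harmonic mean.
    """
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy "token embeddings" (assumed values, not model output).
CANDIDATE = [[1.0, 0.0], [0.0, 1.0]]
REFERENCE = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

precision, recall, f1 = greedy_f1(CANDIDATE, REFERENCE)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because every candidate token has an exact match in the reference, precision is perfect here, while recall is lower because the third reference token is only partially covered.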

Conclusions. Based on the collected data, the authors propose and implement an approach to improving the quality of code description generation that uses the retrained T5 model together with a purpose-built GNN model. The proposed system combines established practices of syntactic analysis, graph modeling, and transformer generation, providing a practically applicable solution for automatic documentation creation. Combining "seq2seq" models, tokenization, and adaptation methods for large transformers with code analysis via GNN and structural AST representations gives a comprehensive approach to automating work with code: it makes it possible to combine local and global contexts, quickly adapt the model to specific tasks, and effectively generate meaningful comments and documentation. Such an integrated approach has the potential to advance artificial intelligence systems for automatic code analysis, increase developer productivity, and support software quality. The research results can be applied in practice for fast and effective creation of documentation for developed software and large projects in the Python language.
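
As a rough illustration of what a structural AST representation for a graph model might look like, the sketch below turns Python source into a list of node labels and parent-child edges. The abstract does not describe how the graph is actually built, so the function `ast_to_graph` and the choice of edges are assumptions made for this example.

```python
import ast

def ast_to_graph(source: str):
    """Convert source code into (labels, edges).

    labels[i] is the AST node type of node i; each edge is a
    (parent_index, child_index) pair in depth-first order.
    """
    tree = ast.parse(source)
    labels = []
    edges = []

    def visit(node, parent_idx):
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent_idx is not None:
            edges.append((parent_idx, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree, None)
    return labels, edges

# Hypothetical input snippet.
labels, edges = ast_to_graph("def double(x):\n    return x * 2\n")
print(labels)
print(edges)
```

An edge list of this kind is the usual input shape for graph neural network libraries; a fuller representation might add further edge types, such as data-flow or next-sibling links.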

Author Biographies

Albina Kostiuchenko, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 03056, Ukraine, Kyiv, Polytechnichna St., 14-a

Master's student of the Department of System Programming and Specialized Computer Systems

Andrii Petrashenko, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 03056, Ukraine, Kyiv, Polytechnichna St., 14-a

Ph.D., Associate Professor of the Department of System Programming and Specialized Computer Systems

Published
2025-12-22
How to Cite
Kostiuchenko, A., & Petrashenko, A. (2025). Method for generating source code description using an artificial intelligence model. Bulletin of V.N. Karazin Kharkiv National University, Series «Mathematical Modeling. Information Technology. Automated Control Systems», 68, 30-42. https://doi.org/10.26565/2304-6201-2025-68-03
Section
Articles