Analysis of Modern Neural Network Methods for Visual Information Processing in High-Speed UAV Navigation Systems
Abstract
Relevance. The rapid evolution of Unmanned Aerial Vehicles (UAVs) from remotely piloted systems to fully autonomous high-speed aerial robots has intensified the demand for advanced onboard perception and navigation methods. This need is particularly acute in scenarios where computational latency, sensor noise, and environmental complexity undermine the reliability of classical computer-vision pipelines. Despite recent progress in deep learning, existing approaches to visual information processing (especially CNN-based detectors, Transformer-based semantic models, and learning-enhanced SLAM modules) remain fragmented and insufficiently adapted to the strict Size, Weight, and Power (SWaP) constraints of embedded platforms such as the NVIDIA Jetson series. This motivates a comprehensive analysis of modern neural architectures suitable for real-time, high-velocity UAV operations.
Purpose. The purpose of this study is to analyze state-of-the-art neural network methods for secondary visual processing in UAV navigation systems, compare the applicability of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), evaluate their integration into SLAM pipelines, and determine the requirements for hybrid architectures capable of supporting fully autonomous, high-speed flight.
Methods. The research employs a comparative analysis of recent deep-learning approaches, including CNN-based detectors (YOLO family), Transformer-based visual models, deep-learning–enhanced SLAM components, and Deep Reinforcement Learning (DRL) control policies. Evaluation criteria include latency, semantic robustness, dynamic-scene handling, edge-hardware compatibility, quantization performance, pruning potential, and TensorRT optimization efficiency on NVIDIA Jetson devices.
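To illustrate how the latency criterion can be evaluated in practice, the following minimal sketch (assuming PyTorch with CUDA support, as available on Jetson-class devices; `model` is a hypothetical stand-in for any detector under test, such as a YOLO variant) measures mean per-frame inference time with explicit GPU synchronization:

```python
# Minimal latency-benchmark sketch; not the study's actual evaluation harness.
# Assumes PyTorch with CUDA (e.g., an NVIDIA Jetson); "model" is any detector.
import time
import torch

def measure_latency(model, input_shape=(1, 3, 640, 640), warmup=20, runs=100):
    """Return the mean per-frame inference latency in milliseconds."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up stabilizes clocks and caches
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()     # flush queued GPU work before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()     # wait for the last kernels to finish
        end = time.perf_counter()
    return (end - start) / runs * 1000.0
```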
Results. The study establishes that CNNs provide superior real-time performance and remain indispensable for high-frequency reflexive perception, while Vision Transformers offer stronger global context reasoning and robustness to occlusion but suffer from significant computational overhead on embedded GPUs. Deep-learning-based SLAM methods improve feature stability and dynamic-object rejection but require careful integration to maintain real-time constraints. Hardware analysis reveals that quantization, pruning, and TensorRT acceleration are critical for deploying deep models on Jetson-class platforms, although ViTs exhibit limited INT8 quantization tolerance. Based on these findings, the work formulates a conceptual hybrid architecture that combines CNN-driven reflexive processing with Transformer-driven cognitive reasoning.
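As a concrete example of the hardware-aware deployment path these findings point to, the sketch below (a hedged illustration; the file names and the 640×640 input size are placeholder assumptions, not the study's pipeline) exports a trained detector to ONNX, from which a TensorRT engine can be built on the target device with the stock trtexec tool:

```python
# Hedged export sketch: PyTorch -> ONNX -> TensorRT engine on the Jetson.
# Model and file names are placeholders, not the paper's code.
import torch

def export_for_tensorrt(model: torch.nn.Module, onnx_path: str = "detector.onnx"):
    """Export a detector to ONNX as the input for TensorRT engine building."""
    model.eval()
    dummy = torch.randn(1, 3, 640, 640)   # representative input resolution
    torch.onnx.export(
        model, dummy, onnx_path,
        input_names=["images"], output_names=["outputs"],
        opset_version=17,
    )
    # On the target device an engine can then be built with TensorRT's
    # command-line tool, e.g.:
    #   trtexec --onnx=detector.onnx --fp16 --saveEngine=detector.engine
    # INT8 builds additionally require representative calibration data, which
    # is where ViTs tend to lose more accuracy than CNNs (limited INT8
    # quantization tolerance, as noted above).
```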
Conclusions. The results confirm the necessity of developing hybrid neuro-architectures that integrate the speed and hardware efficiency of CNNs with the semantic depth of Transformer-based models. Such architectures represent a promising pathway toward reliable, fully autonomous high-speed UAV navigation. The proposed design principles emphasize hierarchical control, asynchronous perception loops, and hardware-aware optimization as key enablers for next-generation aerial robotic systems.
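To make the asynchronous-perception-loop principle concrete, the following sketch (hypothetical names throughout: `fast_detector`, `semantic_model`, and `camera` stand in for a CNN detector, a Transformer-based model, and a frame source; the loop rates are illustrative) runs a high-rate reflexive loop alongside a low-rate cognitive loop that share only the latest frame:

```python
# Hedged sketch of the asynchronous two-rate perception pattern; illustrative
# rates and names only, not the proposed system's actual implementation.
import threading
import time

class TwoRatePerception:
    def __init__(self, fast_detector, semantic_model):
        self.fast = fast_detector    # low-latency reflexive path (CNN)
        self.slow = semantic_model   # high-latency cognitive path (ViT)
        self.latest_frame = None
        self.semantic_map = None     # refreshed asynchronously by the slow loop
        self.lock = threading.Lock()
        self.running = True

    def fast_loop(self, camera, period=0.02):     # ~50 Hz reflexive loop
        while self.running:
            frame = camera()
            with self.lock:
                self.latest_frame = frame
            obstacles = self.fast(frame)          # immediate avoidance cues
            # ...fuse obstacles with the (possibly stale) semantic map here...
            time.sleep(period)

    def slow_loop(self, period=0.2):              # ~5 Hz cognitive loop
        while self.running:
            with self.lock:
                frame = self.latest_frame
            if frame is not None:
                semantics = self.slow(frame)      # global-context reasoning
                with self.lock:
                    self.semantic_map = semantics
            time.sleep(period)

# Usage: each loop runs in its own thread, e.g.:
#   p = TwoRatePerception(cnn_detector, vit_model)
#   threading.Thread(target=p.fast_loop, args=(camera,), daemon=True).start()
#   threading.Thread(target=p.slow_loop, daemon=True).start()
```

The design choice worth noting is that the reflexive loop always proceeds with the most recent, possibly stale, semantic map rather than blocking on the Transformer, which is what keeps the high-frequency control path within real-time bounds.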
References
Sheng, Y., Liu, H., Li, J., & Han, Q. (2024). UAV autonomous navigation based on deep reinforcement learning in highly dynamic and high-density environments. Drones, 8(9), 516. https://doi.org/10.3390/drones8090516
Scherbinin, V. V., Khusainov, N. S., & Kravchenko, P. P. (2014). Combined correlation-extremal navigation system to identify AV location by terrain relief and landscape objects with the use of the stereo photogrammetry method. Middle-East Journal of Scientific Research, 19(4), 479–486. https://doi.org/10.5829/idosi.mejsr.2014.19.4.13693
Mukhina, M. P., & Seden, I. V. (2014). Analysis of modern correlation extreme navigation systems. Electronics and Control Systems, 1(39), 95–101. https://doi.org/10.18372/1990-5548.39.7343
Sotnikov, A., Tiurina, V., Petrov, K., Lukyanova, V., Lanovyy, O., Onishchenko, Y., Gnusov, Y., Petrov, S., Boichenko, O., & Breus, P. (2024). Using the set of informative features of a binding object to construct a decision function by the system of technical vision when localizing mobile robots. Eastern-European Journal of Enterprise Technologies, 3(9(129)), 60–69. https://doi.org/10.15587/1729-4061.2024.303989
Seeed Studio. (2023, March 30). YOLOv8 performance benchmarks on NVIDIA Jetson devices. Seeed Studio Blog. https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/
Du, D., et al. (2019). VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2019) (pp. 213–226). IEEE. https://doi.org/10.1109/ICCVW.2019.00030
Zhang, J. (2023). Towards a high-performance object detector: Insights from drone detection using ViT and CNN-based deep learning models. In Proceedings of the 2023 IEEE International Conference on Sensors, Electronics and Computer Engineering (ICSECE) (pp. 141–147). IEEE. https://doi.org/10.1109/ICSECE58870.2023.10263514
Liu, T., Wang, Y., Yang, C., Zhang, Y., & Zhang, W. (2025). A lightweight hybrid CNN-ViT network for weed recognition in paddy fields. Mathematics, 13(17), 2899. https://doi.org/10.3390/math13172899
Shen, S., Yu, G., Zhang, L., Yan, Y., & Zhai, Z. (2025). LandNet: Combine CNN and transformer to learn absolute camera pose for the fixed-wing aircraft approach and landing. Remote Sensing, 17(4), 653. https://doi.org/10.3390/rs17040653
Xue, H., Tang, Z., Xia, Y., Wang, L., & Li, L. (2025). HCTD: A CNN-transformer hybrid for precise object detection in UAV aerial imagery. Computer Vision and Image Understanding, 259, 104409. https://doi.org/10.1016/j.cviu.2025.104409
Favorskaya, M. N. (2023). Deep learning for visual SLAM: The state-of-the-art and future trends. Electronics, 12(9), 2006. https://doi.org/10.3390/electronics12092006
Luo, L., Peng, F., & Dong, L. (2024). Improved multi-sensor fusion dynamic odometry based on neural networks. Sensors, 24(19), 6193. https://doi.org/10.3390/s24196193
Zhu, P., Wen, L., Du, D., Bian, X., Hu, Q., Ling, H., et al. (2022). Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 7380–7399. https://doi.org/10.1109/TPAMI.2021.3119563
Mohiuddin, M. B., Boiko, I., Tran, V. P., et al. (2025). Reinforcement learning for end-to-end UAV slung-load navigation and obstacle avoidance. Scientific Reports, 15, 34621. https://doi.org/10.1038/s41598-025-18220-6
Meimetis, D., Daramouskas, I., Patrinopoulou, N., Lappas, V., & Kostopoulos, V. (2025). Comparative analysis of object detection models for edge devices in UAV swarms. Machines, 13(8), 684. https://doi.org/10.3390/machines13080684