Assessment of the impact of photorealistic textures on the accuracy of computer vision models using synthetic datasets
Abstract
Relevance. Progress in computer vision is constrained by the high cost and labor intensity of collecting and annotating real data. Synthetic data generated in graphics engines is an effective alternative, but the main obstacle remains the “domain gap”: the difference between rendered and real images, which reduces model accuracy on real photographs.
The goal of this work is to quantitatively assess how the photorealistic texture of the target object affects the detection performance of YOLO models in simulation-to-reality (Sim2Real) transfer.
The research methodology is based on a controlled experiment in the Unity environment. Two synthetic datasets were generated that were identical except for the texture of the target's 3D model: highly detailed photorealistic (“Textured”) or uniform white (“White”). Models based on the YOLOv11s architecture were trained on each dataset using transfer learning with a two-step fine-tuning process. The results were validated on an independent set consisting exclusively of real photographs.
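The value of this design rests on the two datasets sharing every randomized scene parameter except the material. One way to guarantee that, sketched below under our own assumptions, is to draw all scene parameters from a single seeded generator and only then assign the texture variant. The `SceneParams` fields, their value ranges, and the `make_paired_scenes` helper are hypothetical illustrations; they are not the Unity Perception API or the authors' actual generator.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical per-frame scene parameters (not a Unity Perception API)."""
    object_x: float          # object position in the scene (metres)
    object_y: float
    yaw_deg: float           # object rotation around the vertical axis
    light_intensity: float   # relative light intensity
    camera_distance: float   # camera-to-object distance (metres)
    texture: str             # "Textured" or "White"


def make_paired_scenes(n_frames: int, seed: int = 0):
    """Return two scene lists that differ only in the texture field."""
    rng = random.Random(seed)  # one seeded generator shared by both variants
    textured, white = [], []
    for _ in range(n_frames):
        base = SceneParams(
            object_x=rng.uniform(-1.0, 1.0),
            object_y=rng.uniform(-1.0, 1.0),
            yaw_deg=rng.uniform(0.0, 360.0),
            light_intensity=rng.uniform(0.5, 1.5),
            camera_distance=rng.uniform(1.0, 3.0),
            texture="Textured",
        )
        textured.append(base)
        # Copy every sampled value; only the material changes.
        white.append(replace(base, texture="White"))
    return textured, white


textured, white = make_paired_scenes(n_frames=3, seed=42)
for a, b in zip(textured, white):
    assert replace(a, texture="White") == b  # identical apart from texture
```

Sampling the shared parameters once and copying them, rather than sampling separately for each variant, keeps pose, lighting, and framing aligned frame by frame, which is what allows a difference in real-photo accuracy to be attributed to texture.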
Results. The models trained on the “Textured” and “White” datasets achieved almost identical accuracy on synthetic validation data (mAP@0.5 ≈ 0.995). On real photographs, however, the “Textured” model achieved an mAP@0.5 11.6 times higher than that of the “White” model, and its recall was 10.3 times higher than that of the model that relied solely on geometric shape.
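Under mAP@0.5, a detection counts as correct only if its intersection over union (IoU) with a not-yet-matched ground-truth box is at least 0.5; average precision (AP) is the area under the resulting precision-recall curve, and mAP averages AP over classes (with a single target object, mAP reduces to AP). The sketch below computes AP@0.5 and recall for one class using all-point interpolation and greedy matching in confidence order. It is a simplified illustration under these assumptions and may differ in details, such as curve interpolation, from the evaluator built into the YOLO training framework; recall here is measured with every prediction kept, whereas frameworks typically report it at a chosen confidence threshold. The function names are our own.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Pred = Tuple[str, float, Box]            # (image_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap_at_05(preds: List[Pred], gts: Dict[str, List[Box]], iou_thr: float = 0.5):
    """Return (AP at iou_thr, recall with all predictions kept) for one class."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    hits = []  # 1 = true positive, 0 = false positive, in confidence order
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            if matched[img][j]:
                continue  # each ground-truth box may be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[img][best_j] = True
            hits.append(1)
        else:
            hits.append(0)

    # Cumulative precision and recall down the ranked list.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt if n_gt else 0.0)

    # All-point interpolation: precision at rank k becomes the best precision
    # at any later rank, then AP sums precision over each recall increment.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap, (recalls[-1] if recalls else 0.0)


# Toy example: two images, one ground-truth box each, three predictions.
gts = {"img1": [(10, 10, 50, 50)], "img2": [(20, 20, 60, 60)]}
preds = [
    ("img1", 0.9, (12, 12, 50, 50)),      # IoU 0.9025 -> true positive
    ("img2", 0.8, (100, 100, 140, 140)),  # no overlap -> false positive
    ("img2", 0.6, (20, 20, 60, 60)),      # exact match -> true positive
]
ap, recall = ap_at_05(preds, gts)
print(round(ap, 4), recall)  # prints 0.8333 1.0
```

The 11.6× and 10.3× figures are ratios of such metrics computed for the two models on the same set of real photographs, for example the mAP@0.5 of the “Textured” model divided by that of the “White” model.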
Conclusions. Photorealistic texture is a critical factor for successful Sim2Real transfer. It ensures the formation of universal low-level features in the early layers of the neural network, which are needed to recognize objects in real environments. High-quality texturing of 3D assets should therefore be treated as a strategic priority rather than an auxiliary visualization step.