On the impact of prompts on agent performance in a virtual environment
Abstract
Relevance: Applying language models to decision-making tasks is a promising research direction. Pre-trained language models demonstrate competence in processing arbitrary text, solving logical problems, and learning from textual examples, which enables them to solve new tasks presented in text form.
Goal: To study the influence of various language instructions (prompts) on the performance of an agent in a virtual environment, where the agent operates on the basis of a pre-trained language model.
Research methods: The study uses the Minigrid virtual environment and pre-trained language models. A software agent was built on top of a language model, and a set of language instructions was constructed using methods such as zero-shot learning, few-shot learning, and others. The agent's performance is evaluated with the following metrics: total reward in the environment, episode duration, and number of language model calls. Experiments were conducted to train and test the agent in the virtual environment, and numerical and statistical results were collected.
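The abstract does not include code; as an illustration, a minimal sketch of such an agent loop, assuming the Gymnasium Minigrid API and a hypothetical `query_llm` function standing in for the pre-trained language model, could look like this:

```python
# Minimal sketch of a language-model-driven agent loop in Minigrid.
# `query_llm` is a hypothetical placeholder, not the authors' implementation;
# the prompt format and fallback policy are likewise illustrative.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers Minigrid environments)

# Discrete actions defined by the Minigrid environments.
ACTIONS = {"left": 0, "right": 1, "forward": 2, "pickup": 3,
           "drop": 4, "toggle": 5, "done": 6}

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pre-trained language model.
    A real implementation would send `prompt` to the model and return
    its reply; here a fixed action is returned for demonstration."""
    return "forward"

def run_episode(env_id: str = "MiniGrid-Empty-5x5-v0",
                instruction: str = "Reach the green goal square."):
    env = gym.make(env_id)
    obs, _ = env.reset()
    total_reward, steps, llm_calls = 0.0, 0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        # Present the instruction and the environment's mission text.
        prompt = f"{instruction}\nMission: {obs['mission']}\nAction:"
        reply = query_llm(prompt).strip().lower()
        llm_calls += 1
        # Fall back to a default action if the reply is not recognized.
        action = ACTIONS.get(reply, ACTIONS["forward"])
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        steps += 1
    env.close()
    return total_reward, steps, llm_calls
```

The returned values `total_reward`, `steps`, and `llm_calls` correspond to the three metrics named above: total reward in the environment, episode duration, and number of language model calls.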
Results: Agent performance was found to differ depending on the method used to design the language instructions. Instructions that contain examples of task solutions lead to better results than instructions that state the task in imperative form. Adding a default action plan also improved agent performance, and augmenting the agent with episodic memory further improved performance in specific cases.
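The exact instruction texts are not reproduced in the abstract; the contrast between an imperative (zero-shot) instruction and a few-shot instruction augmented with a default action plan can be sketched roughly as follows, with all wording hypothetical:

```python
# Illustrative prompt templates only; the actual instructions used
# in the study are not given in the abstract.

# Imperative (zero-shot) instruction: the task is simply stated.
zero_shot_prompt = (
    "You control an agent in a grid world.\n"
    "Reach the green goal square.\n"
    "Answer with one action: left, right, forward, pickup, drop, toggle, done.\n"
)

# Few-shot instruction: solved examples precede the query, and a
# default action plan gives the agent a fallback strategy.
few_shot_prompt = (
    "You control an agent in a grid world.\n"
    "Example 1: The goal is ahead. Action: forward\n"
    "Example 2: The goal is to the left. Action: left\n"
    "Default plan: if unsure, turn until the goal is visible, then move forward.\n"
    "Answer with one action: left, right, forward, pickup, drop, toggle, done.\n"
)
```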
Conclusions: This work examined a software agent based on a pre-trained language model that solves decision-making tasks in a virtual environment.