Reflective memory architecture for adaptive planning in hierarchical LLM agents in virtual environments
Abstract
Relevance: Large language models (LLMs) can serve as components of autonomous agents that solve sequential decision-making tasks. To improve agent performance, the history of previous observations and actions must be stored, which fills the LLM context window, increases computational cost, prolongs planning time, and raises memory requirements. One approach to this problem is to apply LLM-based observation reflection methods.
Goal: To study the impact of memory reflection methods on the performance of LLM-based autonomous agents and to compare these methods with simpler memory organization approaches.
Research methods: Computational experiments and comparative analysis. The memory organization methods compared are: full episode history, reflection, and reflection with a structured set of memories. Agent performance metrics: task success rate, cumulative reward per episode, and the number of steps required to complete the task.
Results: A reflection-based memory summarization method is proposed for a hierarchical LLM-based agent. The Minigrid ColoredDoorKey environment is used for agent training. Agent code is developed, including the components needed to train the agent in the environment. Computational experiments are conducted to train and evaluate the agent with different memory mechanisms, whose performance is measured by task success rate, cumulative reward, and the number of steps until episode termination. The results of applying the different memory mechanisms to the agent's action-planning task in ColoredDoorKey are analyzed and compared.
Conclusions: The study demonstrates that the use of reflection with a structured set of memories is appropriate for action planning tasks in autonomous agents based on LLMs. The reflection method enables the agent to generalize experience, identify effective rules within large volumes of data with sparse reward signals, and achieve a level of performance comparable to that of a human expert.
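The memory organizations compared in the study can be illustrated with a minimal sketch. The class and function names below are illustrative only, not the authors' implementation; in particular, `summarize` stands in for an LLM reflection call that would compress raw episode history into generalized rules.

```python
def summarize(observations):
    # Stand-in for an LLM reflection call: a real agent would prompt the
    # model to distill observations into reusable rules. Here we only
    # record how many steps were compressed and the last observation.
    return f"{len(observations)} steps seen; last: {observations[-1]}"

class FullHistoryMemory:
    """Full episode history: context grows linearly with episode length."""
    def __init__(self):
        self.items = []
    def add(self, obs):
        self.items.append(obs)
    def context(self):
        return list(self.items)

class ReflectionMemory:
    """Reflection: periodically replace raw history with a summary,
    keeping the context bounded regardless of episode length."""
    def __init__(self, every=4):
        self.items, self.summaries, self.every = [], [], every
    def add(self, obs):
        self.items.append(obs)
        if len(self.items) >= self.every:
            self.summaries.append(summarize(self.items))
            self.items.clear()
    def context(self):
        return self.summaries + self.items
```

The third variant studied, reflection with a structured set of memories, would additionally organize the summaries into typed slots (e.g. rules, failures, goals) rather than a flat list; the sketch above shows only the shared compression mechanism.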
References
Zhang Z., Dai Q., Bo X. et al. A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems. 2025. Vol. 43. P. 1–47.
Park J., O’Brien J., Cai C. et al. Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023. P. 1–22.
Zhu X., Chen Y., Tian H. et al. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. 2023. arXiv:2305.17144 [cs.AI]. URL: https://arxiv.org/abs/2305.17144.
Zhao A., Huang D., Xu Q. et al. ExpeL: LLM agents are experiential learners. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38, No. 17. 2024. P. 19632–19642.
Zhong W., Guo L., Gao Q. et al. MemoryBank: Enhancing large language models with long-term memory. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38, No. 17. 2024. P. 19724–19731.
Shinn N., Cassano F., Berman E. et al. Reflexion: Language Agents with Verbal Reinforcement Learning. 2023. arXiv:2303.11366 [cs.AI]. URL: https://arxiv.org/abs/2303.11366.
Madaan A., Tandon N., Gupta P. et al. Self-Refine: Iterative Refinement with Self-Feedback. 2023. arXiv:2303.17651 [cs.CL]. URL: https://arxiv.org/abs/2303.17651.
Zhang W., Tang K., Wu H. et al. Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization. 2024. arXiv:2402.17574 [cs.AI]. URL: https://arxiv.org/abs/2402.17574.
Packer C., Wooders S., Lin K. et al. MemGPT: Towards LLMs as Operating Systems. 2024. arXiv:2310.08560 [cs.AI]. URL: https://arxiv.org/abs/2310.08560.
Xu W., Liang Z., Mei K. et al. A-MEM: Agentic Memory for LLM Agents. 2025. arXiv:2502.12110 [cs.CL]. URL: https://arxiv.org/abs/2502.12110.