Reflective memory architecture for adaptive planning in hierarchical LLM agents in virtual environments

Keywords: artificial intelligence, machine learning, deep learning, artificial neural networks, intelligent information systems, automated information systems, natural language processing, large language model, prompt, decision making, agent, memory, virtual environment, Minigrid

Abstract

Relevance: Large language models (LLMs) can be used as one of the components of autonomous agents that solve sequential decision-making tasks. To improve agent performance, it is necessary to store the history of previous observations and actions, which leads to filling the LLM context window, increasing computational costs, prolonging planning time, and raising memory requirements. A possible approach to addressing this problem is the application of observation reflection methods using LLMs.

Goal: To study the impact of memory reflection methods on the performance of autonomous LLM-based agents and to compare these methods with simpler memory organization approaches.

Research methods: Computational experiments and comparative analysis. Three memory organization methods are compared: full episode history, reflection, and reflection with a structured set of memories. Agent performance is measured by task success rate, cumulative reward per episode, and the number of steps required to complete the task.

Results: A reflection-based memory summarization method is proposed for a hierarchical LLM-based agent. The Minigrid ColoredDoorKey environment is used for agent training. Agent code is developed, including the components required for training the agent in the environment. Computational experiments are conducted to train and evaluate the agent with different memory mechanisms, whose performance is compared using task success rate, cumulative reward, and the number of steps until episode termination. The results of applying the different memory mechanisms to the agent's action-planning task in the ColoredDoorKey environment are analyzed and compared.
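The reflection-based summarization described above can be illustrated with a minimal sketch. All names here are hypothetical and the `reflect` stub stands in for an actual LLM call; this is not the authors' implementation, only a schematic of the idea: raw observations are accumulated, periodically distilled into compact "lessons" (a structured set of memories), and the raw buffer is cleared to keep the context window small.

```python
# Minimal sketch of reflection-based memory summarization for an LLM agent.
# Hypothetical names throughout; reflect() is a stand-in for an LLM prompt.
from dataclasses import dataclass, field


def reflect(events: list[str]) -> str:
    """Stand-in for an LLM call that distills raw events into a rule.

    A real implementation would prompt the model with the event history;
    here we simply produce a toy 'lesson' from the most recent event.
    """
    return f"lesson from {len(events)} events: {events[-1]}"


@dataclass
class ReflectiveMemory:
    capacity: int = 4                                  # raw events kept before reflecting
    raw: list[str] = field(default_factory=list)       # recent, uncompressed observations
    lessons: list[str] = field(default_factory=list)   # structured set of memories

    def add(self, event: str) -> None:
        self.raw.append(event)
        if len(self.raw) >= self.capacity:
            self.lessons.append(reflect(self.raw))     # compress history into a lesson
            self.raw.clear()                           # free the context window

    def context(self) -> str:
        """Compact prompt context: distilled lessons plus recent raw events."""
        return "\n".join(self.lessons + self.raw)


memory = ReflectiveMemory(capacity=3)
for step in ["saw red door", "picked up red key", "opened red door"]:
    memory.add(step)
print(memory.context())
```

After three events the buffer reaches capacity, a single lesson replaces the raw history, and subsequent planning prompts receive the short `context()` string instead of the full episode trace.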

Conclusions: The study demonstrates that the use of reflection with a structured set of memories is appropriate for action planning tasks in autonomous agents based on LLMs. The reflection method enables the agent to generalize experience, identify effective rules within large volumes of data with sparse reward signals, and achieve a level of performance comparable to that of a human expert.


Author Biographies

Ihor Omelchenko, Karazin Kharkiv National University, Svobody Sq 4, Kharkiv, Ukraine, 61022

PhD student, Department of Mathematical Modeling and Data Analysis

Volodymyr Strukov, Karazin Kharkiv National University, Svobody Sq 4, Kharkiv, Ukraine, 61022

PhD in Technical Sciences, Associate Professor; Head of the Department of Mathematical Modeling and Data Analysis


Published
2025-12-22
How to Cite
Omelchenko, I., & Strukov, V. (2025). Reflective memory architecture for adaptive planning in hierarchical LLM agents in virtual environments. Bulletin of V.N. Karazin Kharkiv National University, Series «Mathematical Modeling. Information Technology. Automated Control Systems», 68, 62-69. https://doi.org/10.26565/2304-6201-2025-68-06
Section
Articles