Adaptive context management in RAG systems for personalized AI assistants
Abstract
Relevance. The development of artificial intelligence systems based on large language models (LLMs) has made effective dialogue context management a pressing problem: conventional history-storage mechanisms often overload the context and degrade response quality. The problem is particularly acute in Retrieval-Augmented Generation (RAG) systems, where dialogue memory is combined with dynamic retrieval of external knowledge, placing an additional burden on the model's limited context window. Existing approaches to context management do not provide an adaptive mechanism for forming dialogue context that accounts for individual user characteristics and domain specificity.

Goal. To develop and evaluate an Adaptive Context Management System (ACMS) for personalized RAG assistants that combines a sliding window of recent messages, compressed summaries of long-term history, and personalized retrieval of knowledge from the database.

Research methods. A microservice architecture was developed, comprising an AI Orchestrator that coordinates the RAG process, a vector search service based on PostgreSQL with the pgvector extension, and a central ACMS component for context management. The proposed approach synthesizes three strategies: a sliding window that preserves the last N messages, LLM-based compression of older history fragments into thematic summaries, and a personalization layer that weights relevance using user vector profiles. The final context is formed by adaptively mixing dialogue history with relevant knowledge from the database, taking individual user profiles into account.

Results. The experimental evaluation demonstrated significant advantages of the adaptive system over the baseline approach: in pairwise comparisons the adaptive system was preferred in 62% of cases (Answer Win-Rate = 0.62). The key driver of the improvement was the personalization layer, which reduces repetition and off-topic content from dialogue history, selectively amplifies relevant documents, and enables flexible regulation of the balance between history and knowledge.

Conclusions. The developed adaptive context management system provides effective dialogue context management in RAG systems for personalized AI assistants. Integrating the compression strategy, the adaptive window, and user personalization yielded a 14% increase in response relevance and a 22% reduction in context volume. Experimental validation confirmed the practical applicability of the proposed approach across different subject domains, as well as system scalability when working with large volumes of historical data.
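The three-strategy context formation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`assemble_context`, `summarize`), the toy embeddings, and the mixing weight `beta` are all assumptions; `summarize` is a stub standing in for LLM-based compression, and the personalization layer is modeled as a cosine-similarity boost against the user's profile vector.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def summarize(messages):
    # Stub for LLM-based compression of older history into a thematic summary.
    return f"[summary of {len(messages)} earlier messages]"

def assemble_context(history, documents, user_profile,
                     window_size=4, top_k=2, beta=0.5):
    """Form the final context from three sources:
    1) a sliding window over the last `window_size` messages,
    2) a compressed summary of everything older,
    3) top-k documents re-ranked with a personalization boost.
    `beta` regulates the balance between base retrieval relevance
    and similarity to the user profile vector."""
    recent = history[-window_size:]
    older = history[:-window_size]
    parts = []
    if older:
        parts.append(summarize(older))
    parts.extend(recent)
    # Personalization layer: mix the retriever's score with the
    # document's similarity to the user profile embedding.
    ranked = sorted(
        documents,
        key=lambda d: (1 - beta) * d["score"] + beta * cosine(d["vec"], user_profile),
        reverse=True,
    )
    parts.extend(d["text"] for d in ranked[:top_k])
    return "\n".join(parts)

history = [f"msg {i}" for i in range(1, 8)]  # 7 messages; 3 get summarized
docs = [
    {"text": "doc A", "score": 0.9, "vec": [1.0, 0.0]},
    {"text": "doc B", "score": 0.5, "vec": [0.0, 1.0]},
    {"text": "doc C", "score": 0.4, "vec": [0.0, 1.0]},
]
profile = [0.0, 1.0]  # user profile aligned with the topics of B and C
context = assemble_context(history, docs, profile)
print(context)
```

With this profile, documents B and C overtake the higher-scored but off-profile document A, illustrating how the personalization layer suppresses nominally relevant but user-irrelevant material while the summary keeps older history available at low token cost.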
References
P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," in Proc. NeurIPS, 2020. arXiv:2005.11401.
U. Khandelwal et al., "Generalization through Memorization: Nearest Neighbor Language Models," in Proc. ICLR, 2020. arXiv:1911.00172.
N. F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts," Trans. Assoc. Comput. Linguist., vol. 12, 2024. arXiv:2307.03172.
F. Xu et al., "RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation," in Proc. ICLR, 2024. arXiv:2310.04408.
S. Zhang et al., "Personalized Dense Retrieval on Long-Term Dialogue History," in Proc. ACL, 2023.
P. Mazaré et al., "Training Millions of Personalized Dialogue Agents," in Proc. EMNLP, 2018. arXiv:1809.01984.
C. Packer et al., "MemGPT: Towards LLMs as Operating Systems," arXiv:2310.08560, 2023.
"Memory Management," LangChain Documentation. [Online]. Available: https://docs.langchain.com/docs/modules/memory/. [Accessed: Nov. 18, 2025].
J. Liu, "LlamaIndex: A Data Framework for LLM Applications." [Online]. Available: https://github.com/jerryjliu/llama_index. [Accessed: Nov. 18, 2025].
S. Borgeaud et al., "Improving language models by retrieving from trillions of tokens," in Proc. ICML, 2022. arXiv:2112.04426.
K. Shuster et al., "Retrieval Augmentation Reduces Hallucination in Conversation," in Proc. EMNLP, 2021. arXiv:2104.07567.
A. Asai et al., "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection," arXiv:2310.11511, 2023.