Conversational memory

Lotus Labs
Nov 7


What is conversational memory:

Conversational memory gives a chatbot the context it needs to answer questions that depend on earlier parts of the conversation. Without that context, the response is likely to fail. For example, a chatbot that already has context about a sports personality can answer follow-up questions about that person, while one without it may return an empty or irrelevant answer.

Chatbots are the main use case for conversational memory. LangChain is a framework through which we can add and use conversational memory. It helps us work with LLMs and their features in an efficient, organized way.
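To make the idea concrete, here is a minimal Python sketch of what a memory component does: it records each exchange and places the stored history in front of the next prompt. The names ChatMemory, save, as_context and build_prompt are hypothetical and purely illustrative. They are not LangChain's API, and no LLM is called.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str     # "user" or "assistant"
    content: str


@dataclass
class ChatMemory:
    """Hypothetical minimal memory: keeps every message and renders it as context."""
    messages: list[Message] = field(default_factory=list)

    def save(self, user_text: str, assistant_text: str) -> None:
        # Record one exchange: the user's turn and the assistant's reply.
        self.messages.append(Message("user", user_text))
        self.messages.append(Message("assistant", assistant_text))

    def as_context(self) -> str:
        # Render the stored history as plain text.
        return "\n".join(f"{m.role}: {m.content}" for m in self.messages)


def build_prompt(memory: ChatMemory, question: str) -> str:
    # The model only "remembers" what is placed in the prompt, so history goes first.
    history = memory.as_context()
    return f"{history}\nuser: {question}" if history else f"user: {question}"


memory = ChatMemory()
memory.save("My favourite player is Sam.", "Great, tell me more about Sam!")
print(build_prompt(memory, "Which sport does my favourite player play?"))
```

Without the saved exchange, the final question would reach the model with no way to resolve "my favourite player".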

Types of conversation memory:

Conversation Buffer Memory:

ConversationBufferMemory works in a simple way: it stores the entire conversation and sends it with every request. Its only limit is the model's context window, for example 4096 tokens for gpt-3.5-turbo or 8192 tokens for gpt-4, so a long enough conversation will eventually overflow it. It can also get expensive. Each request resends the whole history to the API, and you are charged by the number of tokens used, so costs accumulate quickly, especially if you make frequent requests. Larger prompts also take longer to process, which can add latency to the conversation.
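The cost concern can be illustrated with a rough sketch. The BufferMemory class and the count_tokens helper below are hypothetical, and count_tokens simply counts whitespace-separated words as a crude stand-in for a real tokenizer. The point it shows is that the prompt grows with every turn, so the total number of tokens sent across all requests grows roughly quadratically with the number of turns.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough stand-in for a real tokenizer.
    return len(text.split())


class BufferMemory:
    """Sketch of buffer memory: every message is kept and resent on each request."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def save(self, user_text: str, ai_text: str) -> None:
        self.messages.append(f"Human: {user_text}")
        self.messages.append(f"AI: {ai_text}")

    def load(self) -> str:
        # The whole history is returned, so the prompt grows every turn.
        return "\n".join(self.messages)


memory = BufferMemory()
tokens_per_request = []
for i in range(1, 4):
    question = f"question number {i}"
    # Tokens sent in this request = full history + the new question.
    tokens_per_request.append(count_tokens(memory.load()) + count_tokens(question))
    memory.save(question, f"answer number {i}")

print(tokens_per_request)        # [3, 11, 19]
print(sum(tokens_per_request))   # 33
```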

Conversation Buffer Window Memory:

Many conversations don't need context from far back. For these, ConversationBufferWindowMemory keeps only the last k interactions, which act as a sliding window over the history. This lets you control how much context goes into each request while keeping recent history available so the model can understand and respond effectively.
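A sliding window can be sketched with a bounded deque. The WindowMemory class is hypothetical, and it treats k as the number of exchanges (one user message plus one AI reply), which is an assumption of this sketch. Older exchanges fall out of the window automatically as new ones arrive.

```python
from collections import deque


class WindowMemory:
    """Sketch of window memory: keep only the last k exchanges."""

    def __init__(self, k: int) -> None:
        # Assumption: k counts exchanges (one user message plus one AI reply).
        # A deque with maxlen=k discards the oldest exchange automatically.
        self.exchanges = deque(maxlen=k)

    def save(self, user_text: str, ai_text: str) -> None:
        self.exchanges.append((user_text, ai_text))

    def load(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.exchanges)


memory = WindowMemory(k=2)
for i in range(1, 5):
    memory.save(f"question {i}", f"answer {i}")
print(memory.load())
# Human: question 3
# AI: answer 3
# Human: question 4
# AI: answer 4
```

The prompt size now stays bounded no matter how long the conversation runs, but anything said before the window (here, questions 1 and 2) is forgotten entirely.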


Conversation Summary Memory:

Unlike ConversationBufferMemory, ConversationSummaryMemory doesn't store the complete conversation history, and it doesn't use a window either. Instead, it uses an LLM to maintain a running summary that is updated as the conversation unfolds, so context from the very beginning of the conversation is preserved in condensed form. This keeps prompts short on long conversations, at the cost of an extra LLM call to update the summary and some loss of detail.
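The summary approach can be sketched as follows. In a real setup the summarize step would be an LLM call. Here placeholder_summarizer is a hypothetical, deterministic stand-in that only keeps a short note of each user message, so the example runs without any model or API key.

```python
from typing import Callable

# (current summary, user text, AI text) -> new summary
Summarizer = Callable[[str, str, str], str]


def placeholder_summarizer(summary: str, user_text: str, ai_text: str) -> str:
    # Hypothetical stand-in for an LLM call that merges a new exchange into the
    # running summary. It just appends a short note of the user's message.
    note = f"User said: {user_text}."
    return f"{summary} {note}".strip()


class SummaryMemory:
    """Sketch of summary memory: store a running summary instead of raw messages."""

    def __init__(self, summarize: Summarizer = placeholder_summarizer) -> None:
        self.summarize = summarize
        self.summary = ""

    def save(self, user_text: str, ai_text: str) -> None:
        # Each new exchange is folded into the summary; the raw text is not kept.
        self.summary = self.summarize(self.summary, user_text, ai_text)

    def load(self) -> str:
        return f"Summary of conversation so far: {self.summary}"


memory = SummaryMemory()
memory.save("My name is Sam", "Nice to meet you, Sam!")
memory.save("I like tennis", "Tennis is a great sport.")
print(memory.load())
# Summary of conversation so far: User said: My name is Sam. User said: I like tennis.
```

Passing the summarizer in as a parameter keeps the memory logic separate from whichever model produces the summary.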

Conversation Summary Buffer Memory:

ConversationSummaryBufferMemory gives finer control by combining the buffer and summary approaches. It keeps the most recent messages verbatim in a buffer and summarizes the older ones: once the buffer exceeds a token limit, the oldest messages are moved out of it and folded into the summary. This is useful when you need precise context from the last few messages but still want a condensed record of earlier interactions, and it strikes a balance between detailed context and memory efficiency.
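A sketch of the summary buffer logic, under assumptions similar to the earlier examples: count_tokens approximates tokens by word count, and placeholder_summarizer is a hypothetical stand-in for the LLM call that merely keeps the first few words of each evicted message. The max_token_limit name mirrors the parameter LangChain uses for this memory type, but the class itself is illustrative, not LangChain's implementation.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


def placeholder_summarizer(summary: str, evicted: list[str]) -> str:
    # Hypothetical stand-in for an LLM call: keep the first few words of each
    # evicted message and append them to the existing summary.
    notes = [" ".join(m.split()[:4]) + "..." for m in evicted]
    return "; ".join(part for part in [summary, *notes] if part)


class SummaryBufferMemory:
    """Sketch: recent messages stay verbatim, older ones are folded into a summary."""

    def __init__(self, max_token_limit: int,
                 summarize: Callable[[str, list[str]], str] = placeholder_summarizer) -> None:
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.buffer: list[str] = []
        self.summary = ""

    def save(self, user_text: str, ai_text: str) -> None:
        self.buffer += [f"Human: {user_text}", f"AI: {ai_text}"]
        evicted = []
        # Evict the oldest messages until the buffer fits within the token limit.
        while self.buffer and sum(map(count_tokens, self.buffer)) > self.max_token_limit:
            evicted.append(self.buffer.pop(0))
        if evicted:
            self.summary = self.summarize(self.summary, evicted)

    def load(self) -> str:
        recent = "\n".join(self.buffer)
        return f"Summary: {self.summary}\n{recent}" if self.summary else recent


memory = SummaryBufferMemory(max_token_limit=10)
memory.save("tell me about the weather today", "it is sunny and warm")
memory.save("and tomorrow", "rain is expected")
print(memory.load())
```

The summarizer only runs when something is actually evicted, so short conversations never pay for the extra summarization call.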

Conversation Token Buffer Memory:

ConversationTokenBufferMemory is a straightforward approach that limits the buffer by token count, without any summarization. It is similar to ConversationBufferWindowMemory, but instead of keeping the last k, it drops the oldest messages once the total number of tokens exceeds a limit. This gives you direct control over how many tokens each request uses while still keeping the most recent context of the conversation.
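Token-based trimming is essentially the previous eviction loop without the summary step. TokenBufferMemory and count_tokens are hypothetical names for this sketch, and as before, word count stands in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


class TokenBufferMemory:
    """Sketch of token buffer memory: trim by total tokens, no summarization."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.buffer: list[str] = []

    def save(self, user_text: str, ai_text: str) -> None:
        self.buffer += [f"Human: {user_text}", f"AI: {ai_text}"]
        # Drop the oldest messages (discarded, not summarized) until the total fits.
        while self.buffer and sum(map(count_tokens, self.buffer)) > self.max_token_limit:
            self.buffer.pop(0)

    def load(self) -> str:
        return "\n".join(self.buffer)


memory = TokenBufferMemory(max_token_limit=8)
memory.save("hello there", "hi")
memory.save("what is the time", "noon")
print(memory.load())
# Human: what is the time
# AI: noon
```

Because trimming works on individual messages rather than whole exchanges, the buffer can end up starting with an AI reply whose question was dropped. A window over exchanges avoids that, at the cost of less precise control over token usage.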

