
Chat Models — Passing Chat History

How LLMs simulate memory — passing the full conversation list each call.

Core Theory

LLMs are stateless APIs — each call is completely independent. The model has no memory of previous exchanges. To create a conversational experience, you must explicitly pass the entire conversation history in every call.

Pattern: Maintain a chat_history list. After each turn, append the user's HumanMessage and the AI's AIMessage. On the next call, prepend history to the messages list.

from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# model is assumed to be an initialized chat model, e.g. ChatOpenAI()
chat_history = []

def chat(user_input):
    messages = [SystemMessage('You are a helpful assistant.')] + chat_history + [HumanMessage(user_input)]
    response = model.invoke(messages)
    chat_history.append(HumanMessage(user_input))
    chat_history.append(AIMessage(response.content))
    return response.content

This is why LLM conversation UIs like ChatGPT send the full history on every request — they're building this list and passing it each time. The context window limit is the practical ceiling: long enough conversations hit the token limit and older messages must be truncated or summarised.
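The truncation step can be sketched in plain Python. This is a minimal illustration, assuming a crude 4-characters-per-token estimate in place of a real tokenizer; `estimate_tokens` and `trim_history` are hypothetical helper names, not library APIs:

```python
# Sketch: drop the oldest turns until the history fits a token budget.
# estimate_tokens is a crude stand-in for a real tokenizer (~4 chars/token).

def estimate_tokens(text):
    return max(1, len(text) // 4)

def trim_history(history, budget):
    """Keep the newest messages whose combined estimate fits the budget."""
    kept, used = [], 0
    for role, content in reversed(history):    # walk newest-first
        cost = estimate_tokens(content)
        if used + cost > budget:
            break                              # oldest messages fall off
        kept.append((role, content))
        used += cost
    return list(reversed(kept))                # restore chronological order

history = [
    ("user", "First question about LangChain basics."),
    ("assistant", "A long first answer " * 20),
    ("user", "Follow-up question?"),
    ("assistant", "Short answer."),
]
trimmed = trim_history(history, budget=20)
```

Real systems would swap the estimator for the provider's tokenizer and often summarise the dropped turns instead of discarding them outright.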

Interview-Ready Deepening

These points reinforce the core theory above and highlight the production tradeoffs you should be able to defend.

  • How LLMs simulate memory — passing the full conversation list each call.
  • LLMs are stateless APIs — each call is completely independent; the model has no memory of previous exchanges.
  • After each turn, append the user's HumanMessage and the AI's AIMessage.
  • On the next call, prepend history to the messages list.
  • The context window limit is the practical ceiling: long enough conversations hit the token limit and older messages must be truncated or summarised.

Tradeoffs You Should Be Able to Explain

  • More expressive models improve fit but can reduce interpretability and raise overfitting risk.
  • Higher optimization speed can reduce training time but may increase instability if learning dynamics are not monitored.
  • Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.

First-time learner note: Build deterministic baseline chains first (prompt -> model -> parser), then add retrieval, memory, or tools only when the baseline is stable.

Production note: Keep contracts explicit at each boundary: input variables, output schema, retries, and logs. This is what keeps orchestration reliable at scale.

What actually creates conversational memory: the model does not keep a hidden chat session between API calls. The application reconstructs continuity by sending a structured list of SystemMessage, HumanMessage, and AIMessage objects every time. The system message sets the durable behavioral frame, the human message carries the new request, and earlier AI and human turns are replayed to preserve context. That is why role labels matter. If a prior assistant answer is accidentally replayed as a user message, the model is being given the wrong conversation history.
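The role-labeled list above can be sketched with plain role/content dicts, used here as a stand-in for LangChain's SystemMessage/HumanMessage/AIMessage classes; `build_messages` is an illustrative helper, not a library API:

```python
# Sketch: how the application reconstructs "memory" on every call.
# Messages carry explicit roles; mislabeling a prior assistant turn as
# "user" would hand the model a wrong conversation history.

def build_messages(system_prompt, prior_turns, new_user_input):
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in prior_turns:          # replay earlier turns in order
        messages.append({"role": role, "content": content})
    messages.append({"role": "user", "content": new_user_input})
    return messages

msgs = build_messages(
    "You are a helpful assistant.",
    [("user", "What is LangChain?"), ("assistant", "A framework for LLM apps.")],
    "Can you elaborate?",
)
```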

Architecture reading: system instruction -> prior turns -> new user turn -> model invocation -> assistant reply -> append new turn pair -> next request. This is a state loop implemented by the application layer, not the model provider. Once you understand that loop, products like terminal chats, customer-support bots, and multi-step copilots become easier to reason about because they all differ mainly in how they store, trim, and validate the message list.
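That loop can be sketched end to end with a stubbed model; `FakeModel` and `run_turn` exist only for illustration, with the stub standing in for any real chat model client:

```python
# Sketch of the application-layer state loop:
# system instruction -> prior turns -> new user turn -> model invocation
# -> assistant reply -> append new turn pair -> next request.

class FakeModel:
    def invoke(self, messages):
        # Placeholder: a real client would call the provider API here.
        return f"(reply to: {messages[-1]['content']!r})"

def run_turn(model, system_prompt, history, user_input):
    messages = [{"role": "system", "content": system_prompt}]
    messages += history                                    # prior turns
    messages.append({"role": "user", "content": user_input})
    reply = model.invoke(messages)                         # model invocation
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
model = FakeModel()
run_turn(model, "Be helpful.", history, "Hi")
run_turn(model, "Be helpful.", history, "Tell me more")
```

Terminal chats, support bots, and copilots all run some variant of this loop; they differ mainly in how `history` is stored, trimmed, and validated.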

Failure modes and production design: chat history grows linearly with conversation length, so every serious system eventually needs truncation, summarization, or retrieval-backed memory. Duplicate appends create repeated context; missing appends make the assistant appear forgetful; missing system prompts make behavior drift. A practical design rule is to treat message construction as a first-class piece of backend logic with tests around role ordering, session isolation, and context-window budgets.
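That design rule can be sketched as a small invariant check; the helper names are hypothetical, and the check covers only role alternation (session isolation and token budgets would need their own tests):

```python
# Sketch: guard against duplicate appends and broken role ordering.

def append_turn(history, user_input, assistant_reply):
    """Append exactly one user/assistant pair per turn."""
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": assistant_reply})

def check_role_alternation(history):
    """History must strictly alternate user -> assistant."""
    expected = ["user", "assistant"] * (len(history) // 2)
    return [m["role"] for m in history] == expected

history = []
append_turn(history, "Hi", "Hello!")
append_turn(history, "More?", "Sure.")
assert check_role_alternation(history)

# A duplicate append breaks the invariant the check catches:
history.append({"role": "user", "content": "Hi"})
history.append({"role": "user", "content": "Hi"})
assert not check_role_alternation(history)
```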


💡 Concrete Example

Multi-turn memory simulation: 1) Turn 1 question and answer are appended to history. 2) Turn 2 includes full prior history plus new question. 3) Turn 3 follow-up uses context from turns 1 and 2. 4) If history is omitted, follow-up quality drops immediately. LLMs are stateless; history passing creates conversational continuity.



🧪 Interactive Sessions

  1. Concept Drill: Manipulate key parameters and observe behavior shifts for Chat Models — Passing Chat History.
  2. Failure Mode Lab: Trigger an edge case and explain remediation decisions.
  3. Architecture Reorder Exercise: Reorder 5 flow steps into the correct production sequence.

💻 Code Walkthrough

Conversation handling baseline for carrying prior messages through turns.

content/github_code/langchain-course/1_chat_models/2_chat_models_conversation.py

Message-list based multi-turn conversation flow.

  1. Track how history is appended and reused per turn.

🎯 Interview Prep

Questions an interviewer is likely to ask about this topic. Think through your answer before reading the senior angle.

  • Q1[beginner] Why do LLMs not remember previous messages without explicit history passing?
    LLMs are stateless APIs — each call is completely independent, and the model has no memory of previous exchanges. Continuity is reconstructed by the application, which sends the system prompt, all prior turns, and the new user turn on every request. A practical check: omit the history from one call and observe that follow-up quality drops immediately.
  • Q2[intermediate] What is the structure of messages passed when you include chat history?
    The request is an ordered message list: a SystemMessage that sets the durable behavioral frame, the replayed prior turns as alternating HumanMessage/AIMessage pairs, and finally the new HumanMessage. Role labels matter — if a prior assistant answer is replayed as a user message, the model receives the wrong conversation history. Helpers such as InMemoryChatMessageHistory store the pairs for you: on turn 3, when the user asks 'Can you elaborate?', the model receives turns 1 and 2 as context and knows what to elaborate on, without you manually managing the list.
  • Q3[expert] What happens when a conversation grows longer than the model's context window?
    The request either fails or older content must be dropped, so every long-running system needs a policy: truncate the oldest turns under a token budget, summarise earlier turns into a compact synthetic message, or move long-term memory into retrieval so only the relevant past turns are re-injected. Each option trades recall fidelity against cost and latency, and the system prompt should survive any trimming strategy — otherwise behavior drifts.
  • Q4[expert] How would you explain this in a production interview with tradeoffs?
    In production, an in-memory chat_history is insufficient — it resets on every server restart or when a serverless function spins down. Production systems store history in a database (Redis for speed, PostgreSQL for persistence) keyed by session_id; LangChain provides RedisChatMessageHistory and similar integrations out of the box. This connects directly to the next lesson on cloud-persisted history.
🏆 Senior answer angle
Use the tier progression: beginner correctness -> intermediate tradeoffs -> expert production constraints and incident readiness.
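The persistence pattern from Q4 can be sketched with a dict-backed store keyed by session_id; `HistoryStore` is a hypothetical in-process stand-in for Redis- or PostgreSQL-backed implementations such as RedisChatMessageHistory:

```python
# Sketch: session-keyed history store. A real deployment would back this
# with Redis or PostgreSQL so history survives restarts.

class HistoryStore:
    def __init__(self):
        self._sessions = {}   # session_id -> list of (role, content)

    def get(self, session_id):
        return self._sessions.setdefault(session_id, [])

    def append(self, session_id, role, content):
        self.get(session_id).append((role, content))

store = HistoryStore()
store.append("alice-1", "user", "Hi")
store.append("alice-1", "assistant", "Hello!")
store.append("bob-7", "user", "Start fresh")

# Sessions are isolated: one user's turns never leak into another's context.
assert len(store.get("alice-1")) == 2
assert len(store.get("bob-7")) == 1
```

Keying by session_id is what gives you session isolation; the in-memory dict is the only part that changes when you swap in a persistent backend.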
