This example is about building an interactive conversation loop, not just making one isolated model call. The program keeps asking the user for input, appends that input to a history list, sends the full history to the chat model, appends the AI response, and repeats. That repeated loop is what makes the application feel like ChatGPT running locally in your terminal.
Core flow:
- Initialize a chat_history list.
- Seed it with a SystemMessage so the assistant has a stable role.
- Inside a loop, read user input from the terminal.
- If the user types exit, break the loop.
- Append the current input as a HumanMessage.
- Invoke the model with the full message history.
- Append the returned content as an AIMessage.
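The steps above can be sketched as a small runnable program. The message classes mirror LangChain's SystemMessage/HumanMessage/AIMessage but are defined locally, and the model is a stub that echoes input, so the sketch runs offline without an API key; in a real app you would use a chat model's invoke method and read input from the terminal.

```python
# Minimal sketch of the chat loop. The message classes are local stand-ins
# for LangChain's SystemMessage/HumanMessage/AIMessage, and stub_model is a
# placeholder for a real chat model, so this runs without an API key.
from dataclasses import dataclass

@dataclass
class SystemMessage:
    content: str

@dataclass
class HumanMessage:
    content: str

@dataclass
class AIMessage:
    content: str

def stub_model(history):
    # Stand-in for model.invoke(history): echoes the last human message.
    return AIMessage(content=f"You said: {history[-1].content}")

def chat_loop(inputs, model=stub_model):
    # Seed the history with a system message for a stable role.
    chat_history = [SystemMessage(content="You are a helpful assistant.")]
    for user_text in inputs:  # in a real app: user_text = input("You: ")
        if user_text.strip().lower() == "exit":
            break
        chat_history.append(HumanMessage(content=user_text))
        response = model(chat_history)  # full history, not just the last turn
        chat_history.append(response)
    return chat_history

history = chat_loop(["What is the square root of 49?", "exit"])
print(history[-1].content)  # -> You said: What is the square root of 49?
```

The key design point is that the entire chat_history list is passed to the model on every turn, which is exactly what makes follow-up questions resolvable.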
Why follow-up questions work: every new model call receives the previous conversation turns again. If the user asks "Do the same for 81" after asking about the square root of 49, the model can infer that the new question refers to square roots only because the prior turns are still inside chat_history.
Practical limitation: this design keeps state only in RAM. It is excellent for understanding how chat products work, but once the process stops the conversation vanishes. That leads directly to the next architecture pattern: durable cloud-backed history.
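Before reaching for a cloud backend, the simplest way to see the durability gap close is to serialize the history to disk between runs. This is a stopgap sketch, not the cloud-backed pattern itself; the file path and the role/content dict format are illustrative choices, not a prescribed API.

```python
# Sketch: persist the in-memory history so it survives a process restart.
# The file name and the {"role": ..., "content": ...} format are
# illustrative, not a standard; a production system would use a database.
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # hypothetical location

def save_history(history):
    HISTORY_FILE.write_text(json.dumps(history, indent=2))

def load_history():
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    # Fresh session: seed with the system message.
    return [{"role": "system", "content": "You are a helpful assistant."}]

history = load_history()
history.append({"role": "human", "content": "What is the square root of 49?"})
save_history(history)
assert load_history() == history  # the turn survives a "restart"
```

Swapping the file for a database table keyed by session id is essentially the cloud-backed version of the same idea.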
Interview-Ready Deepening
Tradeoffs You Should Be Able to Explain
- More expressive models improve fit but can reduce interpretability and raise overfitting risk.
- Higher optimization speed can reduce training time but may increase instability if learning dynamics are not monitored.
- Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.
First-time learner note: Build deterministic baseline chains first (prompt -> model -> parser), then add retrieval, memory, or tools only when the baseline is stable.
Production note: Keep contracts explicit at each boundary: input variables, output schema, retries, and logs. This is what keeps orchestration reliable at scale.
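A deterministic baseline chain with explicit contracts can be sketched as three plain functions. The names and the ANSWER: tag are illustrative inventions, not a real LangChain API, and the model is a stub so the sketch is deterministic and runnable offline.

```python
# Hypothetical baseline chain: prompt template -> model -> parser, with the
# contract at each boundary stated in a comment. Function names and the
# ANSWER: tag are illustrative; a real chain would use a chat model here.
def format_prompt(variables):
    # Contract: requires a "question" key; returns a str prompt.
    return f"Answer briefly: {variables['question']}"

def stub_model(prompt):
    # Contract: str in, str out. Stand-in for a real model call.
    return f"ANSWER: {prompt}"

def parse(raw):
    # Contract: strips the ANSWER: tag; raises nothing on clean input.
    return raw.removeprefix("ANSWER: ")

def run_chain(variables):
    return parse(stub_model(format_prompt(variables)))

print(run_chain({"question": "What is 2 + 2?"}))
# -> Answer briefly: What is 2 + 2?
```

Because each boundary has a stated input and output shape, retries and logging can be added at any single stage without touching the others, which is the point of the production note above.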
This topic is really about the chat application loop. The local terminal demo behaves like a tiny chat product: prompt the user, append the new human message, invoke the model with the full history, append the returned AI message, and repeat until the user exits. The reason follow-up questions work is not because the model is streaming hidden memory forward; it is because the loop keeps replaying the updated history list each time.
System-design interpretation: input listener -> session history mutation -> model call -> output render -> persistence or in-memory update. Even though the demo is local and simple, it already contains the same control structure that real chat interfaces use. The production version adds multiple sessions, cancellation, authentication, logging, and durable storage, but the core loop is unchanged.
Edge cases: if one shared list is reused across users, conversations leak. If the process restarts, in-memory history disappears. If the exit path is not handled cleanly, users cannot terminate the loop without killing the process. The lesson is that interactive LLM apps are not just prompts; they are long-lived control flows with state transitions at every turn.
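The cross-user leakage edge case has a direct structural fix: key each history by a session id instead of sharing one list. The sketch below uses illustrative session ids and a stub reply in place of a model call.

```python
# Sketch of per-session isolation: each session id gets its own history
# list, so conversations cannot leak across users. Session ids and the
# stub reply are illustrative.
from collections import defaultdict

def new_session():
    return [{"role": "system", "content": "You are a helpful assistant."}]

sessions = defaultdict(new_session)  # session_id -> its own history list

def handle_turn(session_id, user_text):
    history = sessions[session_id]  # isolated per session
    history.append({"role": "human", "content": user_text})
    reply = {"role": "ai", "content": f"echo: {user_text}"}  # stub model call
    history.append(reply)
    return reply["content"]

handle_turn("alice", "What is the square root of 49?")
handle_turn("bob", "Hello")
# alice's context never leaks into bob's history:
assert "square root" not in str(sessions["bob"])
```

A durable version keeps the same shape but loads and saves each session's list from storage, which is why the in-memory demo transfers so directly to production designs.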