Workflow Part 2 covers query-time orchestration: the stage where retrieval output and generation behavior combine into user-visible quality.
Query-time stages:
- Receive the user query and optional conversation context.
- Retrieve relevant chunks with the configured retriever.
- Assemble the context window for the generation prompt.
- Generate a grounded answer with citation discipline.
- Apply post-generation checks (confidence, citation presence, policy constraints).
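The stages above can be sketched as a single orchestration function. This is a minimal illustration, not a specific library's API: the `retriever` and `llm` objects, their `search`/`generate` methods, and the chunk dict shape are all assumed stand-ins.

```python
# Hypothetical query-time pipeline: each step mirrors one stage above.
def answer_query(query, retriever, llm, history=None, top_k=5):
    # 1. Receive the user query plus optional conversation context.
    context_turns = history or []

    # 2. Retrieve relevant chunks with the configured retriever.
    chunks = retriever.search(query, top_k=top_k)

    # 3. Assemble the context window for the generation prompt.
    context = "\n\n".join(c["text"] for c in chunks)

    # 4. Generate a grounded answer with citation discipline.
    prompt = (
        "Answer ONLY from the context below. Cite sources as [n]. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    answer = llm.generate(prompt)

    # 5. Post-generation check: require at least one citation marker
    #    when sources were retrieved.
    if chunks and "[" not in answer:
        answer = "I could not ground an answer in the retrieved sources."
    return {"answer": answer, "sources": [c["id"] for c in chunks]}
```

Each numbered comment is one boundary where a later stage can silently degrade the output, which is why the handoff problem below matters.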
Critical handoff problem: many systems retrieve good chunks but lose grounding because the prompt does not explicitly require evidence-based answering. The prompt contract must force "answer from provided context; abstain when insufficient."
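One way to encode that contract is a fixed prompt template. The exact wording here is an illustrative assumption, not a canonical prompt; what matters is that the rules and the abstention path are stated explicitly.

```python
# Illustrative grounding contract: forces evidence-based answering
# and gives the model an explicit abstention path.
GROUNDED_PROMPT = """You are answering from the provided context only.

Rules:
1. Use only facts stated in the context below.
2. Cite the supporting passage number for every claim, e.g. [2].
3. If the context does not contain the answer, reply exactly:
   "INSUFFICIENT CONTEXT".

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(context: str, question: str) -> str:
    return GROUNDED_PROMPT.format(context=context, question=question)
```

A fixed sentinel like "INSUFFICIENT CONTEXT" also makes the abstention machine-detectable, so downstream checks can route those cases to a fallback instead of surfacing them as answers.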
Production safeguards:
- Context truncation policy to stay within token budget.
- Fallback when retrieval confidence is low.
- Structured response schema including confidence and sources.
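The three safeguards above can be sketched together. Counting tokens by whitespace-split word count is a simplifying assumption (a real system would use the model's tokenizer), and the 0.3 confidence threshold is arbitrary.

```python
from dataclasses import dataclass, field

@dataclass
class RagResponse:
    # Structured response schema: answer plus confidence and sources.
    answer: str
    confidence: float
    sources: list = field(default_factory=list)

def truncate_context(chunks, token_budget=1000):
    # Truncation policy: keep highest-scored chunks until the budget is spent.
    # Word count stands in for real tokenization here.
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = len(chunk["text"].split())
        if used + cost > token_budget:
            break
        kept.append(chunk)
        used += cost
    return kept

def respond(chunks, generate, min_confidence=0.3):
    # Fallback when retrieval confidence is low: abstain instead of guessing.
    confidence = max((c["score"] for c in chunks), default=0.0)
    if confidence < min_confidence:
        return RagResponse("I don't have enough information to answer.", confidence)
    kept = truncate_context(chunks)
    return RagResponse(generate(kept), confidence, [c["id"] for c in kept])
```

Returning a structured object rather than a bare string is what lets callers log confidence, render citations, and trigger fallbacks without parsing prose.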
Tradeoffs You Should Be Able to Explain
- Higher recall often increases context noise; reranking and filtering are required to keep precision high.
- Smaller chunks improve semantic precision but can break cross-sentence context needed for accurate answers.
- Aggressive grounding reduces hallucinations but can increase abstentions when retrieval coverage is weak.
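The recall-vs-noise tradeoff is usually managed with a rerank-then-filter step: retrieve broadly for recall, re-score candidates with a stronger (and slower) scorer, then keep only chunks above a threshold so noise never reaches the prompt. The `score_fn` here is a placeholder for any such scorer.

```python
def rerank_and_filter(query, candidates, score_fn, keep=3, min_score=0.5):
    # Re-score a high-recall candidate set with a more precise scorer,
    # then drop low scorers so noise does not reach the prompt.
    rescored = [(score_fn(query, c), c) for c in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for s, c in rescored[:keep] if s >= min_score]
```

Both knobs embody the tradeoff directly: raising `keep` favors recall, raising `min_score` favors precision.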
First-time learner note: Build deterministic baseline chains first (prompt -> model -> parser), then add retrieval, memory, or tools only when the baseline is stable.
Production note: Keep contracts explicit at each boundary: input variables, output schema, retries, and logs. This is what keeps orchestration reliable at scale.
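A deterministic baseline chain with those contracts made explicit might look like the sketch below. The JSON output schema, default template, and retry count are illustrative assumptions, not a prescribed design.

```python
import json

def baseline_chain(question, model,
                   template='Q: {question}\nReply as JSON with key "answer".',
                   retries=2):
    # Explicit contracts at each boundary:
    #   input variable:  question (str)
    #   output schema:   {"answer": str}
    #   retries:         re-ask the model on malformed output
    #   logs:            record every attempt for debugging
    log = []
    prompt = template.format(question=question)   # prompt step
    for attempt in range(retries + 1):
        raw = model(prompt)                       # model step
        log.append({"attempt": attempt, "raw": raw})
        try:
            parsed = json.loads(raw)              # parser step
            if isinstance(parsed.get("answer"), str):
                return parsed["answer"], log
        except json.JSONDecodeError:
            pass
    raise ValueError(f"Model never produced valid output: {log}")
```

Because the parser validates the schema and the loop logs each attempt, a failure surfaces as a clear error with its history attached instead of propagating malformed output downstream, which is the point of keeping contracts explicit before layering on retrieval, memory, or tools.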