Embeddings and vector databases are the retrieval engine of RAG. Embeddings map text into high-dimensional vectors such that semantic similarity corresponds to geometric proximity. Vector DBs index these vectors for fast nearest-neighbor search.
Conceptual model:
- Each chunk -> embedding vector.
- User query -> embedding vector using the same embedding model.
- Similarity search returns nearest chunks (top-k).
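The conceptual model above can be sketched with plain cosine similarity over toy vectors (the vectors are illustrative stand-ins; a real system would produce them with an embedding model):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, k=2):
    # Rank chunk vectors by similarity to the query; return (index, score) pairs.
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Toy 3-dimensional "embeddings" for three chunks and one query.
chunks = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.9]]
query = [1.0, 0.05, 0.05]
print(top_k(query, chunks, k=2))  # chunks 0 and 1 are geometrically closest
```

Production systems replace this brute-force scan with an ANN index, but the ranking contract is the same: nearest vectors in, top-k chunks out.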
Key engineering requirement: use the same embedding model for both document and query vectors. Mixing incompatible embedding spaces collapses retrieval quality, because distances between vectors from different spaces are meaningless.
Vector DB responsibilities:
- Fast ANN (approximate nearest neighbor) retrieval at scale.
- Metadata filtering (tenant, doc type, date, permission).
- Upsert/delete/version management.
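These responsibilities can be made concrete with a minimal in-memory store (a sketch, not a real vector DB; real systems add ANN indexes such as HNSW or IVF for sub-linear search):

```python
class MiniVectorStore:
    # Toy store illustrating upsert, delete, and metadata filtering.
    def __init__(self):
        self._rows = {}  # id -> (vector, metadata)

    def upsert(self, doc_id, vector, metadata):
        # Insert-or-overwrite by id: re-upserting an id replaces its vector.
        self._rows[doc_id] = (vector, metadata)

    def delete(self, doc_id):
        self._rows.pop(doc_id, None)

    def search(self, query_vec, k=3, where=None):
        # Brute-force scan; `where` filters on metadata before scoring.
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        candidates = [
            (doc_id, dot(query_vec, vec))
            for doc_id, (vec, meta) in self._rows.items()
            if where is None or all(meta.get(key) == v for key, v in where.items())
        ]
        return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]

store = MiniVectorStore()
store.upsert("a", [1.0, 0.0], {"tenant": "acme", "type": "faq"})
store.upsert("b", [0.9, 0.1], {"tenant": "other", "type": "faq"})
store.upsert("a", [0.0, 1.0], {"tenant": "acme", "type": "policy"})  # overwrite
print(store.search([0.0, 1.0], k=2, where={"tenant": "acme"}))
```

Note that the metadata filter runs before scoring, which is how tenant isolation and permission checks stay correct regardless of vector similarity.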
Practical trade-offs: higher-dimensional embeddings can capture more semantic nuance but increase storage and latency; a top-k that is too small hurts recall, while one too large introduces noise into the context.
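The storage side of that trade-off is easy to quantify. A back-of-envelope estimate (the corpus size and dimension below are illustrative assumptions; many DBs also quantize to int8 or apply product quantization to shrink this):

```python
# Raw footprint of float32 embeddings, before any index overhead.
num_vectors = 10_000_000   # assumed corpus size
dims = 1536                # assumed embedding width
bytes_per_float = 4        # float32
raw_bytes = num_vectors * dims * bytes_per_float
print(f"{raw_bytes / 1e9:.1f} GB before index overhead")  # -> 61.4 GB
```

Halving the dimension halves storage and speeds up every distance computation, which is why dimension choice is a cost decision, not just a quality one.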
Tradeoffs You Should Be Able to Explain
- Higher recall often increases context noise; reranking and filtering are required to keep precision high.
- Smaller chunks improve semantic precision but can break cross-sentence context needed for accurate answers.
- Aggressive grounding reduces hallucinations but can increase abstentions when retrieval coverage is weak.
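The grounding/abstention tradeoff can be sketched as a simple confidence gate (the threshold value and return strings are illustrative assumptions):

```python
# Answer only when the best retrieval score clears a confidence bar,
# otherwise abstain. Raising the bar trades hallucinations for abstentions.
def grounded_answer(scores, threshold=0.75):
    if not scores or max(scores) < threshold:
        return None  # abstain: retrieval coverage too weak to ground an answer
    return "answer from top chunks"

print(grounded_answer([0.82, 0.40]))  # answers
print(grounded_answer([0.55, 0.40]))  # abstains -> None
```

Tuning `threshold` is exactly the precision/abstention dial described above: higher values mean fewer hallucinated answers but more "I don't know" responses when retrieval coverage is thin.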
First-time learner note: Build deterministic baseline chains first (prompt -> model -> parser), then add retrieval, memory, or tools only when the baseline is stable.
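A deterministic baseline chain of that shape can be sketched as follows (`fake_model` is a stand-in stub, an assumption, so the pipeline is testable before any retrieval, memory, or tools are added):

```python
# Minimal deterministic chain: prompt -> model -> parser.
def build_prompt(question: str) -> str:
    return f"Answer concisely.\nQuestion: {question}\nAnswer:"

def fake_model(prompt: str) -> str:
    # Deterministic stub; swap in a real LLM call once the chain is stable.
    return " ANSWER: 4 "

def parse(raw: str) -> str:
    # Contract: strip whitespace and the known prefix; fail loudly otherwise.
    text = raw.strip()
    if not text.startswith("ANSWER:"):
        raise ValueError(f"unexpected model output: {raw!r}")
    return text.removeprefix("ANSWER:").strip()

def chain(question: str) -> str:
    return parse(fake_model(build_prompt(question)))

print(chain("What is 2 + 2?"))  # -> 4
```

Because every stage is a pure function, failures localize to one boundary, which is what makes it safe to bolt on retrieval later.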
Production note: Keep contracts explicit at each boundary: input variables, output schema, retries, and logs. This is what keeps orchestration reliable at scale.
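One way to keep such a contract explicit is to declare the boundary types up front (the field names and bounds below are illustrative assumptions), so bad inputs fail at the seam rather than three steps downstream:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetrievalRequest:
    # Explicit input contract for a retrieval step.
    query: str
    top_k: int = 5

    def __post_init__(self):
        if not self.query.strip():
            raise ValueError("query must be non-empty")
        if not (1 <= self.top_k <= 100):
            raise ValueError("top_k out of range [1, 100]")

@dataclass(frozen=True)
class RetrievalResult:
    # Explicit output contract: parallel lists of chunk ids and scores.
    chunk_ids: list = field(default_factory=list)
    scores: list = field(default_factory=list)

req = RetrievalRequest(query="refund policy", top_k=3)
print(req)
```

The same idea extends to retries and logs: each boundary records what went in, what came out, and whether the schema held.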