What is RAG?
Retrieval-Augmented Generation (RAG) is a pattern that connects an LLM to external knowledge bases.
Instead of relying solely on the vast but static training data baked into the model weights, RAG actively retrieves relevant documents from a database (often a Vector Database) using the user's query before passing that context to the LLM.
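The retrieve-then-generate order described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `DOCUMENTS` list, the word-overlap `retrieve` function, the prompt wording in `build_prompt`, and the `call_llm` placeholder are all assumptions made for the example. A real system would replace the overlap scoring with a vector search and the placeholder with an actual model call.

```python
import re

# Hypothetical in-memory knowledge base; a real system would query a database.
DOCUMENTS = [
    "The return window for online orders is 30 days.",
    "Support is available by email on weekdays.",
    "Gift cards cannot be exchanged for cash.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, context_docs):
    """Place the retrieved documents in the prompt, ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


def call_llm(prompt):
    """Placeholder for a real model call; it only reports the prompt length."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


query = "What is the return window for orders?"
docs = retrieve(query, DOCUMENTS)    # 1. retrieve with the user's query
prompt = build_prompt(query, docs)   # 2. augment the prompt with the context
answer = call_llm(prompt)            # 3. generate
```

The point of the sketch is the ordering: the model only sees the documents after retrieval has selected them, so changing `DOCUMENTS` changes the context without touching the model.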
Key Components
- Embedding Model: Converts text into dense vector representations.
- Vector Database: Specializes in storing vectors and performing similarity searches (e.g., Pinecone, Milvus).
- LLM: Generates the answer from the user's query and the retrieved context.
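To make the three roles concrete, here is a toy sketch with one stand-in per component. Everything in it is an assumption for illustration: `embed` is a hashed bag-of-words function rather than a learned embedding model (so it matches shared words, not meaning), `InMemoryVectorStore` is a hypothetical class that does a brute-force cosine-similarity scan and does not mirror the API of Pinecone or Milvus, and `generate` is a stub in place of an LLM.

```python
import math
import re
import zlib

DIM = 64  # toy dimensionality chosen for the example


def embed(text):
    """Toy embedding: hash each word into one of DIM buckets, then L2-normalize."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores vectors, scans them all."""

    def __init__(self):
        self._items = []  # list of (vector, original text) pairs

    def add(self, text):
        self._items.append((embed(text), text))

    def search(self, query, k=1):
        """Return the k (score, text) pairs with the highest cosine similarity."""
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for vec, text in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def generate(query, context):
    """Stub LLM: a real model would synthesize an answer from the context."""
    return f"Answer to {query!r} based on: {context!r}"


store = InMemoryVectorStore()
store.add("The return window for online orders is 30 days.")
store.add("Support is available by email on weekdays.")
store.add("Gift cards cannot be exchanged for cash.")

question = "How long is the return window?"
score, best_text = store.search(question, k=1)[0]
reply = generate(question, best_text)
```

A dedicated vector database earns its place once a brute-force scan like `search` becomes too slow, which is where approximate nearest-neighbor indexes come in.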
Why not just fine-tune?
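Fine-tuning is good at teaching a model a new skill or style, but it is an unreliable way to teach facts: the knowledge is frozen at training time, and every update requires another training run. RAG is generally the better fit for injecting up-to-date, domain-specific factual knowledge, because updating the knowledge base does not require retraining the model.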