Understanding RAG: Retrieval-Augmented Generation

What is RAG?

Retrieval-Augmented Generation (RAG) is a pattern that connects an LLM to external knowledge bases.

Instead of relying solely on the static knowledge captured in the model weights during training, RAG uses the user's query to retrieve relevant documents from a data store (often a vector database), then passes those documents to the LLM as context alongside the query.
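The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names (retrieve, build_prompt, answer) are hypothetical, word overlap stands in for real vector search, and the llm argument is assumed to be any callable that maps a prompt string to a response string.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Return up to top_k documents ranked by word overlap with the query.

    Assumption: word overlap is only a stand-in for vector similarity search.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved documents in front of the user's question."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


def answer(query: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve first, then hand the augmented prompt to the LLM."""
    context = retrieve(query, documents)
    return llm(build_prompt(query, context))
```

Because the LLM is passed in as a plain callable, the same flow works with any model client, and the retrieval step can be tested on its own before a real model is wired in.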

Key Components

Embedding Model: Converts text into dense vector representations.
Vector Database: Specializes in storing vectors and performing similarity searches (e.g., Pinecone, Milvus).
LLM: Generates the answer by synthesizing the retrieved context with the user's query.
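To make the first two components concrete, here is a toy sketch of an embedding function and an in-memory vector store. Everything in it is an assumption made for illustration: embed uses the hashing trick over word counts rather than a trained embedding model, and InMemoryVectorStore is a hypothetical class, not the API of Pinecone, Milvus, or any other product. What carries over to real systems is the shape of the interaction: texts become fixed-length vectors, and search ranks stored vectors by cosine similarity to the query vector.

```python
import hashlib
import math
import re
from typing import List, Tuple

DIM = 256  # Vector length; real embedding models use their own fixed size.


def embed(text: str) -> List[float]:
    """Map text to a unit-length vector using hashed word counts.

    Assumption: this is a stand-in for a trained embedding model and
    captures only word overlap, not meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical store: keeps (vector, text) pairs in a Python list."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, top_k: int = 1) -> List[Tuple[float, str]]:
        """Return the top_k (score, text) pairs by cosine similarity."""
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for vec, text in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

A brute-force scan like this is fine for a handful of documents. Dedicated vector databases exist because they index vectors so that similarity search stays fast at much larger scale.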

Why not just fine-tune?

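Fine-tuning is well suited to teaching a model a new skill or style, but it is an unreliable way to add facts: whatever the model learns is frozen at training time, and changing it means training again. RAG is better suited to injecting up-to-date, domain-specific factual knowledge, because the knowledge base can be updated without retraining the model.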
