RAG – Retrieval-Augmented Generation
- Simple truth
LLMs do not know your company data
- An LLM knows what it was trained on (internet + books till cutoff date)
- when we ask:
what is our company's leave policy
- Then LLM will either
- Guess (Hallucinate)
- Or say I don’t know
- This is unacceptable in enterprises
- Enter RAG – Retrieval-Augmented Generation
RAG = search + LLM
- Rather than asking the LLM to remember everything we retrieve the right information first and ask the LLM to generate an answer from it.
-
On a larger note we have 3 stages
- Retrieve relevant content
- Augment the prompt with that content
- Generate the final answer
-
Example RAGS
- Education:
- Chat with text books
- Ask doubts from PDFs
- Enterprises:
- HR Policy assistant
- Internal wiki chatbot
- Compliance assistant
- Operations:
- Chat with logs
- Codebase Q&A
- API Documentation assitant
- Education:
Components of a RAG System
- Building a RAG system comprises of multiple interconnected components
- Retriever: retrieves the content from knowledge sources based on natural language query
- Generator: Synthesizes responses via LLMs
- Knowledge Sources: This is your private data
- Vector store (Pinecone, FIASS …)
- document DB
- APIs
- Pipeline (Orchestrator): This is workflow
Step by Step RAG Flow
1. Document Ingestion (Offline step)
- We start with documents
- PDFs
- Word files
- Markdown
- HTML
- Database records
- These documents are
- split into chunks (small pieces of text)
- Each chunk is converted into a vector embedding
- stored in vector database
2. Embedding
- An embedding is a numerical representation fo text meaning which allows semantic search not keyword search.
3. Vector Database (Semantic search engine)
- Stores the embeddings
- Given a question -> Finds top-K most relevant chunks
4. Query Time
- When user asks
- The system
- converts the question into an embedding
- searches the vector DB
- retrieves the relevant chunks
- Injects them into LLM Prompt
5. Augmented Prompt
- Instead of asking
user: how many leaves can i take per yearwe ask
system:
Answer ONLY using the provided contenxt
Context:
- Employees are entitled to 24 paid leaves per year
- Unused leaves can be carried forward up to 12 days
Question:
How many leaves can I take per year?
- Now the LLM is
- Grounded
- Accurate
- Auditable
6. Generation
- LLM now
- Reads the retrieved content
- Generates a human-friendly answer
- Does not invent facts (hallicunate)
