Gen-AI Developer Classroom notes 18/Jan/2026

RAG – Retrieval-Augmented Generation

Simple truth

LLMs do not know your company data

An LLM knows what it was trained on (internet + books till cutoff date)
when we ask:

what is our company's leave policy

Then LLM will either
- Guess (Hallucinate)
- Or say I don’t know
This is unacceptable in enterprises
Enter RAG – Retrieval-Augmented Generation

RAG = search + LLM

Rather than asking the LLM to remember everything we retrieve the right information first and ask the LLM to generate an answer from it.
On a larger note we have 3 stages
- Retrieve relevant content
- Augment the prompt with that content
- Generate the final answer
Example RAGS
- Education:
  - Chat with text books
  - Ask doubts from PDFs
- Enterprises:
  - HR Policy assistant
  - Internal wiki chatbot
  - Compliance assistant
- Operations:
  - Chat with logs
  - Codebase Q&A
  - API Documentation assitant

Components of a RAG System

Building a RAG system comprises of multiple interconnected components
- Retriever: retrieves the content from knowledge sources based on natural language query
- Generator: Synthesizes responses via LLMs
- Knowledge Sources: This is your private data
  - Vector store (Pinecone, FIASS …)
  - document DB
  - APIs
- Pipeline (Orchestrator): This is workflow

Step by Step RAG Flow

1. Document Ingestion (Offline step)

We start with documents
- PDFs
- Word files
- Markdown
- HTML
- Database records
These documents are
- split into chunks (small pieces of text)
- Each chunk is converted into a vector embedding
- stored in vector database

2. Embedding

An embedding is a numerical representation fo text meaning which allows semantic search not keyword search.

3. Vector Database (Semantic search engine)

Stores the embeddings
Given a question -> Finds top-K most relevant chunks

4. Query Time

When user asks
The system
- converts the question into an embedding
- searches the vector DB
- retrieves the relevant chunks
- Injects them into LLM Prompt

5. Augmented Prompt

Instead of asking user: how many leaves can i take per year we ask

system: 
Answer ONLY using the provided contenxt

Context:
- Employees are entitled to 24 paid leaves per year
- Unused leaves can be carried forward up to 12 days

Question:
How many leaves can I take per year?

Now the LLM is
- Grounded
- Accurate
- Auditable

6. Generation

LLM now
- Reads the retrieved content
- Generates a human-friendly answer
- Does not invent facts (hallicunate)

Gen-AI Developer Classroom notes 18/Jan/2026

RAG – Retrieval-Augmented Generation

Components of a RAG System

Step by Step RAG Flow

1. Document Ingestion (Offline step)

2. Embedding

3. Vector Database (Semantic search engine)

4. Query Time

5. Augmented Prompt

6. Generation

Like this:

By continuous learner

Leave a ReplyCancel reply

RAG – Retrieval-Augmented Generation

Components of a RAG System

Step by Step RAG Flow

1. Document Ingestion (Offline step)

2. Embedding

3. Vector Database (Semantic search engine)

4. Query Time

5. Augmented Prompt

6. Generation

Share this:

Like this:

By continuous learner

Leave a ReplyCancel reply

Discover more from Direct AI Powered By Quality Thought