Gen-AI Developer Classroom notes 18/Jan/2026

RAG – Retrieval-Augmented Generation

  • Simple truth
LLMs do not know your company data
  • An LLM knows what it was trained on (internet + books till cutoff date)
  • when we ask:
what is our company's leave policy
  • Then LLM will either
    • Guess (Hallucinate)
    • Or say I don’t know
  • This is unacceptable in enterprises
  • Enter RAG – Retrieval-Augmented Generation
RAG = search + LLM
  • Rather than asking the LLM to remember everything we retrieve the right information first and ask the LLM to generate an answer from it.
  • On a larger note we have 3 stages

    • Retrieve relevant content
    • Augment the prompt with that content
    • Generate the final answer
  • Example RAGS

    • Education:
      • Chat with text books
      • Ask doubts from PDFs
    • Enterprises:
      • HR Policy assistant
      • Internal wiki chatbot
      • Compliance assistant
    • Operations:
      • Chat with logs
      • Codebase Q&A
      • API Documentation assitant

Components of a RAG System

  • Building a RAG system comprises of multiple interconnected components
    • Retriever: retrieves the content from knowledge sources based on natural language query
    • Generator: Synthesizes responses via LLMs
    • Knowledge Sources: This is your private data
      • Vector store (Pinecone, FIASS …)
      • document DB
      • APIs
    • Pipeline (Orchestrator): This is workflow

Step by Step RAG Flow

1. Document Ingestion (Offline step)

  • We start with documents
    • PDFs
    • Word files
    • Markdown
    • HTML
    • Database records
  • These documents are
    • split into chunks (small pieces of text)
    • Each chunk is converted into a vector embedding
    • stored in vector database

2. Embedding

  • An embedding is a numerical representation fo text meaning which allows semantic search not keyword search.

3. Vector Database (Semantic search engine)

  • Stores the embeddings
  • Given a question -> Finds top-K most relevant chunks

4. Query Time

  • When user asks
  • The system
    • converts the question into an embedding
    • searches the vector DB
    • retrieves the relevant chunks
    • Injects them into LLM Prompt

5. Augmented Prompt

  • Instead of asking user: how many leaves can i take per year we ask
system: 
Answer ONLY using the provided contenxt

Context:
- Employees are entitled to 24 paid leaves per year
- Unused leaves can be carried forward up to 12 days

Question:
How many leaves can I take per year?
  • Now the LLM is
    • Grounded
    • Accurate
    • Auditable

6. Generation

  • LLM now
    • Reads the retrieved content
    • Generates a human-friendly answer
    • Does not invent facts (hallicunate)

By continuous learner

enthusiastic technology learner

Leave a Reply

Discover more from Direct AI Powered By Quality Thought

Subscribe now to keep reading and get access to the full archive.

Continue reading