Gen-AI Developer Classroom notes 12/Feb/2026

Ensuring only updated docs are indexed

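  • Use the following code:
# Import paths assume the split LangChain packages; adjust them to your installed version
# (in LangChain 1.x the indexing API lives under langchain_classic.indexes).
from langchain.indexes import index
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt article from the updated knowledge-base folder
directory_loader = DirectoryLoader(
    path="../data/updates/IT_Helpdesk_KB_Articles_v2",
    glob="*.txt",
    loader_cls=TextLoader,
)
documents = directory_loader.load()
# Split into small overlapping chunks; each chunk keeps the 'source' metadata of its file
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
)
chunks = text_splitter.split_documents(documents)
embedding = VertexAIEmbeddings(
    model_name="text-embedding-005")
vector_store = Chroma(
    collection_name="kb_collection",
    embedding_function=embedding,
    persist_directory="../vectordb/kb_collection_db_sample1",
)
# sql_record_manager is assumed to be the SQLRecordManager created earlier,
# with its schema already initialised.
# Index only new or changed chunks; unchanged ones are skipped, and outdated
# chunks from the same source file are removed.
result = index(
    docs_source=chunks,  # index the chunks, not the unsplit documents
    record_manager=sql_record_manager,
    vector_store=vector_store,
    cleanup='incremental',
    source_id_key='source'
)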
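
To make the idea behind `cleanup='incremental'` concrete, here is a minimal standard-library sketch of hash-based incremental indexing. It is a simplified illustration, not LangChain's implementation: the `TinyIndex` class, its `records` dict and the `(source, text)` input shape are all hypothetical, and the result keys only loosely mirror what `index()` reports.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TinyIndex:
    """Toy stand-in for a record manager plus vector store (hypothetical)."""

    # Maps content hash -> source file the content came from
    records: dict = field(default_factory=dict)

    def index_incremental(self, docs):
        """Index (source, text) pairs, skipping content that is already known."""
        stats = {"num_added": 0, "num_skipped": 0, "num_deleted": 0}
        seen_hashes = set()
        seen_sources = set()

        for source, text in docs:
            key = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
            seen_sources.add(source)
            if key in seen_hashes:
                continue  # duplicate inside this run, ignore it
            seen_hashes.add(key)
            if key in self.records:
                stats["num_skipped"] += 1  # unchanged content, nothing to embed
            else:
                self.records[key] = source  # new or changed content
                stats["num_added"] += 1

        # Incremental cleanup: drop old entries only for sources seen in this run.
        # Sources that were not part of this run are left untouched.
        stale = [
            key
            for key, source in self.records.items()
            if source in seen_sources and key not in seen_hashes
        ]
        for key in stale:
            del self.records[key]
            stats["num_deleted"] += 1
        return stats
```

Running it twice with one edited file should add only the changed content and delete its old version, while the unchanged file is skipped. This is why the chunks need a stable `source` value: it tells the cleanup step which old entries belong to a file that was just re-indexed.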

Project for RAG: HR Helpdesk

  • The core idea of this project is that we have HR documents (knowledge bases)
  • For the purposes of this project I will keep these docs in a folder; in your organization they might be on Confluence or wiki pages
  • What we will be building:
    • A ChatGPT-style interface
    • The user asks a question and the RAG responds
    • This is enterprise RAG, so answers are expected to be grounded in the source documents
    • We should demonstrate the precision and fairness of the RAG before deploying
  • Deployment possibilities
    • Models:
      • Cloud:
        • AWS
        • Azure
        • GCP
      • On-prem:
        • Ollama
    • RAG & Vector db
      • Deployment options
        • on-prem
        • cloud
  • Refer Here for the git repo.
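
The grounding requirement above can be sketched as a prompt builder that restricts the model to the retrieved chunks. This is a minimal sketch under assumptions: the function name `build_grounded_prompt`, the `FALLBACK` text, the chunk dict shape (`source`, `text`) and the prompt wording are all hypothetical, and the sample HR text in the example is made up for illustration.

```python
# Hypothetical reply used when the context does not contain the answer
FALLBACK = "I could not find this in the HR knowledge base."


def build_grounded_prompt(question, chunks):
    """Assemble a prompt that limits the answer to the retrieved context."""
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] (source: {chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "You are an HR helpdesk assistant.\n"
        "Answer using ONLY the context below and cite sources as [n].\n"
        f'If the context does not contain the answer, reply: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Made-up chunk, for illustration only
    sample_chunks = [
        {"source": "leave_policy.txt", "text": "Leave requests go through the HR portal."}
    ]
    print(build_grounded_prompt("How do I request leave?", sample_chunks))
```

The resulting string can be sent to whichever model is chosen (cloud or on-prem); keeping the source name next to each chunk makes it easier to check later whether an answer is actually supported by the documents.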

Steps:

  1. Synthetic data creation
  2. Understanding data
  3. Indexing
  4. Vector stores
  5. Prompts
  6. Retrieval
  7. Grounding
  8. Scores
  9. Deployment
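
For the Scores step, one common way to quantify retrieval precision is precision@k: the fraction of the top k retrieved chunks that are actually relevant to the question. The sketch below is only one possible metric, not necessarily the one used in the project; the function name and the idea of labelling relevant chunk IDs by hand are assumptions.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved IDs that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalises retrievers that return too few results
    return hits / k


if __name__ == "__main__":
    # Hypothetical chunk IDs: the retriever's ranked output and the hand-labelled relevant set
    retrieved = ["c1", "c7", "c3", "c9"]
    relevant = {"c1", "c3"}
    print(precision_at_k(retrieved, relevant, k=4))  # 2 hits out of 4 -> 0.5
```

Averaging this value over a set of test questions gives a single number that can be tracked before and after changes to chunking, embeddings or prompts.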

By continuous learner
