Gen-AI Developer Classroom notes 28/Feb/2026

Fine tuning contd

  • For finetuning we will be using unsloth
  • unsloth has models which we will be using to tune
  • unsloth has decent documentation and notebooks shared.
  • unsloth can be used in two ways
    • on your vm (system) with GPU
    • Google Colab (GPUs for free)
  • Goal:
  • to digest steps

Dataset format

| Format | Use Case | Native Support |
| ——————- | ——————– | —————— |
| Alpaca | Instruction tuning | |
| ChatML | Chat models | |
| ShareGPT | Public chat data | |
| Plain Text | Completion tasks | Manual template |
| JSONL | Large-scale training | |
| Custom Prompt | Advanced control | |
| HuggingFace Dataset | Any dataset | |

  • Alpaca
{
  "instruction": "What is Kubernetes?",
  "input": "",
  "output": "Kubernetes is a container orchestration platform..."
}
  • JSONL
{"instruction":"Explain TCP","output":"TCP is a transport protocol"}
{"instruction":"Explain UDP","output":"UDP is a connectionless protocol"}

Hyperparameters

Steps

 

Yes

Not good enough

Yes

No

🚀 Start

1) Define Objective
Tone / domain / format
+ success metric

2) Choose Base Model
Llama / Qwen / Mistral / Phi
based on GPU + needs

3) Prepare Dataset
JSONL messages / instruction format
Clean · Dedupe · Balance categories

Dataset ready
train / val / test split

4) Setup Environment
PyTorch · transformers
bitsandbytes · trl · unsloth

Runtime ready
GPU available

5) Load Model + Tokenizer
via Unsloth
fp16 / bf16 or 4-bit

Model loaded
Memory-optimized patches applied

6) Apply LoRA / QLoRA
r · alpha · dropout · target_modules
optional gradient checkpointing

PEFT adapters attached

7) Configure Hyperparams
lr · batch · grad_accum
epochs · max_seq_len

Training plan ready

8) Train — SFT
Monitor loss · VRAM
speed · stability

Adapter weights checkpointed

9) Evaluate
Base vs Tuned model
Tone rubric · Regressions · Hallucinations

10 – Ready to Package?

Package Artifacts
Save adapters — recommended
OR merge into base model

Packaged artifact produced

Improve Dataset
Fix prompts / coverage gaps

Adjust LoRA Params
Tune r / alpha if needed

Re-train
with updated settings

11) Deploy
Ollama / vLLM / TGI / FastAPI
Smoke test critical flows

Endpoint live

12) Monitor
Latency · Cost · Drift
Tone violations · Feedback

Alerts + feedback dataset
for next iteration

Need another iteration?

✅ Done

By continuous learner

enthusiastic technology learner

Leave a Reply

Discover more from Direct AI Powered By Quality Thought

Subscribe now to keep reading and get access to the full archive.

Continue reading