Introduction to Fine-Tuning Large Language Models (LLMs)
RAG vs Fine-Tuning
- RAG:
  - We don't touch the model
  - We give it extra context in the prompt
  - Great for knowledge injection
- Fine-Tuning:
  - We change the model's behavior by training it on new examples
  - Great for behavior/style/task specialization
  - Example: Make the model always reply like a teacher or always output SQL …
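
The contrast can be sketched in a few lines. In this hypothetical example, `build_rag_prompt` leaves the model alone and only adds retrieved text to the prompt, while `build_training_example` produces a record the model would later be trained on. Both function names and all sample strings are illustrative assumptions, not part of any library.

```python
def build_rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    """RAG: the model is untouched; new knowledge arrives through the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {question}"


def build_training_example(question: str, desired_answer: str) -> dict[str, str]:
    """Fine-tuning: the desired behavior is stored as a training example."""
    return {"instruction": question, "input": "", "output": desired_answer}


# Knowledge injection: the fact lives in the retrieved document (made-up text).
rag_prompt = build_rag_prompt(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)

# Behavior specialization: many records like this teach "always output SQL".
sql_example = build_training_example(
    "List all users older than 30",
    "SELECT * FROM users WHERE age > 30;",
)

print(rag_prompt)
print(sql_example)
```

With RAG, changing the answer means changing the retrieved document. With fine-tuning, changing the behavior means collecting more examples and training again.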
Types of Fine-Tuning
- Full Fine-Tuning:
  - All model weights are updated
  - Expensive -> billions of parameters to train
  - Rarely done outside large labs
- Parameter-Efficient Fine-Tuning (PEFT):
  - Only train small extra modules (adapters)
  - Base model stays frozen
  - Examples:
    - LoRA -> low-rank adapters
    - QLoRA -> LoRA with a quantized base model
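
To see why adapters are cheap, the arithmetic below compares trainable parameters for a single weight matrix. LoRA keeps a frozen weight of shape d x k and learns two small matrices, B (d x r) and A (r x k), whose product is the update. The sizes d = k = 4096 and the rank r = 8 are assumed purely for illustration, and the helper names are hypothetical.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Full fine-tuning trains every entry of the d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """LoRA trains only B (d x r) and A (r x k); the base weight stays frozen."""
    return r * (d + k)


d, k, r = 4096, 4096, 8  # assumed layer size and rank, for illustration only
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
ratio = lora / full

print(f"full: {full:,}  lora: {lora:,}  trainable fraction: {ratio:.4%}")
```

For these assumed sizes the adapter trains 65,536 parameters instead of 16,777,216, about 0.39% of the matrix. QLoRA keeps the same adapter and additionally stores the frozen base weights in quantized form to save memory.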
Instruction Tuning
- Most common fine-tuning format
- Dataset format: <Instruction> <Input> <Output>
- Examples:

| Instruction | Input | Output |
| --- | --- | --- |
| Translate to French | hello | bonjour |
| Explain Newton's first law | | Objects tend to resist changes to their state of motion. |
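
Before training, each row is usually flattened into a single text string. The template below is one common Alpaca-style layout, used here as an assumption. The function name `format_example` and the `###` section markers are illustrative, and a given model or toolkit may expect a different template.

```python
def format_example(instruction: str, input_text: str, output: str) -> str:
    """Flatten one (instruction, input, output) row into a training string."""
    parts = [f"### Instruction:\n{instruction}"]
    if input_text:  # the Input column may be empty, so skip the section then
        parts.append(f"### Input:\n{input_text}")
    parts.append(f"### Response:\n{output}")
    return "\n\n".join(parts)


rows = [
    ("Translate to French", "hello", "bonjour"),
    ("Explain Newton's first law", "",
     "Objects tend to resist changes to their state of motion."),
]

prompts = [format_example(*row) for row in rows]
print(prompts[0])
print("---")
print(prompts[1])
```

Skipping the empty Input section keeps the prompt from containing a dangling header with nothing under it.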
Mini Dataset Creation
- We create a JSONL dataset (one JSON object per line)
{"instruction": "Translate 'good morning' to French", "input": "", "output": "Bonjour"}
{"instruction": "Summarize: Large Language Models learn from text patterns", "input": "", "output": "LLMs learn patterns from text and use them to generate answers"}
- Save as train.jsonl
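
Writing the file by hand is error-prone once quotes and newlines appear in the text, so a small script can produce it instead. This is a minimal sketch using only the standard library. It writes the two examples to `train.jsonl` in the current directory, then reads them back and checks that each record has the three expected keys.

```python
import json
from pathlib import Path

examples = [
    {"instruction": "Translate 'good morning' to French",
     "input": "", "output": "Bonjour"},
    {"instruction": "Summarize: Large Language Models learn from text patterns",
     "input": "",
     "output": "LLMs learn patterns from text and use them to generate answers"},
]

path = Path("train.jsonl")

# JSONL: one JSON object per line, each terminated by a newline.
with path.open("w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# Read the file back and validate the schema of every record.
with path.open(encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

required_keys = {"instruction", "input", "output"}
for row in loaded:
    if not required_keys.issubset(row):
        raise ValueError(f"record is missing keys: {row}")

print(f"wrote {len(loaded)} examples to {path}")
```

`json.dumps` handles escaping, so quotes inside an instruction cannot break the line format.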
Libraries for Fine-Tuning
- We need GPUs to fine-tune, so one possible option is Google Colab
pip install -q "unsloth[train]" "trl" "datasets" "bitsandbytes" "accelerate" "peft" "transformers"
- unsloth -> optimized LLM loading/fine-tuning toolkit
- trl -> Hugging Face's trainer library for instruction tuning (works with LoRA)
- datasets -> load JSONL/CSV datasets
- bitsandbytes -> efficient quantization (4-bit, 8-bit)
- accelerate -> efficient GPU handling
- peft -> parameter-efficient fine-tuning adapters (LoRA, QLoRA)
- transformers -> model and tokenizer loading
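
After installation, a quick sanity check can confirm the packages are importable before starting a long run. The sketch below uses only the standard library. `check_packages` is a hypothetical helper name, and it reports availability without actually importing the heavy libraries.

```python
from importlib.util import find_spec

PACKAGES = [
    "unsloth", "trl", "datasets", "bitsandbytes",
    "accelerate", "peft", "transformers",
]


def check_packages(names: list[str]) -> dict[str, bool]:
    """Map each top-level package name to whether it can be located."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(PACKAGES)
for name, installed in status.items():
    print(f"{name:<14} {'ok' if installed else 'MISSING'}")
```

Any package reported as MISSING needs to be installed again in the current runtime. This matters in hosted notebooks, where a runtime restart can discard earlier installs.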

