Let's understand the parameters
- Refer Here for CPU vs GPU vs TPU
- model: Unsloth model names follow this convention:
  unsloth/<base-model>-<size>-<variant>-<quantization>
  For example, unsloth/Llama-3.2-3B-Instruct-bnb-4bit.
- max_seq_length: This refers to the maximum sequence length, in tokens, that the model will process for each example in the training dataset.
- dtype: This refers to the data type used to store the weights.
  - dtype = None (auto-detection): This is the recommended default setting. Unsloth automatically detects and uses the optimal dtype for your GPU hardware. It defaults to torch.bfloat16 on newer Ampere+ GPUs (like A100, RTX 30xx, RTX 40xx, H100) due to its stability, and to torch.float16 on older GPUs (like Tesla T4, V100).
  - torch.bfloat16 (BF16): Recommended for Ampere+ GPUs. It has the same exponent range as float32, which makes training more stable than with float16, and it uses half the memory of torch.float32. Unsloth also advertises 0% accuracy degradation compared to standard QLoRA implementations.
  - torch.float16 (FP16): Used for older GPUs that do not support bfloat16.
  - torch.float32 (FP32): Used for full fine-tuning, or in some specific cases to ensure correct training behavior, but it requires significantly more VRAM.
- load_in_4bit: Load the weights in 4-bit precision (quantization), which reduces VRAM usage.
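- rank (r): the rank of the low-rank LoRA adapter matrices; a higher rank means more trainable parameters.
- lora_alpha: the scaling factor for the LoRA update; in standard LoRA the update is scaled by lora_alpha / rank.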
- target_modules
  - attention: "q_proj", "k_proj", "v_proj", "o_proj"
  - feedforward: "up_proj", "down_proj", "gate_proj"
- lora_dropout: the dropout probability applied to the LoRA layers during training.
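
To see how the parameters above fit together, here is a minimal sketch of how they are typically passed to Unsloth. The checkpoint name and all numeric values (max_seq_length, rank, lora_alpha, lora_dropout) are illustrative assumptions, not settings taken from the class. The function name load_model_with_lora is a hypothetical helper. Actually calling it needs the unsloth package and a CUDA GPU, so the import is done lazily inside the function.

```python
# Sketch only: every value below is an illustrative assumption.

# Parameters for loading the base model.
MODEL_KWARGS = {
    "model_name": "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",  # assumed example checkpoint
    "max_seq_length": 2048,  # assumed; choose to fit your data and VRAM
    "dtype": None,           # auto-detect: bfloat16 on Ampere+, float16 on older GPUs
    "load_in_4bit": True,    # load 4-bit quantized weights
}

# Parameters for attaching the LoRA adapters.
LORA_KWARGS = {
    "r": 16,              # rank (assumed value)
    "lora_alpha": 16,     # assumed value
    "lora_dropout": 0.0,  # assumed value
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # feedforward
    ],
}


def load_model_with_lora():
    """Load the base model and attach LoRA adapters.

    Hypothetical helper: requires unsloth and a CUDA GPU at call time.
    """
    # Imported here so the dictionaries above can be inspected without a GPU.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(**MODEL_KWARGS)
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer
```

On a GPU machine, `model, tokenizer = load_model_with_lora()` would return the adapter-wrapped model and its tokenizer. Keeping the two dictionaries separate mirrors the two groups in the list: model, max_seq_length, dtype and load_in_4bit belong to loading, while rank, lora_alpha, target_modules and lora_dropout belong to LoRA.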
Watch the classroom recording for the purpose of these parameters.
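
The effect of dtype, load_in_4bit and rank can also be estimated with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, not anything Unsloth runs internally. The helper names, the 7B parameter count and the 4096 x 4096 layer size are hypothetical. The memory estimate covers the weights only and ignores quantization metadata, activations and optimizer state.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds next to one frozen d_out x d_in weight matrix.

    LoRA trains two small matrices: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def lora_scale(lora_alpha: float, rank: int) -> float:
    """Standard LoRA scales the adapter update by lora_alpha / rank."""
    return lora_alpha / rank


# Hypothetical 7B-parameter model stored at different precisions.
params = 7_000_000_000
for name, bits in [("float32", 32), ("bfloat16 / float16", 16), ("4-bit", 4)]:
    print(f"{name:>20}: {weight_memory_gb(params, bits):.1f} GB")

# Hypothetical 4096 x 4096 projection (for example q_proj) with rank 16.
full = 4096 * 4096
added = lora_trainable_params(4096, 4096, rank=16)
print(f"full matrix: {full} params, LoRA adds: {added} ({added / full:.2%})")
print(f"scale with lora_alpha=32, rank=16: {lora_scale(32, 16)}")
```

For the assumed 7B model this gives about 28 GB in float32, 14 GB in bfloat16 or float16, and 3.5 GB in 4-bit, which is why load_in_4bit matters on small GPUs. For the assumed layer, rank 16 trains 131,072 parameters instead of about 16.8 million, which is under 1% of the full matrix.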
