How are we going to fine-tune?
-
We will be using Unsloth for fine-tuning
-
There are other options for fine-tuning private models as well
What is a GPU (Graphics Processing Unit)?
- A massively parallel processor designed to perform thousands of mathematical operations at the same time
- Originally built for
- Rendering Video game graphics
- 3D Transformations
- Pixel Shading
- Texture mapping
-
Today, GPUs are used for
- Deep learning
- LLM Training
- Fine-Tuning
- Simulations
- Scientific computing
-
A GPU contains thousands of smaller cores
- We measure GPU power in FLOPS (floating-point operations per second)
- 1 FLOP = one floating-point calculation
- 1 TFLOPS = 1 trillion calculations per second
- 1 PFLOPS = 1 quadrillion calculations per second
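To make these units concrete, here is a back-of-the-envelope sketch of how long a large matrix multiplication would take at a given throughput. The 10 TFLOPS figure is an illustrative assumption, not the spec of any particular card:

```python
# Rough estimate of matrix-multiply time at a given sustained throughput.
# The throughput number below is an illustrative assumption.

def matmul_flops(n: int) -> int:
    """An (n x n) by (n x n) matrix multiply costs ~2*n^3 floating-point
    operations: n multiplies + n adds for each of the n*n output elements."""
    return 2 * n ** 3

TFLOP = 10 ** 12  # 1 TFLOPS = 1 trillion floating-point operations per second

n = 4096
flops = matmul_flops(n)
seconds_at_10_tflops = flops / (10 * TFLOP)

print(f"{flops:,} FLOPs")
print(f"~{seconds_at_10_tflops * 1000:.1f} ms at a sustained 10 TFLOPS")
```

LLM training and fine-tuning are dominated by exactly this kind of matrix multiplication, repeated billions of times, which is why FLOPS is the headline number for GPUs.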
-
What makes a GPU powerful?
- CUDA Cores
- Tensor Cores
- VRAM (Video RAM – GPU Memory)
- Memory Bandwidth
- Interconnect (Multi-GPU)
-
CUDA: A parallel computing platform and programming model created by NVIDIA. It allows developers to use NVIDIA GPUs for general-purpose computing, not just graphics
- CUDA cores are simple math processors designed for floating-point operations (thousands exist inside modern GPUs)
- A Tensor Core is a specialized hardware unit inside modern NVIDIA GPUs designed specifically to accelerate:
- Matrix multiplication operations used in deep learning
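The core operation a Tensor Core accelerates is a fused matrix multiply-accumulate, D = A x B + C. A naive Python sketch of that operation, which the hardware performs for an entire small tile of the matrices in a single step:

```python
# Naive sketch of the matrix multiply-accumulate (D = A @ B + C) that a
# Tensor Core computes for a small tile (e.g. 4x4) in one hardware operation.

def matmul_accumulate(A, B, C):
    """Return D = A @ B + C for square matrices given as lists of lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j] for j in range(n)]
        for i in range(n)
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0], [0, 1]]
print(matmul_accumulate(A, B, C))
```

Doing this for a whole tile at once, instead of one multiply or add at a time as a CUDA core would, is what makes Tensor Cores so much faster for deep learning workloads.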
- VRAM = Video Random Access Memory; this is memory attached directly to the GPU, and VRAM is extremely fast
-
When fine tuning, VRAM stores
- Model Weights
- Activations
- Gradients
- Optimizer states (Adam, etc.)
- Temporary buffers
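The items above can be turned into a rough VRAM estimate. This is a minimal sketch assuming mixed-precision full fine-tuning with Adam (fp16 weights and gradients, an fp32 master copy of the weights, and two fp32 Adam moments per parameter); activations and temporary buffers come on top and depend on batch size and sequence length, so they are left out here:

```python
# Back-of-the-envelope VRAM estimate for full fine-tuning with Adam.
# Assumptions (illustrative, not exact): fp16 weights + fp16 gradients,
# an fp32 master copy of the weights, and two fp32 Adam moment tensors.
# Activations and temporary buffers are extra and scale with batch size.

def finetune_vram_gb(num_params: float) -> float:
    bytes_per_param = (
        2      # fp16 model weights
        + 2    # fp16 gradients
        + 4    # fp32 master copy of weights
        + 4    # fp32 Adam first moment (m)
        + 4    # fp32 Adam second moment (v)
    )
    return num_params * bytes_per_param / 1024 ** 3

print(f"7B model: ~{finetune_vram_gb(7e9):.0f} GB before activations")
```

Numbers like this are why parameter-efficient methods such as LoRA (which Unsloth builds on) matter: training only a small adapter shrinks the gradient and optimizer-state terms dramatically, so the model fits on a single consumer GPU.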

-
TPU (Tensor Processing Unit): A TPU is a custom AI accelerator chip designed by Google, built specifically to accelerate:
- Tensor operations
