Token, Tokenizer, Embedding
- LLMs don't understand raw text; they work with numbers.
- A sentence is broken into multiple tokens. The program that does this is called a tokenizer.

- The tokenizer maps each token to a numeric token ID.

- Each token ID is converted into a vector (an array of decimal numbers) called an embedding. The vector has many dimensions, so each token becomes a point in a high-dimensional space.
- How embeddings work: the model holds a vocabulary of tokens in which each token has a token ID and a corresponding vector.

- Tokens whose vectors are close to each other in this high-dimensional space are considered similar in meaning.
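The pipeline above (text to tokens, tokens to IDs, IDs to vectors, closeness as similarity) can be sketched in a few lines. This is a toy illustration under stated assumptions: the tokenizer splits on whole words (real LLM tokenizers usually split into subword pieces, e.g. BPE), the vocabulary and the 3-D vectors are hypothetical values made up for the example, and cosine similarity is used as the measure of "closeness" (one common choice, not the only one).

```python
import math

# Toy word-level tokenizer (assumption: real LLM tokenizers usually split
# text into subword pieces, e.g. with BPE).
def tokenize(text):
    return text.lower().replace(".", "").split()

# Hypothetical vocabulary: each known token has an integer token ID.
vocab = {"kids": 0, "children": 1, "are": 2, "playing": 3, "river": 4, "bank": 5}

def to_ids(tokens):
    return [vocab[t] for t in tokens]

# Hypothetical embedding table: row i is the vector for token ID i.
# Values are made up so that "kids" and "children" sit close together;
# real models learn vectors with hundreds or thousands of dimensions.
embeddings = [
    [0.90, 0.10, 0.00],  # kids
    [0.85, 0.15, 0.05],  # children
    [0.10, 0.90, 0.10],  # are
    [0.20, 0.70, 0.30],  # playing
    [0.00, 0.20, 0.90],  # river
    [0.10, 0.30, 0.80],  # bank
]

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; near 0 means roughly perpendicular.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

tokens = tokenize("Kids are playing.")
ids = to_ids(tokens)
vectors = [embeddings[i] for i in ids]  # embedding lookup by token ID
print(tokens)  # ['kids', 'are', 'playing']
print(ids)     # [0, 2, 3]
print(cosine_similarity(embeddings[vocab["kids"]], embeddings[vocab["children"]]) >
      cosine_similarity(embeddings[vocab["kids"]], embeddings[vocab["river"]]))  # True
```

In a real model the embedding table is learned during training, which is what pushes related tokens close together; here the closeness is simply built into the made-up numbers.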
Parameters
- In a neural network, each neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function.

- In an LLM, the parameters are these weights and biases.
- More parameters increase the model's capacity to learn patterns.
- Model effectiveness depends on training.
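A single neuron and a parameter count can be sketched as below. Assumptions: sigmoid is used as the activation (LLMs typically use others, such as GELU), and the network is a plain stack of fully connected layers with hypothetical sizes, so the count only illustrates where "parameters" come from, not how any real LLM is built.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def dense_layer_param_count(n_inputs, n_outputs):
    # One weight per (input, output) pair plus one bias per output neuron.
    return n_inputs * n_outputs + n_outputs

def network_param_count(layer_sizes):
    # Sum the parameters of each consecutive pair of layers.
    return sum(dense_layer_param_count(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))

print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # z = 0.5 - 0.5 = 0, sigmoid(0) = 0.5
print(network_param_count([3, 4, 2]))         # (3*4 + 4) + (4*2 + 2) = 26
```

Training is the process of adjusting exactly these weights and biases, which is why the parameter count alone does not determine how good a model is.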
Next topic
- Transformer
- Experiment with RNN
Act as an RNN and, in simple terms, show me how you would translate the following sentence into Hindi: "Kids are playing on a river bank. It is extremely breezy out there."
- Experiment with transformer
Now act as a transformer with attention and solve the same problem.
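To make the two experiments concrete, the sketch below contrasts how the two architectures read a sentence. It is heavily simplified and every value is hypothetical: the "RNN" uses a scalar, elementwise recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}) instead of full weight matrices and a bias, and the attention part is scaled dot-product attention that uses the embeddings directly as queries, keys and values (real transformers first apply learned projections and use multiple heads).

```python
import math

# --- RNN style: read tokens one at a time, carrying a hidden state ---
def rnn_encode(vectors, w_x, w_h):
    # Simplified elementwise recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1}).
    # Real RNNs use weight matrices and a bias; scalars keep the idea visible.
    h = [0.0] * len(vectors[0])
    for x in vectors:
        h = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]
    return h  # a single vector that has to summarize the whole sequence

# --- Attention: every token scores every other token directly ---
def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product scores: (q . k) / sqrt(d), then softmax.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    # Output is the attention-weighted sum of the value vectors.
    weights = attention_weights(query, keys)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

tokens = ["kids", "playing", "river", "bank"]
# Made-up 3-D embeddings (hypothetical values for illustration only).
vectors = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.2, 1.0], [0.0, 0.3, 0.9]]

summary = rnn_encode(vectors, w_x=0.5, w_h=0.5)
weights = attention_weights(vectors[3], vectors)  # how much "bank" looks at each token
for tok, w in zip(tokens, weights):
    print(f"{tok}: {w:.2f}")
bank_in_context = attend(vectors[3], vectors, vectors)
```

With these made-up vectors, "bank" places its largest weight on "river", which is the kind of direct context signal the transformer experiment is meant to surface; the RNN, by contrast, only sees "river" through whatever survives in its running hidden state.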
