part 2 of the series, covering backpropagation, hyperparameters and evaluation metrics.
writing this gave me a basic understanding of how an LLM works, instead of treating a language model as a complete black box. let me know if it helps you too!
i'm figuring out how to train an LLM and documenting each step in this blog series.
for anyone who is also curious, here is part 1:
michaelhly.com/posts/train-...