@ens445
llm.c training stabilized and achieves parity with PyTorch training (but faster) 💪
During the last few iterations of training in my previous post, the loss spiked. The cause was a bug in gradient norm clipping, and
@karpathy
fixed the bug 🎉. With the latest llm.c code, GPT-2 (124M) achieved 35.3% accuracy