agusti
@bleu.eth
training your own model hits diff fr fr
2 replies
0 recast
11 reactions

agusti
@bleu.eth
i dont even know if these numbers are good or bad rn, 25M params model, gpt2 base
4 replies
0 recast
6 reactions

J. Valeska 🦊🎩🫂
@jvaleska.eth
well, the idea is to get the loss close to 0. it is decreasing with every new iteration, that's good. at some point the step size will shrink and it will repeat similar numbers, then you are done. if it is close to 0 you may have found optimal weights (or it may be overtrained, meaning the llm learnt the dataset by heart but will be bad on other stuff). if it is not optimal, you may have found a local minimum and you need to modify some parameters and try again (a very short summary based on how it worked some years ago, maybe it has changed now)
1 reply
1 recast
1 reaction
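The dynamics described above (loss falling each iteration, steps shrinking near an optimum, stopping once successive losses repeat) can be sketched with a toy gradient-descent loop. This is an illustrative sketch on a 1-D quadratic loss, not agusti's actual 25M-param training setup; all names and values here are made up for the example.

```python
# Toy gradient descent: loss shrinks every step, the effective step
# shrinks as the gradient -> 0, and we stop when successive losses
# are nearly identical (the "repeat similar numbers" signal).

def train(lr=0.1, tol=1e-6, max_steps=1000):
    w = 5.0                       # start far from the optimum (w* = 2)
    prev_loss = float("inf")
    for step in range(max_steps):
        loss = (w - 2.0) ** 2     # toy loss, minimum 0 at w = 2
        grad = 2.0 * (w - 2.0)
        w -= lr * grad            # update shrinks as grad -> 0
        if abs(prev_loss - loss) < tol:  # loss has plateaued: done
            return w, loss, step
        prev_loss = loss
    return w, loss, max_steps

w, loss, steps = train()
```

In real training the same plateau check is usually run on a held-out validation loss rather than the training loss, since training loss alone can keep falling while the model overfits.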

agusti
@bleu.eth
ty so much for this ✍️ much appreciated
0 reply
0 recast
1 reaction