agusti
@bleu.eth
training your own model hits diff fr fr
2 replies
0 recast
11 reactions
agusti
@bleu.eth
i dont even know if these numbers are good or bad rn. 25M param model, gpt2 base
4 replies
0 recast
6 reactions
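(For context on whether a loss number is "good or bad": if the model reports mean cross-entropy in nats over the standard GPT-2 BPE vocab, you can compare it against a uniform-random baseline and convert it to perplexity. A minimal sketch; the loss value 3.5 here is hypothetical, not a number from the thread:

import math

vocab_size = 50257        # standard GPT-2 BPE vocabulary size (assumption: gpt2 tokenizer)
loss = 3.5                # hypothetical mean cross-entropy in nats from a training log

random_baseline = math.log(vocab_size)  # ~10.82 nats: loss of a uniform guesser
perplexity = math.exp(loss)             # ~33: model is effectively choosing among ~33 tokens

print(f"baseline {random_baseline:.2f} nats, current {loss:.2f} nats, ppl {perplexity:.1f}")

Anything well below the ~10.8 baseline means the model has learned real structure; how low is "good" depends on the dataset.)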
J. Valeska 🦊🎩🫂
@jvaleska.eth
well, the idea is to get the loss close to 0. it's decreasing with every new iteration, that's good. at some point the steps will get smaller and it will repeat similar numbers, then you're done. if it's close to 0 you may have found optimal weights (or it may be overtrained, meaning the llm learnt the dataset by heart but will be bad on other stuff). if it's not optimal, you may have landed in a local minimum and need to modify some parameters and try again. (a very short summary based on how it worked some years ago, may have changed now)
1 reply
1 recast
1 reaction
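(The overtraining case J. Valeska describes is usually caught by tracking a held-out validation loss next to the train loss and stopping once val loss stops improving. A minimal early-stopping sketch with simulated loss curves and hypothetical thresholds, not anyone's actual training code:

import random

def simulated_losses(steps):
    # stand-in for real train/val losses; val starts rising after step 60 (overfitting)
    for t in range(steps):
        train = 4.0 * 0.97 ** t + random.uniform(0.0, 0.05)
        val = 4.0 * 0.97 ** min(t, 60) + 0.002 * max(0, t - 60) + random.uniform(0.0, 0.05)
        yield t, train, val

best_val, best_step, patience, bad_evals = float("inf"), 0, 5, 0
for step, train_loss, val_loss in simulated_losses(200):
    if step % 5:                        # pretend the (expensive) val pass runs every 5 steps
        continue
    if val_loss < best_val - 1e-3:      # meaningful improvement: reset the counter
        best_val, best_step, bad_evals = val_loss, step, 0
        # here you would checkpoint the weights
    else:
        bad_evals += 1
    if bad_evals >= patience:           # val flat/rising while train keeps falling: stop
        print(f"stopping at step {step}; best val {best_val:.3f} at step {best_step}")
        break

Train loss falling while val loss rises is the classic overfit signature; keeping the checkpoint from the best val step sidesteps it.)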
Nick T
@nt
Karpathy videos?
1 reply
0 recast
1 reaction
Stephan
@stephancill
Loss down good
0 reply
0 recast
1 reaction
shoni.eth
@alexpaden
What are you fine tuning? 25m params I thought would be pretty slow on a single gpu
0 reply
0 recast
1 reaction