agusti
@bleu.eth
training your own model hits diff fr fr
2 replies
0 recast
11 reactions
agusti
@bleu.eth
i dont even know if these numbers are good or bad rn. 25M param model, gpt2 base
4 replies
0 recast
6 reactions
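(For context on whether a loss number is "good or bad": if the model reports mean cross-entropy in nats over the standard GPT-2 BPE vocab, you can compare it against a uniform-random baseline and convert it to perplexity. A minimal sketch; the loss value 3.5 here is hypothetical, not a number from the thread:

import math

vocab_size = 50257        # standard GPT-2 BPE vocabulary size (assumption: gpt2 tokenizer)
loss = 3.5                # hypothetical mean cross-entropy in nats from a training log

random_baseline = math.log(vocab_size)  # ~10.82 nats: loss of a uniform guesser
perplexity = math.exp(loss)             # ~33: model is effectively choosing among ~33 tokens

print(f"baseline {random_baseline:.2f} nats, current {loss:.2f} nats, ppl {perplexity:.1f}")

Anything well below the ~10.8 baseline means the model has learned real structure; how low is "good" depends on the dataset.)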
J. Valeska 🦊🎩🫂
@jvaleska.eth
well, the idea is to get the loss close to 0. it's decreasing with every new iteration, that's good. at some point the steps will get smaller and it will repeat similar numbers, then you're done. if it's close to 0 you may have found optimal weights (or it may be overtrained, meaning the llm learnt the dataset by heart but will be bad on other stuff). if it's not optimal, you may have landed in a local minimum and need to modify some parameters and try again. (a very short summary based on how it worked some years ago, may have changed now)
1 reply
1 recast
1 reaction
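(The overtraining case J. Valeska describes is usually caught by tracking a held-out validation loss next to the train loss and stopping once val loss stops improving. A minimal early-stopping sketch with simulated loss curves and hypothetical thresholds, not anyone's actual training code:

import random

def simulated_losses(steps):
    # stand-in for real train/val losses; val starts rising after step 60 (overfitting)
    for t in range(steps):
        train = 4.0 * 0.97 ** t + random.uniform(0.0, 0.05)
        val = 4.0 * 0.97 ** min(t, 60) + 0.002 * max(0, t - 60) + random.uniform(0.0, 0.05)
        yield t, train, val

best_val, best_step, patience, bad_evals = float("inf"), 0, 5, 0
for step, train_loss, val_loss in simulated_losses(200):
    if step % 5:                        # pretend the (expensive) val pass runs every 5 steps
        continue
    if val_loss < best_val - 1e-3:      # meaningful improvement: reset the counter
        best_val, best_step, bad_evals = val_loss, step, 0
        # here you would checkpoint the weights
    else:
        bad_evals += 1
    if bad_evals >= patience:           # val flat/rising while train keeps falling: stop
        print(f"stopping at step {step}; best val {best_val:.3f} at step {best_step}")
        break

Train loss falling while val loss rises is the classic overfit signature; keeping the checkpoint from the best val step sidesteps it.)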
Nick T
@nt
Karpathy videos?
1 reply
0 recast
1 reaction
Stephan
@stephancill
Loss down good
0 reply
0 recast
1 reaction
shoni.eth
@alexpaden
What are you fine tuning? 25m params I thought would be pretty slow on a single gpu
0 reply
0 recast
1 reaction