Hylke

@hylkedonker

Interesting, so these models are actually overfitting. That seems to run counter to the fact that increasing LLM capacity (larger model size) improves performance on downstream tasks.