@appliedml42
This is an amazing read. Cramming: Training a Language Model on a Single GPU in One Day
abs: https://arxiv.org/abs/2212.14034
🔥 summary thread from Lucas
https://twitter.com/giffmana/status/1608568387583737856?s=61&t=qPnqsJlJqse2GDklDp4hww