Dutch machine learning enthusiast with a love for programming.
Interesting, so these models are actually overfitting. That seems at odds with the observation that increasing LLM capacity (larger model size) improves performance on downstream tasks.
State space models can be used as drop-in replacements for attention, but with more favourable scaling in sequence length. This video may well be the most lucid intro to state space models I've come across: https://youtu.be/QJHA-PY8zDc?si=J5kGW87Yg0SAFdpR
Recurrent neural networks are transformers, are state space models, are convolutions? Looks like we've come full circle, back to 2012, when deep learning made its first splash. https://arxiv.org/abs/2405.21060
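To make the "recurrence is a convolution" part concrete, here is a minimal sketch in plain Python. It assumes a toy time-invariant state space model with a diagonal state matrix; the function names and the values of `a`, `b`, `c` are made up for illustration and are not taken from the linked paper. It only shows that stepping the recurrence and convolving with the kernel `K[k] = sum(c * a**k * b)` give the same outputs; it does not cover the attention side of the story.

```python
def ssm_recurrent(a, b, c, xs):
    """Step-by-step form: h_t = a * h_{t-1} + b * x_t, y_t = sum(c * h_t)."""
    h = [0.0] * len(a)  # one state value per diagonal entry of the state matrix
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_kernel(a, b, c, length):
    """Convolution kernel implied by the same parameters: K[k] = sum(c * a**k * b)."""
    return [
        sum(ci * (ai ** k) * bi for ai, bi, ci in zip(a, b, c))
        for k in range(length)
    ]


def ssm_convolutional(a, b, c, xs):
    """Causal convolution form: y_t = sum over k <= t of K[k] * x_{t-k}."""
    kernel = ssm_kernel(a, b, c, len(xs))
    return [
        sum(kernel[k] * xs[t - k] for k in range(t + 1))
        for t in range(len(xs))
    ]


# Hypothetical toy parameters (diagonal state of size 2) and a short input.
a = [0.9, 0.5]
b = [1.0, 2.0]
c = [0.3, -0.1]
xs = [1.0, 0.0, 2.0, -1.0]

y_rec = ssm_recurrent(a, b, c, xs)
y_conv = ssm_convolutional(a, b, c, xs)

# Both views produce the same outputs up to floating point error.
assert all(abs(p - q) < 1e-9 for p, q in zip(y_rec, y_conv))
```

The recurrent form does one state update per token, so its cost grows linearly with sequence length. The naive convolution above is quadratic as written, but it can be computed for all positions at once, which is what makes the convolutional view attractive for training.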
Debugging neural nets is always a pain, but perhaps penzai can bring some relief? https://github.com/google-deepmind/penzai