@hylkedonker
State space models can be used as drop-in replacements for attention, with more favourable scaling in sequence length: roughly linear rather than quadratic. This video may well be the most lucid intro to state space models I've come across:
https://youtu.be/QJHA-PY8zDc?si=J5kGW87Yg0SAFdpR
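
To make the scaling point concrete, here is a minimal sketch of a discrete, diagonal linear state space layer in plain Python. It is my own toy illustration, not code from the video: the function name `ssm_scan` and the parameters `a`, `b`, `c` are assumptions picked for the example, and real SSM layers add learned discretisation, multiple channels and faster parallel implementations.

```python
def ssm_scan(a, b, c, u):
    """Run a toy diagonal linear state space model over a scalar input sequence.

    Recurrence (elementwise over the state):
        x_k = a * x_{k-1} + b * u_k
        y_k = sum(c * x_k)

    a, b, c: lists of equal length N (the state size).
    u: list of L input values.
    Returns a list of L outputs.
    """
    state = [0.0] * len(a)  # x_0 = 0
    outputs = []
    for u_k in u:  # a single pass over the sequence
        # Update every state entry from its previous value and the new input.
        state = [a_i * x_i + b_i * u_k for a_i, b_i, x_i in zip(a, b, state)]
        # Read out one scalar from the current state.
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


# Impulse response of a one-dimensional state with decay 0.5.
print(ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```

Each step only touches the fixed-size state, so the work grows as L times N for a sequence of length L, whereas attention compares every position with every other position, which grows as L squared.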