agusti
@bleu.eth
Transformers scale exceptionally well, but their core mechanism, self-attention, has a computational and memory cost that grows quadratically with sequence length (O(n²)). While we can push this to millions of tokens, it is an inefficient, brute-force method. It's like trying to listen to every person in a stadium to understand one conversation. Nature doesn't work this way; it uses targeted, efficient attention.
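A minimal NumPy sketch (not from the original cast) of single-head scaled dot-product attention, showing where the quadratic term comes from: the score matrix pairs every token with every other token, so it has n × n entries for a sequence of length n. Names and dimensions here are illustrative assumptions.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over a length-n sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # each (n, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (n, n) -- the quadratic cost
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                             # (n, d)

n, d = 4096, 64                                    # hypothetical sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
w = [rng.standard_normal((d, d)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)                                   # (4096, 64)
print(f"score matrix holds {n * n:,} entries")     # 16,777,216 -- grows as n^2
```

Doubling n quadruples both the entries in the score matrix and the work to fill it, which is why "just add more context" gets expensive so quickly.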